ISSS608 2016-17 T1 Assign2 Nguyen Tien Duong Implementation

From Visual Analytics and Applications

ISSS608 2016-17 Assignment 2 - Nguyen Tien Duong


Data Preparation

Mapping index with values

  • The dimensions below are stored as numeric indices (0, 1, 2, ...), which are hard for a human reader to interpret. They are therefore mapped back to the labels given in their definitions.
GENDER: 0=Male; 1=Female
DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics; 6=Social Science
PhD: 0=No; 1=Yes
UNIVERSITY: 1=UOC; 2=UPF
UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
OTHER (main job in another university for part-time members): 1=Yes; 2=No
OTHER_POSITION (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
USERWIKI (Wikipedia registered user): 0=No; 1=Yes 
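
The mapping above can be sketched as a simple dictionary lookup. This is a minimal illustration, not the tool actually used for the assignment: the function name `decode_row`, the sample record, and the use of "?" as a missing-value marker are assumptions made for the example.

```python
# Code-to-label dictionaries copied from the definitions listed above.
POSITION = {
    "1": "Professor", "2": "Associate", "3": "Assistant",
    "4": "Lecturer", "5": "Instructor", "6": "Adjunct",
}
CODEBOOK = {
    "GENDER": {"0": "Male", "1": "Female"},
    "DOMAIN": {
        "1": "Arts & Humanities", "2": "Sciences", "3": "Health Sciences",
        "4": "Engineering & Architecture", "5": "Law & Politics",
        "6": "Social Science",
    },
    "PhD": {"0": "No", "1": "Yes"},
    "UNIVERSITY": {"1": "UOC", "2": "UPF"},
    "UOC_POSITION": POSITION,
    "OTHER": {"1": "Yes", "2": "No"},
    "OTHER_POSITION": POSITION,
    "USERWIKI": {"0": "No", "1": "Yes"},
}


def decode_row(row):
    """Return a copy of `row` with coded values replaced by their labels.

    Columns without a codebook entry, and values without a definition
    (for example a missing-value marker), are left unchanged.
    """
    return {col: CODEBOOK.get(col, {}).get(val, val) for col, val in row.items()}


# Hypothetical record for illustration; "?" is an assumed missing-value marker.
sample = {"AGE": "40", "GENDER": "0", "DOMAIN": "6", "PhD": "1",
          "UNIVERSITY": "1", "UOC_POSITION": "2", "OTHER": "?", "USERWIKI": "0"}
decoded = decode_row(sample)
print(decoded)
```

Leaving undefined values untouched keeps missing answers visible in the decoded data instead of silently dropping them.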

Data Cleanse

DOMAIN:

  • The variable dictionary provides only 5 definitions, yet the data contains index 6. This points to either a missing definition or an invalid value.
  • A check against the original Wiki4HE project showed that index 6 is the Social Science domain. Therefore, "6" is mapped to "Social Science".
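
A check of this kind can be sketched as follows. The helper name `undefined_codes` and the sample column values are hypothetical and only illustrate how a code without a definition, such as index 6, would surface.

```python
from collections import Counter


def undefined_codes(values, definitions):
    """Count the values that have no entry in the variable dictionary."""
    return Counter(v for v in values if v not in definitions)


# The five DOMAIN definitions that were initially available.
domain_definitions = {
    "1": "Arts & Humanities",
    "2": "Sciences",
    "3": "Health Sciences",
    "4": "Engineering & Architecture",
    "5": "Law & Politics",
}

# Hypothetical DOMAIN column values, for illustration only.
domain_column = ["1", "6", "3", "6", "2", "5"]

unknown = undefined_codes(domain_column, domain_definitions)
print(unknown)

# After the definition is recovered, no undefined codes remain.
domain_definitions["6"] = "Social Science"
remaining = undefined_codes(domain_column, domain_definitions)
print(remaining)
```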

MISSING DATA:

  • Missing values make up only a small part of the data set, and a missing answer may reflect the respondent's choice not to answer. Missing data is therefore retained as is.

Data transformation

  • Continuous and integer data: different approaches to binning continuous data were considered. To ensure a fair comparison, fixed-width binning was used.
*AGE: 5-year bins
*EXPERIENCE: 5-year bins
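
Fixed-width binning with a 5-year width can be sketched as below. The source does not state the bin boundaries or label format, so this sketch assumes bins aligned to multiples of 5, whole-year values, and labels such as "40-44"; the function name `fixed_width_bin` is hypothetical.

```python
import math


def fixed_width_bin(value, width=5):
    """Return a label such as '40-44' for the fixed-width bin holding `value`.

    Assumes bins start at multiples of `width` and values are whole years.
    Entries that are not finite numbers (for example a missing-value
    marker) are returned unchanged, so missing data is retained.
    """
    try:
        number = float(value)
    except (TypeError, ValueError):
        return value
    if not math.isfinite(number):
        return value
    lower = int(number // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical AGE values; "?" is an assumed missing-value marker.
ages = ["23", "40", "44", "45", "?"]
binned = [fixed_width_bin(a) for a in ages]
print(binned)
```

Using the same width for every bin keeps the groups directly comparable, which is the fairness argument made above.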

Header name

  • For ease of interpretation, header names are extended with a short description.
  • The 43 survey questions are coded with abbreviations such as "PU1, PU2…", which are hard to interpret and not user-friendly.
  • A short description lets readers grasp what each question is about without flipping back and forth to the variable dictionary.
  • People struggle to match many pieces of information that are not on the same page, while an overly wordy description distracts users. A short, interpretable description therefore reminds users of the question without adding clutter.
Original Header  Short Description          New Header
PU1              DEV_STUD_SKILL             PU1-DEV_STUD_SKILL
PU2              IMPRV_STUD_LEARN           PU2-IMPRV_STUD_LEARN
PU3              USEFUL_FOR_TEACH           PU3-USEFUL_FOR_TEACH
PEU1             USER_FRIENDLY              PEU1-USER_FRIENDLY
PEU2             EZ_FIND_INFO               PEU2-EZ_FIND_INFO
PEU3             EZ_EDIT_INFO               PEU3-EZ_EDIT_INFO
ENJ1             STIMULATE_CERIOSITY        ENJ1-STIMULATE_CERIOSITY
ENJ2             ENTERTAINING               ENJ2-ENTERTAINING
Qu1              CONTENT_RELIABLE           Qu1-CONTENT_RELIABLE
Qu2              CONTENT_UPDATED            Qu2-CONTENT_UPDATED
Qu3              CONTENT_COMPRESV           Qu3-CONTENT_COMPRESV
Qu4              CONTENT_LOWER_QUALITY      Qu4-CONTENT_LOWER_QUALITY
Qu5              CONTENT_EDIT_TRUST         Qu5-CONTENT_EDIT_TRUST
Vis1             VIZ_IMPROV                 Vis1-VIZ_IMPROV
Vis2             EZ_RECORD_CONTRIB          Vis2-EZ_RECORD_CONTRIB
Vis3             USED_CITE_PAPER            Vis3-USED_CITE_PAPER
Im1              POPULAR_COLLEAGUES         Im1-POPULAR_COLLEAGUES
Im2              APPREC_EDU_SHARE           Im2-APPREC_EDU_SHARE
Im3              COLLEAGUES_USING           Im3-COLLEAGUES_USING
SA1              IMPT_SHARE_ACADEMIC        SA1-IMPT_SHARE_ACADEMIC
SA2              IMPT_SHARE_RESEARCH        SA2-IMPT_SHARE_RESEARCH
SA3              IMPT_STUD_ONLINE_COL       SA3-IMPT_STUD_ONLINE_COL
Use1             USED_TO_TEACH_MTRIAL       Use1-USED_TO_TEACH_MTRIAL
Use2             USED_TO_DEV_ACTIV          Use2-USED_TO_DEV_ACTIV
Use3             REC_STUD_USE               Use3-REC_STUD_USE
Use4             REC_COLL_USE               Use4-REC_COLL_USE
Use5             AGGREE_STUD_USE            Use5-AGGREE_STUD_USE
Pf1              CONTRIB_BLOG               Pf1-CONTRIB_BLOG
Pf2              CONTRIB_SOCIAL_NET         Pf2-CONTRIB_SOCIAL_NET
Pf3              PUBLISH_ACAD_OPEN_PLAT     Pf3-PUBLISH_ACAD_OPEN_PLAT
JR1              UNIV_PROMOTE_OPEN_COLLAB   JR1-UNIV_PROMOTE_OPEN_COLLAB
JR2              UNIV_CONSIDER_OPEN_COLLAB  JR2-UNIV_CONSIDER_OPEN_COLLAB
BI1              FUTURE_REC_USE_COLL_STUD   BI1-FUTURE_REC_USE_COLL_STUD
BI2              FUTURE_USE_TEACH           BI2-FUTURE_USE_TEACH
Inc1             HELPFUL_BEST_PRAC          Inc1-HELPFUL_BEST_PRAC
Inc2             HELPFUL_GET_INST_COLL      Inc2-HELPFUL_GET_INST_COLL
Inc3             HELPFUL_GET_TRAIN          Inc3-HELPFUL_GET_TRAIN
Inc4             HELPFUL_INSTITUTION_RECGN  Inc4-HELPFUL_INSTITUTION_RECGN
Exp1             CONSULT_EXPERTISE          Exp1-CONSULT_EXPERTISE
Exp2             CONSULT_ACAD_ISSUES        Exp2-CONSULT_ACAD_ISSUES
Exp3             CONSULT_PERSONAL_ISSUES    Exp3-CONSULT_PERSONAL_ISSUES
Exp4             CONTRIB_WIKI               Exp4-CONTRIB_WIKI
Exp5             USING_WORK_WITH_STUD       Exp5-USING_WORK_WITH_STUD
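
The renaming rule in the table (original code, a hyphen, then the short description) can be sketched as below. Only a few table rows are included for brevity, and the function name `rename_headers` and the sample header list are hypothetical.

```python
# A subset of the short descriptions from the table above.
SHORT_DESCRIPTIONS = {
    "PU1": "DEV_STUD_SKILL",
    "PU2": "IMPRV_STUD_LEARN",
    "PEU1": "USER_FRIENDLY",
    "Qu1": "CONTENT_RELIABLE",
}


def rename_headers(headers, descriptions):
    """Append '-<short description>' to every header that has one.

    Headers without a description (for example AGE or GENDER) are kept as is.
    """
    return [f"{h}-{descriptions[h]}" if h in descriptions else h for h in headers]


# Hypothetical header row, for illustration only.
original_headers = ["AGE", "GENDER", "PU1", "PU2", "PEU1", "Qu1"]
new_headers = rename_headers(original_headers, SHORT_DESCRIPTIONS)
print(new_headers)
```

Keeping the original code as a prefix means the new headers can still be traced back to the variable dictionary.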

Application Deployment

Three main deployment options were considered:

  • Run a standalone server:
+ Full deployment flexibility; a wide range of technologies can be used
- Domain cost: ~$10/year
- Hosting cost: ~$20-100/year
- May not be maintained in the long term
  • Run a free server:
+ Full or somewhat limited deployment flexibility; a wide range of technologies can be used
+ Domain cost: ~$0/year
+ Hosting cost: ~$0/year
- Server uptime is usually very poor
- May not be maintained in the long term
  • Deploy to GitHub:
+ Domain cost: $0/year
+ Hosting cost: $0/year
+ Sustainable in the long term
+ Adds to a portfolio on GitHub
- Restricted to HTML and JS only
- No custom domain

GitHub was therefore selected as the deployment solution.