ISSS608 2016-17 T1 Assign2 Nguyen Tien Duong Implementation
Data Preparation
Mapping index with values
- The dimensions below are coded with integer indices (1, 2, 3, ...), which are hard for a human to read. They are therefore mapped back to the labels given in their definitions.
- GENDER: 0=Male; 1=Female
- DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics; 6=Social Science
- PhD: 0=No; 1=Yes
- UNIVERSITY: 1=UOC; 2=UPF
- UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
- OTHER (main job in another university for part-time members): 1=Yes; 2=No
- OTHER_POSITION (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
- USERWIKI (Wikipedia registered user): 0=No; 1=Yes
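The mapping from index to label can be sketched in a few lines of Python. This is only an illustration, not necessarily the tool used for this assignment: the names `CODE_LABELS` and `decode_row` are hypothetical, only some of the dimensions are included, and the `"?"` in the sample row is an assumed placeholder for a missing answer.

```python
# Lookup tables copied from the definitions above (subset of the dimensions).
CODE_LABELS = {
    "GENDER": {0: "Male", 1: "Female"},
    "DOMAIN": {
        1: "Arts & Humanities",
        2: "Sciences",
        3: "Health Sciences",
        4: "Engineering & Architecture",
        5: "Law & Politics",
        6: "Social Science",
    },
    "PhD": {0: "No", 1: "Yes"},
    "UNIVERSITY": {1: "UOC", 2: "UPF"},
    "USERWIKI": {0: "No", 1: "Yes"},
}


def decode_row(row, labels=CODE_LABELS):
    """Return a copy of row with coded columns replaced by their labels.

    Values without a definition (for example missing answers) are kept as-is.
    """
    decoded = dict(row)
    for column, mapping in labels.items():
        if column in decoded:
            decoded[column] = mapping.get(decoded[column], decoded[column])
    return decoded


# Hypothetical sample row; "?" stands in for a missing answer (assumption).
sample = {"AGE": 40, "GENDER": 0, "DOMAIN": 6, "PhD": 1, "UNIVERSITY": 1, "USERWIKI": "?"}
decoded_sample = decode_row(sample)
```

Columns that have no lookup table, such as AGE, pass through unchanged, so the same function can be applied to every row of the data set.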
Data Cleanse
DOMAIN:
- The variable dictionary provides 5 definitions, but the data contains index number 6. This is either a missing definition or a missing value.
- Further research into the original Wiki4HE project shows that index 6 is the Social Science domain. Therefore, "6" is assigned to "Social Science".
MISSING DATA:
- Missing data makes up only a small part of the whole data set. A missing value may indicate that the respondent chose not to answer, so missing data is retained as it is.
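One way to check how small the missing part is, without altering the data, is to compute the share of missing values per column. The sketch below is an assumption-laden illustration: the marker `"?"`, the function name `missing_share`, and the sample rows are all hypothetical and not taken from the actual data set.

```python
MISSING = "?"  # assumed marker for a missing answer


def missing_share(rows, column, marker=MISSING):
    """Return the fraction of rows whose value in column is missing."""
    if not rows:
        return 0.0
    values = [row.get(column, marker) for row in rows]
    return sum(value == marker for value in values) / len(values)


# Hypothetical rows; the missing entry is counted but not removed or replaced.
rows = [{"DOMAIN": 1}, {"DOMAIN": "?"}, {"DOMAIN": 6}, {"DOMAIN": 2}]
share = missing_share(rows, "DOMAIN")
```

The rows themselves are left untouched, which matches the decision to retain missing data rather than drop or impute it.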
Data transformation
- Continuous, Integer Data: Different approaches were considered for binning continuous data. However, to ensure a fair comparison, fixed-width binning was used to create the bins for continuous data.
- AGE: 5-year bins
- EXPERIENCE: 5-year bins
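The fixed-width binning described above could be sketched as follows. The text does not state where the bin edges start, so this sketch assumes bins aligned to multiples of 5 (20-24, 25-29, and so on); the function name `five_year_bin` and the sample ages are hypothetical.

```python
def five_year_bin(value, width=5):
    """Return a label such as '40-44' for a fixed-width bin.

    Assumes bins start at multiples of `width`; a missing value (None) stays None.
    """
    if value is None:
        return None
    lower = (int(value) // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical ages, including one missing value.
ages = [23, 40, 44, 45, None]
age_bins = [five_year_bin(age) for age in ages]
```

Because every bin has the same width, counts in different bins of AGE and EXPERIENCE can be compared directly, which is the reason given for choosing fixed ranges.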
Header name
- For the sake of interpretation, header names are transformed by appending a short description.
- There are 43 questions coded with abbreviations such as "PU1, PU2…", which are hard to interpret and not user-friendly.
- A short description is added so that readers get a sense of the data faster and can see what each question is about without flipping back and forth to the variable dictionary.
- The human brain does not cope well with matching many pieces of information that are not on the same page. An overly wordy description would also distract users. A short, interpretable description therefore reminds users of the question without being too wordy.
Original Header | Short Description | New Header |
---|---|---|
PU1 | DEV_STUD_SKILL | PU1-DEV_STUD_SKILL |
PU2 | IMPRV_STUD_LEARN | PU2-IMPRV_STUD_LEARN |
PU3 | USEFUL_FOR_TEACH | PU3-USEFUL_FOR_TEACH |
PEU1 | USER_FRIENDLY | PEU1-USER_FRIENDLY |
PEU2 | EZ_FIND_INFO | PEU2-EZ_FIND_INFO |
PEU3 | EZ_EDIT_INFO | PEU3-EZ_EDIT_INFO |
ENJ1 | STIMULATE_CERIOSITY | ENJ1-STIMULATE_CERIOSITY |
ENJ2 | ENTERTAINING | ENJ2-ENTERTAINING |
Qu1 | CONTENT_RELIABLE | Qu1-CONTENT_RELIABLE |
Qu2 | CONTENT_UPDATED | Qu2-CONTENT_UPDATED |
Qu3 | CONTENT_COMPRESV | Qu3-CONTENT_COMPRESV |
Qu4 | CONTENT_LOWER_QUALITY | Qu4-CONTENT_LOWER_QUALITY |
Qu5 | CONTENT_EDIT_TRUST | Qu5-CONTENT_EDIT_TRUST |
Vis1 | VIZ_IMPROV | Vis1-VIZ_IMPROV |
Vis2 | EZ_RECORD_CONTRIB | Vis2-EZ_RECORD_CONTRIB |
Vis3 | USED_CITE_PAPER | Vis3-USED_CITE_PAPER |
Im1 | POPULAR_COLLEAGUES | Im1-POPULAR_COLLEAGUES |
Im2 | APPREC_EDU_SHARE | Im2-APPREC_EDU_SHARE |
Im3 | COLLEAGUES_USING | Im3-COLLEAGUES_USING |
SA1 | IMPT_SHARE_ACADEMIC | SA1-IMPT_SHARE_ACADEMIC |
SA2 | IMPT_SHARE_RESEARCH | SA2-IMPT_SHARE_RESEARCH |
SA3 | IMPT_STUD_ONLINE_COL | SA3-IMPT_STUD_ONLINE_COL |
Use1 | USED_TO_TEACH_MTRIAL | Use1-USED_TO_TEACH_MTRIAL |
Use2 | USED_TO_DEV_ACTIV | Use2-USED_TO_DEV_ACTIV |
Use3 | REC_STUD_USE | Use3-REC_STUD_USE |
Use4 | REC_COLL_USE | Use4-REC_COLL_USE |
Use5 | AGGREE_STUD_USE | Use5-AGGREE_STUD_USE |
Pf1 | CONTRIB_BLOG | Pf1-CONTRIB_BLOG |
Pf2 | CONTRIB_SOCIAL_NET | Pf2-CONTRIB_SOCIAL_NET |
Pf3 | PUBLISH_ACAD_OPEN_PLAT | Pf3-PUBLISH_ACAD_OPEN_PLAT |
JR1 | UNIV_PROMOTE_OPEN_COLLAB | JR1-UNIV_PROMOTE_OPEN_COLLAB |
JR2 | UNIV_CONSIDER_OPEN_COLLAB | JR2-UNIV_CONSIDER_OPEN_COLLAB |
BI1 | FUTURE_REC_USE_COLL_STUD | BI1-FUTURE_REC_USE_COLL_STUD |
BI2 | FUTURE_USE_TEACH | BI2-FUTURE_USE_TEACH |
Inc1 | HELPFUL_BEST_PRAC | Inc1-HELPFUL_BEST_PRAC |
Inc2 | HELPFUL_GET_INST_COLL | Inc2-HELPFUL_GET_INST_COLL |
Inc3 | HELPFUL_GET_TRAIN | Inc3-HELPFUL_GET_TRAIN |
Inc4 | HELPFUL_INSTITUTION_RECGN | Inc4-HELPFUL_INSTITUTION_RECGN |
Exp1 | CONSULT_EXPERTISE | Exp1-CONSULT_EXPERTISE |
Exp2 | CONSULT_ACAD_ISSUES | Exp2-CONSULT_ACAD_ISSUES |
Exp3 | CONSULT_PERSONAL_ISSUES | Exp3-CONSULT_PERSONAL_ISSUES |
Exp4 | CONTRIB_WIKI | Exp4-CONTRIB_WIKI |
Exp5 | USING_WORK_WITH_STUD | Exp5-USING_WORK_WITH_STUD |
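The renaming rule in the table (original header, a hyphen, then the short description) can be expressed as a small lookup. This is a minimal sketch that covers only the first three rows; the names `SHORT_DESCRIPTIONS` and `rename_headers` are hypothetical, and the remaining rows would be added to the dictionary in the same way.

```python
# First three rows of the table above; extend with the remaining rows as needed.
SHORT_DESCRIPTIONS = {
    "PU1": "DEV_STUD_SKILL",
    "PU2": "IMPRV_STUD_LEARN",
    "PU3": "USEFUL_FOR_TEACH",
}


def rename_headers(headers, descriptions=SHORT_DESCRIPTIONS):
    """Append the short description to each known header; leave others unchanged."""
    return [
        f"{header}-{descriptions[header]}" if header in descriptions else header
        for header in headers
    ]


new_headers = rename_headers(["AGE", "PU1", "PU2", "PU3"])
```

Headers that are not survey questions, such as AGE, are left as they are, so only the abbreviated question codes gain a description.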