ISSS608 2016-17 T1 Assign2 Thian Fong Mei

From Visual Analytics and Applications
Revision as of 11:25, 26 September 2016 by Fmthian.2015

Abstract

This assignment explores high-dimensional data using visual analytics techniques and methods. The data set used is the wiki4HE Data Set (https://archive.ics.uci.edu/ml/datasets/wiki4HE).

Problem and Motivation

Wikipedia is built on an open collaboration model. There are reservations about its use in academia, where it has commonly been perceived as a "flawed knowledge community" and as "a collaboratively generated encyclopedia (which) cannot meet the high standards of quality." The data set comes from a survey of university faculty perceptions and practices of using Wikipedia as a teaching resource, which draws on the Technology Acceptance Model (TAM) to examine the interrelationships. According to the wiki4HE webpage, "both the perception of colleagues' opinions about Wikipedia and the perceived quality of the information in Wikipedia play a central role in the obtained model".

Theme of Interest

This assignment does not seek to validate the TAM, but seeks to uncover highlights of the results, and the relationship between the profile of the faculty survey participants and the results.

The image below classifies the survey categories according to the TAM. It is used as a guide during the discovery process.

Tfm TAM.png

Approach

Data Source

The data source is the wiki4HE Data Set (https://archive.ics.uci.edu/ml/datasets/wiki4HE). It contains 913 records (survey participants) and 53 variables. 10 of the variables hold user profile information. The remaining 43 variables are survey questions, classified into 13 main categories.

Data Preparation & Approach

Data preparation is done mainly in JMP, where some of the analysis is also conducted.

1st iteration: The data is imported into JMP for a first round of exploration. The GENDER, DOMAIN, PhD, YEARSEXP, UNIVERSITY, UOC_POSITION, OTHER_POSITION, OTHERSTATUS and USERWIKI variables are recoded from numeric codes to category names so that these user profile variables are more meaningful and easier to interpret. Missing values of profile-related variables are recoded as "Unknown". Comparing the data with the attribute information given shows that the OTHER_POSITION and OTHERSTATUS variable names are swapped, so the two names are exchanged to reflect their actual contents. Univariate, bivariate and multivariate distributions, Scatterplot and Graph Builder are run to provide an overview of the data's distribution.
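The recoding in this iteration was done in JMP. As a rough equivalent, the following pandas sketch shows the same idea on a toy table. The rows are made up, and the code-to-label mappings are assumptions that should be checked against the data set's attribute information.

```python
import pandas as pd

# Toy stand-in for a few profile columns; values are illustrative, not real survey rows.
profile = pd.DataFrame({
    "GENDER": [0, 1, 0, None],
    "UNIVERSITY": [1, 2, 1, 1],
    "USERWIKI": [0, None, 1, 0],
})

# Assumed code-to-label mappings (verify against the attribute information).
labels = {
    "GENDER": {0: "Male", 1: "Female"},
    "UNIVERSITY": {1: "UOC", 2: "UPF"},
    "USERWIKI": {0: "No", 1: "Yes"},
}

recoded = profile.copy()
for column, mapping in labels.items():
    # map() leaves missing or unmapped codes as NaN; fillna() then labels them "Unknown".
    recoded[column] = profile[column].map(mapping).fillna("Unknown")

print(recoded)
```

Keeping the mappings in one dictionary makes it easy to review them side by side with the attribute information, which is also where a swap such as the OTHER_POSITION and OTHERSTATUS one would show up.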

2nd iteration: Missing values in the survey questions are recoded as 0 in the interim to denote no response. The individual survey question scores are totaled under the 13 categories, creating 13 additional columns. Ternary plots are run to look for any association or pattern among any 3 variables. Parallel plots are also run to check for relations between the survey questions. Secondary tools such as Mondrian, Treemap and High-D are also used.
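The category totals were built in JMP. A minimal pandas sketch of the same two steps (interim recode to 0, then one total column per category) could look like this. The toy scores, the 1 to 5 scale, and item names such as PU1 or Qu2 (category prefix plus item number) are assumptions for illustration.

```python
import pandas as pd

# Toy responses on an assumed 1-5 scale; None marks an unanswered question.
survey = pd.DataFrame({
    "PU1": [4, 3, None],
    "PU2": [5, None, 2],
    "Qu1": [3, 3, 4],
    "Qu2": [None, 2, 5],
})

# Interim recode: 0 denotes "no response".
filled = survey.fillna(0)

# Strip the trailing item number to recover each column's category prefix.
categories = filled.columns.str.replace(r"\d+$", "", regex=True)

# One total column per category, summing that category's item scores row by row.
totals = pd.DataFrame({
    f"{cat}_TOTAL": filled.loc[:, categories == cat].sum(axis=1)
    for cat in categories.unique()
})

print(totals)
```

Because 0 lies outside the response scale, a participant who skipped items gets a lower total than their answered items alone would suggest, so these totals are best read as exploratory.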

The work from the 1st and 2nd iterations does not prove to be very useful.

3rd iteration: The survey questions are transposed/stacked in JMP, with each faculty participant tagged by a newly created ID column. Two additional columns are created: one for the survey category and one recoding the numeric survey responses to nominal/text values. A metadata table with the wording of each survey question is also created for greater clarity. Age and years of experience are binned in bins of 5. In this 3rd iteration, Tableau is used to create a divergent bar chart, a heatmap and a treemap.
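The stacking and binning were done in JMP. The pandas sketch below illustrates the same reshaping on a toy table. The rows, the item names, and the wording of the response labels are assumptions, not taken from the survey.

```python
import pandas as pd

# Toy wide table: one row per participant, one column per survey question.
wide = pd.DataFrame({
    "AGE": [42, 37, 58],
    "PU1": [4, 3, 0],
    "Qu1": [3, 5, 4],
})

# New ID column tagging each participant.
wide.insert(0, "ID", range(1, len(wide) + 1))

# Bins of width 5, labeled by their lower and upper bounds (e.g. 40-44).
lower = wide["AGE"] // 5 * 5
wide["AGE_BIN"] = lower.astype(str) + "-" + (lower + 4).astype(str)

# Stack the question columns: one row per participant and question.
long = wide.melt(
    id_vars=["ID", "AGE", "AGE_BIN"],
    var_name="QUESTION",
    value_name="SCORE",
)

# Survey category = question code without its trailing item number.
long["CATEGORY"] = long["QUESTION"].str.replace(r"\d+$", "", regex=True)

# Assumed text labels for the numeric responses (0 = interim "no response" code).
score_labels = {
    0: "No response", 1: "Strongly disagree", 2: "Disagree",
    3: "Neutral", 4: "Agree", 5: "Strongly agree",
}
long["RESPONSE"] = long["SCORE"].map(score_labels)

print(long)
```

The long layout gives one row per answer, which is the shape that charting tools such as Tableau generally expect for divergent bar charts and heatmaps.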

Findings

The interactive visualization, built with Tableau, can be found at the following link: https://public.tableau.com/profile/christine.thian#!/vizhome/Wiki4HESurvey/FacultyProfile

Who took the survey: Faculty Participants Profile

Faculty Profile

  • Male participants form 57.5% of survey participants.
  • The majority (87.62%) come from UOC.
  • 46.44% of them have a PhD.
  • The majority (85.87%) are not registered Wikipedia users.
  • The largest age group, 40-44, accounts for 23.3% of participants. The age groups roughly follow a normal distribution.
  • Close to half of the participants (46.22%) have less than 10 years of experience.
  • The largest domain group, Others, accounts for 39.54% of the participants, followed by Arts & Humanities (20.04%) and Engineering & Architecture (15.01%).

Tfm Faculty Profile.png
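The percentages above come from the Tableau dashboard. As a hedged illustration of how such shares can be computed from a recoded profile column, here is a small pandas sketch. The values are made up and do not reproduce the survey figures.

```python
import pandas as pd

# Toy recoded profile column; the counts are invented for illustration only.
gender = pd.Series(
    ["Male", "Female", "Male", "Male", "Female", "Unknown", "Male", "Female"],
    name="GENDER",
)

# Share of each category as a percentage of all participants.
shares = (gender.value_counts(normalize=True) * 100).round(2)

print(shares)
```

Note that "Unknown" stays in the denominator here; dropping it first would change every percentage, so the choice should be stated alongside any reported figure.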

Faculty Position

  • The majority (87.62%) come from UOC; the remaining 12.38% come from UPF.
  • Most of the UOC staff are adjuncts, and 29.35% of them have their main job at another university. Those shown with an Unknown UOC position come from UPF.
  • Among UPF members who work part-time at another university, 59.15% have an unknown position.

Tfm Faculty Position.png