ISSS608 2016-17 T1 Assign2 Ye Jiatao
Abstract
Wikipedia is a multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content. Anyone can share their knowledge and insight through Wikipedia, which also makes it a very useful tool for education. Teachers can use Wikipedia to design educational activities, share teaching materials and search for specific information. In this project, we use data visualization to explore the wiki4HE data set and try to deliver some useful insights to our readers. The theme of this case focuses on the popularity and usefulness of Wikipedia among different user groups.
Problems
In this project, we will mainly answer several questions relating to the theme mentioned above.
- What is the popularity of Wikipedia among different user groups?
- What about the different user groups' attitude toward usefulness and ease of use of Wikipedia?
- Is there a correlation between social image and behavioral intention?
Data-set
The wiki4HE data set comes from the UCI Machine Learning Repository and was extracted from a survey of faculty members from two Spanish universities on teaching uses of Wikipedia. The data-set mainly consists of 2 parts: the first part contains demographic information about each participant, and the second part contains the results of a series of Likert scale questions covering a wide range of user experience with Wikipedia. The raw data-set is in .csv format, as shown below.
Approaches
Data Preparation
Before using the data for visualization, we need to carefully clean and reshape it into an appropriate format. In this case, we cannot use the original data-set directly in Tableau, because most of the dimensions are individual Likert scale questions stored as separate columns. In addition, to make our visualization friendlier to readers, we also need to map the coded raw values back to their original meaning using the data dictionary. The detailed data preparation process is as below.
- Using Excel to split the .csv data into separate columns.
- Mapping the coded demographic variables to readable values using the data dictionary.
- Combining variables "UOC_POSITION" and "OTHERSTATUS" to derive a new master variable "POSITION", which indicates each participant's occupational title. The logic of "POSITION" is as below, treating '?' as a missing value.
IF (UOC_POSITION != '?') { POSITION = UOC_POSITION } ELSE { POSITION = OTHERSTATUS }
- Giving each row a unique ID.
- Deleting the 5 rows whose "POSITION" value is unknown.
- Using the Tableau add-in for Excel to reshape the data-set, which splits each row into multiple rows, one per Likert scale question.
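The preparation steps above were carried out in Excel; the following is only a minimal Python sketch of the same idea, not the procedure actually used. It assumes that '?' marks a missing value, so "UOC_POSITION" is used when it is present and "OTHERSTATUS" otherwise. The sample rows, the helper names `derive_position` and `prepare`, and the output keys `Question` and `Score` are hypothetical and chosen for illustration.

```python
import csv
import io

MISSING = "?"  # assumption: '?' marks a missing value in the raw data


def derive_position(row):
    """Use UOC_POSITION when present, otherwise fall back to OTHERSTATUS."""
    if row["UOC_POSITION"] != MISSING:
        return row["UOC_POSITION"]
    return row["OTHERSTATUS"]


def prepare(rows, question_columns):
    """Add ID and POSITION, drop unknown positions, reshape wide to long."""
    long_rows = []
    # IDs are assigned before filtering, matching the order of the steps above.
    for row_id, row in enumerate(rows, start=1):
        position = derive_position(row)
        if position == MISSING:
            continue  # drop participants whose position is unknown
        for question in question_columns:
            if row[question] == MISSING:
                continue  # skip unanswered questions
            long_rows.append({
                "ID": row_id,
                "POSITION": position,
                "Question": question,
                "Score": int(row[question]),
            })
    return long_rows


# Made-up sample rows, not taken from the real data-set.
sample_csv = (
    "UOC_POSITION,OTHERSTATUS,PEU1,PEU2\n"
    "Professor,?,4,5\n"
    "?,Lecturer,3,4\n"
    "?,?,2,2\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
long_rows = prepare(rows, ["PEU1", "PEU2"])
for record in long_rows:
    print(record)
```

Each participant ends up as one row per question, which is the long format that makes it possible to filter by question in a single Tableau view.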
Popularity of Wikipedia
Click to access the interactive data viz application.
https://public.tableau.com/views/assignment_02/Sheet5?:embed=y&:display_count=yes
http://yejiatao.com/parallel-set/
In this step, we want to gain some insight into how Wikipedia is used among different user segments. The data-set contains a wide range of demographic dimensions that can be used for user segmentation. First, we use JMP to explore the data. The mosaic plots below give a better understanding of how Wikipedia is used across different domains and different users.
From the first chart above, we can see that Wikipedia is more popular among non-PhD users in every domain except Engineering & Architecture. In addition, male teachers appear to be more open to using Wikipedia across the various domains.
From the chart above, we can see that female instructors and professors are more willing to use Wikipedia than their counterparts, while female lecturers are almost never registered on Wikipedia, which may be an opportunity to expand the use of wiki as a teaching tool in this segment.
The trellis chart above combines position and domain to examine the use of wiki in these segments. From the figure, we can see that the penetration rate of wiki is highest among professors in Health Science and lecturers in Science. Lecturers in Arts & Humanities are another user group willing to use wiki as a teaching tool. On the other hand, the chart gives a rough idea of which segments still have high potential for expanding the use of Wikipedia in teaching, for example professors in Science.
The two plots above serve the same purpose as the mosaic plot built with JMP. Here we implement the mosaic plot in Tableau, although Tableau has no built-in mosaic plot. From the charts above, we can see that the use of wiki as a teaching tool is most popular in Engineering & Architecture. In addition, instructors seem keener on using wiki than other user segments.
The chart above is an ordinary histogram illustrating the distribution of wiki use across age and teaching-experience segments. From the figure, we can see that the main wiki users in this case are teachers aged 40 to 50. In addition, it shows a negative correlation between teaching experience and use of wiki: a teacher with longer teaching experience is less likely to use wiki as a teaching tool.
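The negative relationship read off the histogram could also be checked numerically. The sketch below computes a Pearson correlation coefficient by hand; it is an illustration under assumptions, not part of the original analysis. The lists `years_of_experience` and `use_score` are made-up values, not figures from the survey.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical values: years of teaching experience and a 1-5 usage score.
years_of_experience = [2, 5, 10, 15, 20]
use_score = [5, 4, 4, 2, 1]

r = pearson(years_of_experience, use_score)
print(f"Pearson r = {r:.2f}")  # a negative value indicates a negative correlation
```

Because Likert scores are ordinal, a rank-based measure such as Spearman's correlation may be a more appropriate choice on the real data.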
Ease of use of Wikipedia
In this step, we want to explore the perceived usefulness and ease of use of wiki among various user segments, presented mainly with a tree map and a stacked bar chart. Besides the original questions, our readers can further explore the questions they are interested in by interacting with our data visualization view.
The plot above shows the results of the Likert scale questions we are interested in. The questions in this chart mainly concern the usefulness of wiki, the ease of use of wiki, social image and behavioral intention. Each answer is on a Likert scale (1-5) ranging from strongly disagree (1) to strongly agree (5). The grey circle above each bar shows the average score of that question. We can see that most users agree with the statements that wiki is user-friendly and that it is easy to find information in wiki. On the other hand, the use of wiki is not well considered among the colleagues of the survey participants. We can explore other questions and their average scores by changing the question filter.
The tree map above shows the average score for each survey question in different segments. The layers used to generate the user segments are "Domain", "Position" and "Gender". The color gradient represents the average score, ranging from 2 to 5. We can get the average score distribution for each survey question by using the filter section, where the filter is set to "Question". In this case, we are interested in the ease of use of Wikipedia in different user segments, so we select "PEU2" in the filter section, which stands for the question "Is it easy to find in Wikipedia the information you seek." From the tree map above, we can find several segments with a relatively high average score for question "PEU2", which indicates that these kinds of users can easily find the information they want in wiki. In contrast, lecturers from Health Science tend to disagree that it is easy to find useful information in wiki. We can also explore other questions by reselecting the question in the filter section.
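The quantity behind each tree map tile is an average score per segment for the selected question. The sketch below shows how such averages could be computed outside Tableau; it is an illustration, not the tool's internal method. The function name `mean_score_by_segment`, the record keys and the sample records are hypothetical.

```python
from collections import defaultdict


def mean_score_by_segment(records, question, keys=("Domain", "Position", "Gender")):
    """Average Likert score of one question for each segment defined by `keys`."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [sum of scores, count]
    for record in records:
        if record["Question"] != question:
            continue  # same role as the "Question" filter in the view
        segment = tuple(record[key] for key in keys)
        totals[segment][0] += record["Score"]
        totals[segment][1] += 1
    return {segment: total / count for segment, (total, count) in totals.items()}


# Made-up long-format records, one row per participant and question.
records = [
    {"Domain": "Sciences", "Position": "Lecturer", "Gender": "Female", "Question": "PEU2", "Score": 4},
    {"Domain": "Sciences", "Position": "Lecturer", "Gender": "Female", "Question": "PEU2", "Score": 5},
    {"Domain": "Health Sciences", "Position": "Lecturer", "Gender": "Male", "Question": "PEU2", "Score": 2},
    {"Domain": "Sciences", "Position": "Lecturer", "Gender": "Female", "Question": "PEU1", "Score": 1},
]

averages = mean_score_by_segment(records, "PEU2")
for segment, average in averages.items():
    print(segment, average)
```

Segments with only a few respondents produce unstable averages, so it may be worth keeping the count alongside the mean when reading the tree map.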
Relationship between social image and behavioral intention
In this step, we check whether there is a correlation between social image and behavioral intention. More specifically, does the attitude of colleagues strongly influence whether someone will use wiki in their teaching activity in the future? We use a parallel set to explore this question.
Data Dictionary
Social Image
- IM1: The use of Wikipedia is well considered among colleagues
- IM2: In academia, sharing open educational resources is appreciated
- IM3: My colleagues use Wikipedia
Behavioral intention
- BI1: In the future I will recommend the use of Wikipedia to my colleagues and students
- BI2: In the future I will use Wikipedia in my teaching activity
From the parallel set above, we can clearly see a pattern between social image and behavioral intention: a high value (4 or 5) for questions IM3 and IM1 has a high chance of leading to a high value (5) for BI2. As a result, we can conclude that a user whose colleagues use wiki is more likely to consider using wiki in the future.
Besides the existing parallel set application, there is a new version of the parallel set implemented in D3.js. With it, our users can explore the relationships among categorical variables by themselves. To use this web-based parallel set, just drag and drop the dimension tags to generate the parallel set you like. In addition, our users can upload their own CSV data-set to explore it through a parallel set.
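A parallel set draws one ribbon per combination of category values, with its width proportional to the number of respondents. The sketch below counts those combinations for two questions and computes the share of high BI2 answers among respondents with a high IM3 answer. It is only an illustration: the function names `ribbon_counts` and `share_high_target` and the sample responses are hypothetical, and the threshold of 4 follows the "4 or 5" reading used above.

```python
from collections import Counter


def ribbon_counts(responses, source, target):
    """Count respondents for each (source answer, target answer) pair."""
    return Counter((r[source], r[target]) for r in responses)


def share_high_target(responses, source, target, high=4):
    """Share of respondents with a high target answer among those with a high source answer."""
    high_source = [r for r in responses if r[source] >= high]
    if not high_source:
        return None  # no respondent gave a high source answer
    return sum(r[target] >= high for r in high_source) / len(high_source)


# Made-up responses, not taken from the survey.
responses = [
    {"IM3": 5, "BI2": 5},
    {"IM3": 4, "BI2": 5},
    {"IM3": 4, "BI2": 3},
    {"IM3": 2, "BI2": 2},
    {"IM3": 1, "BI2": 4},
]

counts = ribbon_counts(responses, "IM3", "BI2")
share = share_high_target(responses, "IM3", "BI2")
print(counts)
print(share)
```

Comparing this share with the same share among respondents with a low IM3 answer gives a simple numeric counterpart to the pattern visible in the ribbons.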
Tools Utilized
Tools used: Tableau, JMP, Parallel Set.
Charts used: tree map, mosaic plot, trellis chart, stacked bar chart, parallel set, bar chart.
Result
From the data visualizations above, we have answered all 3 questions discussed. The key findings for each question are summarized as follows.
- Q1: What is the popularity of Wikipedia among different user groups?
  - The use of wiki as a teaching tool is most popular in Engineering & Architecture.
  - The penetration rate of wiki is highest among professors in Health Science and lecturers in Science.
  - There is still very high potential to expand the use of Wikipedia for teaching among professors in Science.
- Q2: What about the different user groups' attitude toward usefulness and ease of use of Wikipedia?
  - Most users agree with the statements that wiki is user-friendly and that it is easy to find information in wiki.
- Q3: Is there a correlation between social image and behavioral intention?
  - There is a positive relationship between social image and behavioral intention, which suggests that colleagues' use of wiki encourages one to use wiki as a teaching tool.