Difference between revisions of "ISSS608 2016-17 T1 Assign2 Ye Jiatao"
[https://public.tableau.com/views/assignment_02/Sheet5?:embed=y&:display_count=yes Data Visualization for Q1&Q2]
Revision as of 17:03, 25 September 2016
Abstract
Wikipedia is a multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content. Anyone can share their knowledge and insight through Wikipedia, which also makes it a very useful tool for education. Teachers can use Wikipedia to design educational activities, share teaching materials, and search for specific information. In this project, we use data visualization to explore the wikiHE4 data set and try to deliver some useful insights to our readers. The theme of this case is the popularity and usefulness of Wikipedia among different user groups.
Problems
In this project, we mainly answer three questions related to the theme mentioned above.
- What is the popularity of Wikipedia among different user groups?
- What are the different user groups' attitudes toward the usefulness of Wikipedia?
- Is there a strong relationship between social image and behavioral intention?
Data-set
The wikiHE4 data set comes from the UCI Machine Learning Repository and was extracted from a survey of faculty members from two Spanish universities on teaching uses of Wikipedia. The data-set mainly consists of two parts: the first part is demographic information about each participant, and the second part is the results of a series of Likert scale questions covering a wide range of user experiences with Wikipedia. The raw data-set is in .csv format, as shown below.
Approaches
Data Preparation
Before using the data for visualization, we need to carefully clean and reshape it into an appropriate format. In this case, we cannot use the original data-set directly in Tableau, because most of the dimensions are Likert scale questions stored as separate columns. In addition, to make our visualization friendlier to readers, we need to map the coded raw values back to their original meanings using the data dictionary. The detailed data preparation steps are as follows.
- Use Excel to separate the .csv data into corresponding columns.
- Map the demographic variables to readable values using the data dictionary.
- Combine the variables "UOC_POSITON" and "OTHERSTATUS" to derive a new variable "POSITON", which indicates each participant's occupational title.
- Give each row a unique ID.
- Delete the 5 rows whose value of "POSITON" is unknown.
- Use the Tableau add-in for Excel to reshape the data-set, which splits each row into multiple rows, one per Likert scale question.
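The same preparation steps can also be sketched in code. The example below is only an illustration in Python with pandas, not the workflow actually used in this assignment (which relied on Excel and the Tableau add-in). The toy rows, the label dictionaries, the Likert column names "PU1" and "PU2", and the rule for combining "UOC_POSITON" with "OTHERSTATUS" are all assumptions made for the sketch.

```python
import pandas as pd

# Toy rows standing in for the raw survey file (values are invented).
raw = pd.DataFrame({
    "GENDER": [0, 1, 1, 0],
    "UOC_POSITON": [1.0, 2.0, None, None],
    "OTHERSTATUS": [None, None, 1.0, None],
    "PU1": [4, 3, 5, 2],
    "PU2": [5, 3, 4, 1],
})

# Hypothetical data-dictionary excerpts; the real code lists may differ.
gender_labels = {0: "Male", 1: "Female"}
uoc_labels = {1.0: "Professor", 2.0: "Associate"}
other_labels = {1.0: "Lecturer"}

df = raw.copy()

# Map coded demographic values to readable labels.
df["GENDER"] = df["GENDER"].map(gender_labels)

# Assumed combination rule: prefer UOC_POSITON, fall back to OTHERSTATUS,
# and mark the row as "Unknown" when both are missing.
df["POSITON"] = (
    df["UOC_POSITON"].map(uoc_labels)
    .fillna(df["OTHERSTATUS"].map(other_labels))
    .fillna("Unknown")
)

# Give each row a unique ID, then drop rows with an unknown position.
df.insert(0, "ID", range(1, len(df) + 1))
df = df[df["POSITON"] != "Unknown"]

# Reshape wide to long: one row per respondent per Likert question.
likert_cols = ["PU1", "PU2"]
long_df = df.melt(
    id_vars=["ID", "GENDER", "POSITON"],
    value_vars=likert_cols,
    var_name="QUESTION",
    value_name="SCORE",
)
print(long_df)
```

The long format produced by the last step is what lets a tool such as Tableau treat the question as a single dimension and the score as a single measure, instead of handling dozens of separate Likert columns.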
Popularity of Wikipedia
In this step, we want to gain some insight into how Wikipedia is used among different user groups. The data-set contains a wide range of demographic dimensions that can be used for user segmentation. First, we use JMP to explore the data. The mosaic plots below give a better understanding of how Wikipedia is used across different domains and different users.
From the first chart above, we can see that Wikipedia is more widely used among non-PhD users in every domain except Engineering & Architecture. In addition, we can conclude that male teachers are more open to using Wikipedia across the various domains.
From the chart above, we can see that female instructors and professors are more willing to use Wikipedia than their counterparts, while female lecturers almost never register on Wikipedia.
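The proportions that a mosaic plot encodes can be reproduced with a normalized cross-tabulation. The sketch below is a hedged illustration in Python with pandas, not the JMP analysis used here. The rows are invented toy data, and the column name "USERWIKI" (registered Wikipedia user) is an assumption.

```python
import pandas as pd

# Invented toy responses; not taken from the survey.
toy = pd.DataFrame({
    "GENDER": ["Male", "Male", "Female", "Female", "Female", "Male"],
    "POSITON": ["Professor", "Lecturer", "Professor",
                "Lecturer", "Professor", "Professor"],
    "USERWIKI": ["Yes", "No", "Yes", "No", "Yes", "No"],
})

# Share of registered users within each position/gender group.
# These row proportions correspond to the tile heights in a mosaic plot.
shares = pd.crosstab(
    index=[toy["POSITON"], toy["GENDER"]],
    columns=toy["USERWIKI"],
    normalize="index",
)
print(shares)
```

Normalizing by row makes groups of different sizes comparable, which is why a small group such as female lecturers can still be read directly against larger groups.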
Usefulness of Wikipedia
Relationship between social image and behavioral intention
Tools Utilized
Tools used: Tableau, JMP, Parallel Sets.
Charts used: tree-plot, mosaic plot, trellis, stacked bar chart, parallel sets, bar chart.