Difference between revisions of "ISSS608 2016-17 T1 Assign2 XU Qiuhui"

From Visual Analytics and Applications

Revision as of 12:16, 26 September 2016

Data Sources

The dataset comes from the UCI Machine Learning Repository: a survey of faculty members from two Spanish universities on teaching uses of Wikipedia.

Source: E. Aibar, J. Lladós, A. Meseguer, J. Minguillón (jminguillona[at]uoc[dot]edu), M. Lerga. Universitat Oberta de Catalunya, Barcelona, Spain.

Theme of Interest and Motivation

This analysis aims to identify how different user segments perceive Wikipedia and how they use it, based on their answers to a high-dimensional set of survey questions, and then to propose recommendations for Wikipedia's future development. It mainly addresses the following questions:

  1. Segmentation of Wikipedia users.
  2. Differences in impressions and usage behavior across segments.
  3. Relationships among user impressions, user behavior, and the external environment.

Data Preparation

Convert Data Types

Variable | Original Data Type | Converted Data Type | Reason
Gender | Numeric | Categorical | Per the dataset dictionary, the values are category codes, so treating them as numbers in the analysis is meaningless.
PhD | Numeric | Categorical | Per the dataset dictionary, the values are category codes, so treating them as numbers in the analysis is meaningless.
University | Numeric | Categorical | Per the dataset dictionary, the values are category codes, so treating them as numbers in the analysis is meaningless.
YearsExp | Categorical | Numeric | Years of experience is a continuous quantity; converting it to numeric lets us first bin it into several groups and then use those groups for classification.
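
The conversions in the table above were done in JMP. As a rough illustration only, the same idea can be sketched in Python. The code-to-label mappings below are assumptions, not taken from this page, and should be checked against the dataset dictionary; the function names are hypothetical, and treating unparseable text as a missing answer is also an assumption.

```python
# Hypothetical code-to-label mappings; verify each against the dataset dictionary.
CODE_LABELS = {
    "GENDER": {0: "Male", 1: "Female"},
    "PhD": {0: "No", 1: "Yes"},
    "UNIVERSITY": {1: "UOC", 2: "UPF"},
}


def to_categorical(column, code):
    """Replace a numeric code with its label so it is never summed or averaged."""
    return CODE_LABELS[column][int(code)]


def to_numeric(raw):
    """Parse a text field such as YearsExp as a number.

    Returns None when the field cannot be parsed (assumed to mean a missing answer).
    """
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None
```

Keeping the mappings in one dictionary makes it easy to correct a label without touching the conversion logic.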

Bin Numeric Data

Original Variable | Derived Variable | Formula
Age | Age(bin) | If(:AGE <= 30,"20~30",If(:AGE <= 40,"30~40",If(:AGE <= 50,"40~50",If(:AGE <= 60,"50~60","60~70"))))
YearsExp | YearsExp(bin) | If(:YEARSEXP <= 10,"0~10",If(:YEARSEXP <= 20,"10~20",If(:YEARSEXP <= 30,"20~30","more than 30")))
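
For readers without JMP, the two nested If formulas above can be approximated in Python as follows. This is a minimal sketch that mirrors the same upper-inclusive cut points and labels; the function and constant names are hypothetical.

```python
from bisect import bisect_left

# Upper-inclusive cut points and labels copied from the JMP formulas.
AGE_EDGES = [30, 40, 50, 60]
AGE_LABELS = ["20~30", "30~40", "40~50", "50~60", "60~70"]

YEARS_EDGES = [10, 20, 30]
YEARS_LABELS = ["0~10", "10~20", "20~30", "more than 30"]


def age_bin(age):
    """Return the Age(bin) label; a value equal to an edge falls in the lower bin."""
    return AGE_LABELS[bisect_left(AGE_EDGES, age)]


def years_exp_bin(years):
    """Return the YearsExp(bin) label using the same <= comparisons as the formula."""
    return YEARS_LABELS[bisect_left(YEARS_EDGES, years)]
```

Because `bisect_left` counts the edges strictly below the value, an age of exactly 30 lands in "20~30", matching `:AGE <= 30` in the formula. Like the formula, the sketch does not check for values below the first label's nominal lower bound.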

Group Categorical Data

Map every survey answer scored from 1 to 5 to one of three degrees: “Low”, “Mid”, or “High”.

Score | Degree
1 | Low
2 | Low
3 | Mid
4 | High
5 | High
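
The score-to-degree mapping above is a simple lookup. A minimal Python sketch follows; the function name is hypothetical, and rejecting values outside 1 to 5 is an assumption about how invalid answers should be handled.

```python
# Lookup table copied from the Score/Degree mapping.
SCORE_TO_DEGREE = {1: "Low", 2: "Low", 3: "Mid", 4: "High", 5: "High"}


def score_to_degree(score):
    """Collapse a 1-5 survey score into Low, Mid, or High."""
    key = int(score)
    if key not in SCORE_TO_DEGREE:
        # Assumption: anything outside 1-5 is an invalid answer, not a new category.
        raise ValueError(f"score out of range 1-5: {score!r}")
    return SCORE_TO_DEGREE[key]
```

Applied to a list of answers, for example `[score_to_degree(s) for s in answers]`, this yields the three-level variables used in the later visualization.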

Visualization

Parallel Set

Analysis

Tools Utilized

  1. High-D - For initial data exploration and analysis
  2. JMP 12, MS Excel - For data preparation
  3. d3.js - For data visualization