Difference between revisions of "ISSS608 2016-17 T1 Assign2 Liu Jialin"

From Visual Analytics and Applications

Revision as of 22:42, 7 October 2016

Initial Question

Wikipedia is a free online encyclopedia that allows its users to contribute to and edit almost any article. It is the most popular general reference work on the Internet and is ranked among the ten most popular websites.[1] While Wikipedia is extremely popular among ordinary Internet users, this assignment examines how academics, the professionals in their specialties, perceive the usefulness of Wikipedia in the following aspects:

  1. Research and teaching
  2. Recognition of Wikipedia among colleagues
  3. Perception of students using Wikipedia
  4. General quality of Wikipedia
  5. Other online contributions made by the same group of academics




Dataset

The dataset is the wiki4HE Data Set from the UC Irvine Machine Learning Repository,[2] which describes it as: “[an] Ongoing research on university faculty perceptions and practices of using Wikipedia as a teaching resource. Based on a Technology Acceptance Model, the relationships within the internal and external constructs of the model are analysed. Both the perception of colleagues’ opinion about Wikipedia and the perceived quality of the information in Wikipedia play a central role in the obtained model.”

Metadata of the Dataset

Attribute Information:
AGE: numeric
GENDER: 0=Male; 1=Female
DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics
PhD: 0=No; 1=Yes
YEARSEXP (years of university teaching experience): numeric
UNIVERSITY: 1=UOC; 2=UPF
UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
OTHER (main job in another university for part-time members): 1=Yes; 2=No
OTHER_POSITION (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
USERWIKI (Wikipedia registered user): 0=No; 1=Yes
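
The attribute list above can be expressed as lookup tables, which also makes it easy to spot values that the metadata does not describe. The following is a minimal sketch, not the procedure actually used in this assignment (the recoding was done in JMP); the function names `decode` and `undocumented` are hypothetical, and raw codes are assumed to be read from the file as strings.

```python
# Lookup tables transcribed from the attribute list; keys are raw codes as strings.
CODEBOOK = {
    "GENDER": {"0": "Male", "1": "Female"},
    "DOMAIN": {
        "1": "Arts & Humanities",
        "2": "Sciences",
        "3": "Health Sciences",
        "4": "Engineering & Architecture",
        "5": "Law & Politics",
    },
    "PhD": {"0": "No", "1": "Yes"},
    "UNIVERSITY": {"1": "UOC", "2": "UPF"},
    "USERWIKI": {"0": "No", "1": "Yes"},
}


def decode(column, raw):
    """Return the documented label, or the raw value unchanged if undocumented."""
    return CODEBOOK.get(column, {}).get(raw, raw)


def undocumented(column, values):
    """Return the set of raw values in a coded column that have no label."""
    return {v for v in values if v not in CODEBOOK[column]}
```

Calling `undocumented("DOMAIN", column_values)` on a coded column would list any codes that still need a recoding decision.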

Survey Questions

The survey items use a 5-point Likert scale, ranging from strongly disagree / never (1) to strongly agree / always (5).
There are 43 survey questions in total, of which 30 were selected for analysis and presentation. The 30 questions were divided into 5 categories:
A. Research and Teaching:

  1. VIS3: I cite Wikipedia in my academic papers
  2. EXP1: I consult Wikipedia for issues related to my field of expertise
  3. EXP2: I consult Wikipedia for other academic related issues
  4. QU4: In my area of expertise, Wikipedia has a lower quality than other educational resources
  5. USE1: I use Wikipedia to develop my teaching materials
  6. PU3: Wikipedia is useful for teaching
  7. BI2: In the future I will use Wikipedia in my teaching activity

B. Recognition of Wikipedia among colleagues:

  1. IM1: The use of Wikipedia is well considered among colleagues
  2. IM3: My colleagues use Wikipedia
  3. USE4: I recommend my colleagues to use Wikipedia
  4. BI1: In the future I will recommend the use of Wikipedia to my colleagues and students

C. Perception of students using Wikipedia:

  1. USE2: I use Wikipedia as a platform to develop educational activities with students
  2. EXP5: I use wikis to work with my students
  3. USE5: I agree my students use Wikipedia in my courses
  4. USE3: I recommend my students to use Wikipedia
  5. ENJ1: The use of Wikipedia stimulates curiosity
  6. PU1: The use of Wikipedia makes it easier for students to develop new skills
  7. PU2: The use of Wikipedia improves students' learning

D. Other online contributions:

  1. SA2: It is important to publish research results in other media than academic journals or books
  2. SA1: It is important to share academic content in open platforms
  3. EXP4: I contribute to Wikipedia (editions, revisions, articles improvement...)
  4. PF1: I contribute to blogs
  5. PF3: I publish academic content in open platforms
  6. PF2: I actively participate in social networks
  7. IM2: In academia, sharing open educational resources is appreciated

E. General quality of Wikipedia:

  1. EXP3: I consult Wikipedia for personal issues
  2. QU1: Articles in Wikipedia are reliable
  3. QU2: Articles in Wikipedia are updated
  4. QU3: Articles in Wikipedia are comprehensive
  5. QU5: I trust in the editing system of Wikipedia




Question Refinement based on the Dataset

After viewing the dataset, the following questions came up for consideration:

  1. What is the sample make-up in terms of positions?
  2. What is the sample make-up in terms of domains?
  3. Does having a PhD affect a teacher's perception of Wikipedia?
  4. Do age and years of experience affect a teacher's perception of Wikipedia?
  5. Do the answers to the above questions differ between UOC and UPF?

We will answer these questions together with findings from the data.


Data Transformation

Data transformation using JMP:

  1. Set each column to the correct data type (ordinal, nominal). Likert-scale items are treated as ordinal.
  2. Recode the numbers into their respective responses according to the metadata. Some values in the data are not described in the metadata.
  3. For Domain, recode "6" and "?" as Others.
  4. For UOC_Position, filter on University=UPF and copy Other_Position into UOC_Position, replacing the "?".
  5. For the remaining "?" in UOC_Position, recode as Others.
  6. Rename UOC_Position as "University Position".
  7. For User Wiki, recode "?" as Unknown.
  8. Save the file as wiki4HE cleaned.csv.
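
For readers without JMP, steps 3 to 7 above can be sketched in plain Python. This is an illustrative approximation under stated assumptions, not the JMP procedure itself: each record is assumed to be a dict of raw string codes, the function name `clean_row` is hypothetical, and step 4 is interpreted as filling a "?" position for UPF members from OTHER_POSITION.

```python
def clean_row(row):
    """Apply steps 3-7 to one record given as a dict of raw string codes."""
    out = dict(row)

    # Step 3: Domain values "6" and "?" are not in the metadata; label them Others.
    if out["DOMAIN"] in ("6", "?"):
        out["DOMAIN"] = "Others"

    # Step 4: for UPF members (UNIVERSITY == "2"), take the position from
    # OTHER_POSITION when UOC_POSITION is "?".
    position = out.pop("UOC_POSITION")
    if out["UNIVERSITY"] == "2" and position == "?":
        position = out["OTHER_POSITION"]

    # Step 5: any position that is still "?" becomes Others.
    if position == "?":
        position = "Others"

    # Step 6: store the merged column under its new name.
    out["University Position"] = position

    # Step 7: unknown Wikipedia registration status.
    if out["USERWIKI"] == "?":
        out["USERWIKI"] = "Unknown"
    return out
```

The function returns a new dict and leaves the input untouched, so the raw records remain available for checking the recoding.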


After the JMP step, reshape wiki4HE cleaned.csv using an Excel add-in:

  1. Download the Tableau Add-In for Reshaping Data in Excel.[3]
  2. Select all the answer columns for the questions and reshape the data.
  3. Save as wiki4HE cleaned for Tableau.csv and use this file for the Tableau analysis.
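
The reshaping step turns one column per question into one row per respondent and question. A minimal sketch of that wide-to-long transformation follows; it is not the add-in's implementation, and the function name `reshape_long` and the output column names "Question" and "Answer" are assumptions made for illustration.

```python
def reshape_long(rows, id_cols, question_cols):
    """Turn wide records (one column per question) into long records.

    rows: list of dicts, one per respondent.
    id_cols: columns repeated on every output row (e.g. AGE, DOMAIN).
    question_cols: survey-item columns to unpivot.
    """
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for question in question_cols:
            long_rows.append({**ids, "Question": question, "Answer": row[question]})
    return long_rows
```

With 30 selected questions, each respondent contributes 30 rows to the long table, which is the shape Tableau needs to put the question on one axis and the answer on colour.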


Iteration Process

  1. Screened the questions to understand their content. Realised the need to re-group them, so grouped them into categories and removed the questions less relevant to teaching, research and comparison.
  2. Tried to find the emphasis in the sample description. Realised that one chart cannot describe everything, so used different charts to highlight different aspects of the data (in Dashboards 1 and 2).
  3. Used a treemap of medians to summarise the overall responses. Realised that it does not show the full picture, so abandoned the treemap in favour of a divergent bar chart.
  4. Tried a parallel coordinates chart. It is hard to implement without D3.js, so an interactive version was not achievable. However, a static parallel coordinates chart can still reveal some big-picture traits, so decided to add a screenshot of one to the visualisation.
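
The limitation of the median noted in step 3 can be illustrated with a small example. The two answer lists below are made up for illustration and are not taken from the survey; the function name `summarise` is hypothetical.

```python
from collections import Counter
from statistics import median

# Hypothetical Likert answers for two questions (not survey data).
polarised = [1, 1, 3, 5, 5]
neutral = [3, 3, 3, 3, 3]


def summarise(answers):
    """Return the median and the full count of each answer value."""
    return median(answers), dict(sorted(Counter(answers).items()))
```

Both lists have a median of 3, yet one is split between the extremes and the other is uniformly neutral. A chart that shows the share of every answer keeps that difference visible.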


Data Visualisation

Dashboard 1: Visualisation of the survey respondents

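Output: Sample description (https://10az.online.tableau.com/t/jialinliuthedataanalyst/views/Wikipediasurveyintwouniversities/Sample?:embed=y&:showShareOptions=true&:display_count=no&:showVizHome=no)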

University and Domain Plot

A mosaic plot shows the number of respondents in UOC and UPF and the domains they belong to.
Point of interest: the domain composition and school composition of the sample.
Highlights:

  1. Most of the respondents come from UOC.
  2. The domain composition of respondents is similar across the two schools.

University and Positions Plot

Highlights:

  1. UOC has a large proportion of Adjunct faculty.
  2. UPF has a rather mixed faculty. There are more Associates than any other position. However, nearly 30% of the UPF faculty did not indicate a specific position, which needs follow-up to correct.

Age and Experience plot:

Age is plotted against Experience for all respondents. Colours represent different positions. Filters are available on age, gender, university, PhD and domain.
Point of interest:

  1. Whether there is a linear correlation between age, experience and position.

Highlights:

  1. Excluding Adjunct faculty, there is evidence of a linear relationship between age and experience.
  2. Filters allow interactive display of specific information.
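
The linear relationship between age and experience was judged visually from the plot. One way to quantify it would be Pearson's correlation coefficient; the sketch below is an assumption about how that could be checked, not something computed in this assignment, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Applied to the AGE and YEARSEXP columns after filtering out Adjunct faculty, a value close to 1 would support the visual impression, while a value near 0 would contradict it.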


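Dashboard 2: Visualisation of sample profile and Wiki user

Output: Sample description (https://10az.online.tableau.com/t/jialinliuthedataanalyst/views/Wikipediasurveyintwouniversities/Sample?:embed=y&:showShareOptions=true&:display_count=no&:showVizHome=no)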

Profile Make-up table

A heatmap shows the distribution of Wiki users in both universities, the positions they hold and whether they have a PhD degree.
Colour and text labels are percentages of the row total within the same pane.
Point of interest: the PhD research process is likely to have a significant impact on a person’s research habits. We are interested in the distribution of registered Wikipedia users across PhD degree holders and their positions in the two universities.
Highlights:

  1. In UOC, for every combination of PhD status and position, non-Wiki users far outnumber Wiki users. Having a PhD does not seem to be a differentiating factor when it comes to registered Wikipedia membership.
  2. In UPF, PhD holders are seldom registered Wikipedia users. However, among faculty without a PhD, the proportion of Wiki users is slightly higher.
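
The "percentage of row total" shown in the heatmap can be reproduced from a cross-tabulation of counts. The following sketch shows the arithmetic only; the function name `row_percentages` and the nested-dict layout are assumptions, and it does not use Tableau's table calculation itself.

```python
def row_percentages(counts):
    """Convert a cross-tab of counts into percentages of each row total.

    counts: {row_key: {column_key: count}}, for example one row per
    (university, position, PhD) combination and one column per Wiki-user status.
    """
    result = {}
    for row_key, columns in counts.items():
        total = sum(columns.values())
        result[row_key] = {
            col: (100.0 * n / total if total else 0.0)
            for col, n in columns.items()
        }
    return result
```

Because each row is normalised separately, small groups and large groups both sum to 100%, so the colours compare proportions rather than head counts.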

Age and Experience Plot

This is the same plot as in Dashboard 1, with an added filter on Wiki user.
Highlights:

  1. Comparing the PhD and non-PhD selections, roughly as many PhD holders as non-PhD holders are Wiki users.
  2. Most of the Wiki users are below 55 years old, but there are also Wiki users above 55. For example, a 65-year-old teacher with 43 years of experience is a registered Wiki user.

Dashboard 3: Visualisation of questions by the order of best overall response.

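Output: Question list (https://10az.online.tableau.com/t/jialinliuthedataanalyst/views/Wikipediasurveyintwouniversities/AllQn?:embed=y&:showShareOptions=true&:display_count=no&:showVizHome=no)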

A stacked bar chart represents the percentage of each answer for every question.
Point of interest: an overall view of the responses, showing which questions have the best and the worst overall response. This prepares for the detailed question analyses by category.
Highlights:

  1. Faculty members agree the most that it is important to share academic content in open platforms. Ironically, they agree the least that they contribute to Wikipedia.
  2. Faculty members have an overall positive view that Wikipedia is useful for teaching, but they also indicate that they do not use Wikipedia to develop their teaching materials.
  3. Usage of Wikipedia as a platform to develop educational activities with students is low.
  4. Citation of Wikipedia in academic papers is low.
  5. However, faculty members do consult Wikipedia for personal issues. They believe Wikipedia stimulates curiosity.
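
The percentages behind a stacked bar chart of this kind can be computed from the long-format data. This is a minimal sketch under assumptions: records use the hypothetical keys "Question" and "Answer", answers are integers from 1 to 5, and missing answers have already been dropped. The function name `answer_shares` is also hypothetical.

```python
from collections import Counter


def answer_shares(long_rows):
    """Return {question: {answer: percentage}} for answers 1 to 5."""
    counts = {}
    for row in long_rows:
        counts.setdefault(row["Question"], Counter())[row["Answer"]] += 1

    shares = {}
    for question, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for answers nobody chose, so every bar has 5 segments.
        shares[question] = {a: 100.0 * counter[a] / total for a in range(1, 6)}
    return shares
```

Sorting the questions by the combined share of answers 4 and 5 would be one way to obtain an order of best overall response, although the ordering rule used in the dashboard is not stated here.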


Dashboard 4: Visualisation of questions by category.

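Output: Question categories (https://10az.online.tableau.com/t/jialinliuthedataanalyst/views/Wikipediasurveyintwouniversities/Categories?:embed=y&:showShareOptions=true&:display_count=no&:showVizHome=no)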

A divergent bar chart represents the answers to the questions under each category.
Filters are available on age, university, PhD, position and domain.
Point of interest: within the same category, how views change across activities of differing formality.
Highlights:

  1. Faculty members in general trust the quality of Wikipedia. They consult Wikipedia for personal issues.
  2. Faculty members contribute the least to Wikipedia, followed by blogs and academic content in open platforms.
  3. Law faculty has the worst responses overall, whereas Engineering and Architecture has, in comparison, the best responses overall.
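
A divergent bar chart places negative answers to the left of a centre line and positive answers to the right. One common convention, assumed here because the exact construction used in the Tableau dashboard is not described, splits the neutral answer (3) evenly across the centre. The function name `diverging_extent` is hypothetical.

```python
def diverging_extent(shares):
    """Return (left, right) bar extents for one question.

    shares: {answer: percentage} for answers 1 to 5.
    The neutral share (answer 3) is split evenly across the centre line.
    """
    half_neutral = shares[3] / 2
    left = -(shares[1] + shares[2] + half_neutral)
    right = shares[4] + shares[5] + half_neutral
    return left, right
```

For shares of 10, 20, 20, 30 and 20 percent for answers 1 to 5, the bar runs from -40 to 60, so its overall lean to the right indicates a net positive response.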

Methods Iteration

Analysis Tools Evaluation

Suggested Improvements on Questions

Reference

  1. About Wikipedia: en.wikipedia.org/wiki/Wikipedia
  2. Dataset: wiki4HE Data Set, UC Irvine Machine Learning Repository
  3. Tableau Excel add-in: Reshaping data for Tableau