ISSS608 2016-17 T1 Assign2 Lim Hui Ting Jaclyn

From Visual Analytics and Applications
Revision as of 05:24, 26 September 2016 by Jaclyn.lim.2016

Introduction

Most faculty members tend to avoid using Wikipedia as a reference source or as teaching material. The given dataset comes from a survey of faculty members about Wikipedia and online platforms. The theme that I have chosen is the usage of Wikipedia. I would like to find out why faculty members use Wikipedia, what kind of faculty members tend to use it, and whether UOC and UPF members differ in their usage and perception of Wikipedia. Answering these questions will help me understand why faculty members have used, or would encourage the use of, Wikipedia.

Questions for Investigation:

  1. What are the main reasons for faculty members to use Wikipedia?
  2. What is the profile of faculty members who are users of Wikipedia?
  3. Are there differences between faculty members solely in UOC, and faculty members in UPF, with regards to their usage of Wikipedia? And if so, why?

Data

The data was taken from Wiki4HE. It consists of responses to survey questions given to university faculty members about their perceptions of Wikipedia and their practices in using it.

The attribute table below can also be found at Wiki4HE.

AGE: numeric

GENDER: 0=Male; 1=Female

DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics

PhD: 0=No; 1=Yes

YEARSEXP (years of university teaching experience): numeric

UNIVERSITY: 1=UOC; 2=UPF

UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct

OTHER (main job in another university for part-time members): 1=Yes; 2=No

OTHER_POSITION (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct

USERWIKI (Wikipedia registered user): 0=No; 1=Yes

The 43 survey items are rated on a Likert scale (1-5) ranging from strongly disagree / never (1) to strongly agree / always (5).
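The coding scheme above can be written down as lookup tables. The following is a minimal sketch, assuming each respondent is held as a dictionary keyed by the attribute names listed above. The helper names `decode` and `is_valid_likert` and the sample respondent are hypothetical and are not part of the dataset.

```python
# Code-to-label lookups copied from the attribute list above.
GENDER = {0: "Male", 1: "Female"}
DOMAIN = {
    1: "Arts & Humanities",
    2: "Sciences",
    3: "Health Sciences",
    4: "Engineering & Architecture",
    5: "Law & Politics",
}
UNIVERSITY = {1: "UOC", 2: "UPF"}
LIKERT_MIN, LIKERT_MAX = 1, 5


def decode(row):
    """Return a copy of one respondent record with coded fields replaced by labels."""
    out = dict(row)
    out["GENDER"] = GENDER.get(row.get("GENDER"), "Unknown")
    out["DOMAIN"] = DOMAIN.get(row.get("DOMAIN"), "Unknown")
    out["UNIVERSITY"] = UNIVERSITY.get(row.get("UNIVERSITY"), "Unknown")
    return out


def is_valid_likert(value):
    """True if the value is an integer on the 1-5 Likert scale."""
    return isinstance(value, int) and LIKERT_MIN <= value <= LIKERT_MAX


# Hypothetical respondent, invented for illustration only.
sample = {"AGE": 40, "GENDER": 0, "DOMAIN": 2, "PhD": 1, "UNIVERSITY": 1}
decoded = decode(sample)
```

Codes that are not in the attribute list fall back to "Unknown" here rather than raising an error.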

Data Cleaning

  1. Recoding data
     Some columns contained “?” cells. Although these meant that there was no response, the “?” caused the data type of the column to be Character instead of Numeric. These cells were recoded from “?” to NULL. I also recoded the values in OTHER_POSITION from “2” to “0”, as they represented faculty members with no other positions.
  2. Inconsistent attributes with the given attribute information table
     The attribute table lists only 5 values for “Domain”. However, the column itself contains 6 distinct values, as well as null values. I assumed that “6” belonged to the category of others. Also, the column “OTHERSTATUS” contains 7 distinct values instead of the 6 listed. Hence, I assumed that the value “7” represented other faculty positions.
  3. Inconsistent data
     OTHER_POSITION only had the values ?, 1 and 2, unlike what is stated in the attribute table. There is also an additional column named “OTHERSTATUS”, which is not mentioned in the attribute table. This column has values of ? and 0 to 7. A probable explanation is that OTHER_POSITION indicates whether a faculty member holds another position, and OTHERSTATUS gives the position that the faculty member holds there.
  4. Inverse data values
     The majority of the questions were positively phrased, except for QU4, which was negatively phrased: “QU4: In my area of expertise, Wikipedia has a lower quality than other educational resources”. Hence, its values had to be reversed. For example, an original “5” represents “Strongly Agree” that Wikipedia has a lower quality than other educational resources. After recoding, this becomes “1”, so that a low score reflects a negative view of Wikipedia, as it does for the other questions. The new value of “5” represents respondents who strongly disagreed that Wikipedia has a lower quality than other educational resources.
  5. New columns
     As I found the columns “University”, “Other Position” and “Other status” confusing and difficult to analyse, I created additional columns in JMP. “UOC” contains 1 if the faculty member is from UOC and 0 otherwise, and “UPF” contains 1 if the faculty member is from UPF and 0 otherwise. Also, “UOC_Position” contains a value from 1 to 6 if the faculty member is from UOC, and “UPF_Position” contains a value from 1 to 7 if the faculty member is from UPF.
  6. Grouping survey questions into different categories
    • Question Categories

    I categorised the questions according to their codes.

    Perceived Usefulness

    PU1: The use of Wikipedia makes it easier for students to develop new skills

    PU2: The use of Wikipedia improves students' learning

    PU3: Wikipedia is useful for teaching

    In this case, PU1, PU2 and PU3 are placed in the group named “PU”. I categorised all of the other questions in the same way.

    • Group Categories

    Categories                      Questions                                               Code
    Teaching Resource               QU1, QU2, QU3, QU4, VIS3, USE1, BI1, BI2, EXP1, EXP2    TR
    Collaborative Platform          VIS1, VIS2, EXP4, EXP5, USE2                            CP
    Perception of Online Platforms  PF1, PF2, PF3, SA1, SA2, SA3                            OPP
    Perception of Wikipedia         ENJ1, ENJ2, PEU1, PEU2, PEU3, PU1, PU2, PU3             WP
  7. Transpose data on Excel sheet
     To create a “response” column holding the scores and a “questions” column, I first added a column of arbitrary ID numbers, and then used a Tableau add-in function in Excel to pivot the data.
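The cleaning steps above were carried out in JMP and Excel. Purely as an illustration, the same logic can be sketched in Python, assuming each raw row is a dictionary of strings keyed by column name. The function names (`to_number`, `reverse_likert`, `clean`, `to_long`) and the sample row are hypothetical.

```python
LIKERT_MAX = 5


def to_number(cell):
    """Turn a raw cell into an int, treating "?" (no response) as missing (None)."""
    cell = cell.strip()
    return None if cell in ("?", "") else int(cell)


def reverse_likert(score):
    """Reverse-code a 1-5 Likert score so that 5 becomes 1, 4 becomes 2, and so on."""
    return None if score is None else (LIKERT_MAX + 1) - score


def clean(raw):
    """Apply the recoding steps described above to one raw survey row."""
    row = {name: to_number(value) for name, value in raw.items()}
    # OTHER_POSITION: "2" meant no other position, so it is recoded to 0.
    if row.get("OTHER_POSITION") == 2:
        row["OTHER_POSITION"] = 0
    # QU4 is the only negatively phrased item, so its scale is reversed.
    row["QU4"] = reverse_likert(row.get("QU4"))
    # Indicator columns for the two universities (1=UOC, 2=UPF in UNIVERSITY).
    row["UOC"] = 1 if row.get("UNIVERSITY") == 1 else 0
    row["UPF"] = 1 if row.get("UNIVERSITY") == 2 else 0
    return row


def to_long(row_id, row, questions):
    """Transpose one wide row into (ID, question, response) records, skipping missing answers."""
    return [(row_id, q, row[q]) for q in questions if row.get(q) is not None]


# Hypothetical raw row, invented for illustration only.
raw = {"UNIVERSITY": "1", "OTHER_POSITION": "2", "QU1": "4", "QU4": "5", "PU1": "?"}
cleaned = clean(raw)
long_rows = to_long(1, cleaned, ["QU1", "QU4", "PU1"])
```

Reversing with `6 - score` keeps the neutral value 3 unchanged, and the long format gives one row per respondent and question, which matches the “questions” and “response” columns described in the last step.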

Data Exploration

Iteration 1

Iteration 2

Dashboard


Visualisation 1: What are the main reasons for faculty members to use Wikipedia?

Methodology

To answer this question, I built visualisations around the two categories that represent the two main reasons why faculty members would use Wikipedia: as a Teaching Resource and as a Collaborative Platform.

First, divergent bar charts were used to display the questions related to using Wikipedia as a Teaching Resource or a Collaborative Platform. Divergent bar charts display the percentages of the Likert scale values 1-5 on the same bar, so users can see the distribution of scores for each question. The average score was also included in the bar charts. Although the average score cannot be relied on by itself, it comes in handy when paired with a divergent bar chart.
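The quantities behind such a chart can be sketched as follows. This is a minimal sketch, assuming the responses to one question are available as a list of integers from 1 to 5. Centring the neutral value 3 on zero is one common convention for divergent bars and is an assumption here, not necessarily the layout used in the Tableau dashboard. The function names and sample responses are hypothetical.

```python
from collections import Counter


def likert_distribution(responses):
    """Percentage of responses at each Likert value 1-5, plus the mean score."""
    total = len(responses)
    if total == 0:
        raise ValueError("no responses to summarise")
    counts = Counter(responses)
    pct = {value: 100.0 * counts.get(value, 0) / total for value in range(1, 6)}
    mean = sum(responses) / total
    return pct, mean


def left_edge(pct):
    """Start of the stacked bar when the neutral value (3) is centred on zero."""
    return -(pct[1] + pct[2] + pct[3] / 2)


# Hypothetical responses to a single question, invented for illustration only.
responses = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5]
pct, mean = likert_distribution(responses)
start = left_edge(pct)
```

With this layout, the disagreeing segments extend to the left of zero and the agreeing segments to the right, and the mean can be overlaid as a separate mark.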

The divergent bar charts were coloured on a two-colour spectrum from red to blue. Values of 5, which represent "Strongly Agree", appear as the dark blue portions of the bars, while values of 1, which represent "Strongly Disagree", appear as the dark red portions.

Other variables were added to the dashboard as well, such as Domain, Position, and UserWiki. These variables were represented in bar charts, where the length of each bar represents the distinct count of IDs within each value of the variable. A filter function was added to each of these bar charts as well. As such, the user will be able to see which category of each variable uses Wikipedia more as a teaching resource and/or a collaborative platform, as the divergent bar charts change according to the filtered variables.
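The distinct count and the filtering behaviour can be sketched outside Tableau as follows. This is a minimal sketch, assuming the transposed data has one record per respondent and question, so that the same ID appears on many rows. The field names `question` and `response`, the function names and the sample records are hypothetical.

```python
from collections import defaultdict


def apply_filters(records, **selected):
    """Keep only records whose fields match every selected value (a dashboard filter)."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def distinct_id_count(records, attribute):
    """Count distinct respondent IDs for each value of one attribute."""
    ids = defaultdict(set)
    for r in records:
        ids[r[attribute]].add(r["ID"])
    return {value: len(members) for value, members in ids.items()}


# Hypothetical long-format records, invented for illustration only.
records = [
    {"ID": 1, "DOMAIN": 1, "USERWIKI": 1, "question": "QU1", "response": 4},
    {"ID": 1, "DOMAIN": 1, "USERWIKI": 1, "question": "QU2", "response": 3},
    {"ID": 2, "DOMAIN": 1, "USERWIKI": 0, "question": "QU1", "response": 2},
    {"ID": 3, "DOMAIN": 2, "USERWIKI": 1, "question": "QU1", "response": 5},
]
by_domain = distinct_id_count(records, "DOMAIN")
registered = apply_filters(records, USERWIKI=1)
registered_by_domain = distinct_id_count(registered, "DOMAIN")
```

Counting distinct IDs rather than rows matters because each respondent contributes one row per question after the transpose.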

A screenshot of the dashboard can be seen below. It was done using Tableau.

insert picture - dashboard of visualisation 1

Second, a parallel coordinates plot was used to allow for better comparison. In the following visualisation, created in Tibco Spotfire, I plotted both categories against each other and included "Domain" as an additional variable, so that one can see how the distribution of lines changes from one domain to another.
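One way to prepare data for such a plot is sketched below. It assumes that each respondent is summarised by a mean score per category, which is an assumption of this sketch, since the Spotfire plot may instead use the individual questions as axes. The question groupings are taken from the Group Categories table, and the function names and sample rows are hypothetical.

```python
# Question groupings for the two categories, as listed in the Group Categories table.
CATEGORIES = {
    "TR": ["QU1", "QU2", "QU3", "QU4", "VIS3", "USE1", "BI1", "BI2", "EXP1", "EXP2"],
    "CP": ["VIS1", "VIS2", "EXP4", "EXP5", "USE2"],
}


def category_scores(row):
    """Mean answered score per category for one respondent (None if nothing was answered)."""
    scores = {}
    for code, questions in CATEGORIES.items():
        answered = [row[q] for q in questions if row.get(q) is not None]
        scores[code] = sum(answered) / len(answered) if answered else None
    return scores


def polylines_by_domain(rows):
    """Group each respondent's (TR, CP) pair by DOMAIN; each pair is one line in the plot."""
    lines = {}
    for row in rows:
        s = category_scores(row)
        lines.setdefault(row["DOMAIN"], []).append((s["TR"], s["CP"]))
    return lines


# Hypothetical respondents, invented for illustration only.
rows = [
    {"DOMAIN": 1, "QU1": 4, "QU2": 2, "VIS1": 5},
    {"DOMAIN": 2, "QU1": 3, "VIS1": 1, "VIS2": 3},
]
lines = polylines_by_domain(rows)
```

Each tuple corresponds to one line crossing the two category axes, and grouping by domain mirrors the effect of changing the "Domain" variable.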

A screenshot of the parallel coordinate plot can be seen below.

insert picture- parallel coordinate plot ALL

Visualisation

Qn Code  Question - Teaching Resource
QU1      Articles in Wikipedia are reliable
QU2      Articles in Wikipedia are updated
QU3      Articles in Wikipedia are comprehensive
QU4      In my area of expertise, Wikipedia has a lower quality than other educational resources
VIS3     I cite Wikipedia in my academic papers
USE1     I use Wikipedia to develop my teaching materials
BI1      In the future I will recommend the use of Wikipedia to my colleagues and students
BI2      In the future I will use Wikipedia in my teaching activity
EXP1     I consult Wikipedia for issues related to my field of expertise
EXP2     I consult Wikipedia for other academic related issues

Qn Code  Question - Collaborative Platform
VIS1     Wikipedia improves visibility of students' work
VIS2     It is easy to have a record of the contributions made in Wikipedia
EXP4     I contribute to Wikipedia (editions, revisions, articles improvement...)
EXP5     I use wikis to work with my students
USE2     I use Wikipedia as a platform to develop educational activities with students

Parallel Coordinates: insert picture

Divergent Bar Charts: Dashboard

Observation & Insights


Visualisation 2: What is the profile of faculty members who are users of Wikipedia?

Methodology

For this visualisation, I decided to use a treemap to display the findings. Treemaps allow us to visualise and analyse hierarchical data. As I wanted to find out the profile of faculty members who are more likely to be users of Wikipedia, a treemap gave me an organised, multivariate, hierarchical view of the respondents. In this case, I used a pivot-by-size layout. I made 2 different treemaps, one using Tibco Spotfire and the other using Tableau. The two treemaps differ because of the visualisation options that each tool provides.

The hierarchy that I've set for the visualisation (in Tibco Spotfire) is as follows:

  • Domain
  • PHD
  • Gender
  • Years of Experience

I also added a filter to ensure that only the registered users of Wikipedia were captured in the treemap. The size and the colour of each rectangle are based on the count of distinct ID values. I used a range of blues, from the lightest shade for the smallest area to the darkest shade for the largest area.
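The numbers that drive such a treemap can be sketched as a grouped distinct count. This is a minimal sketch, assuming one record per respondent with the attribute names from the Data section and USERWIKI=1 for registered users. The function names and sample records are hypothetical.

```python
from collections import defaultdict

# Hierarchy used for the Spotfire treemap, from the outermost to the innermost level.
HIERARCHY = ["DOMAIN", "PhD", "GENDER", "YEARSEXP"]


def treemap_leaves(records, hierarchy=HIERARCHY):
    """Distinct-ID count for each leaf of the hierarchy, registered Wikipedia users only."""
    ids = defaultdict(set)
    for r in records:
        if r.get("USERWIKI") != 1:
            continue  # the filter keeps registered users only
        ids[tuple(r[level] for level in hierarchy)].add(r["ID"])
    return {path: len(members) for path, members in ids.items()}


def rollup(leaves, depth):
    """Total size of each rectangle at a given depth of the hierarchy.

    Summing leaf counts is valid here because each respondent falls in exactly one leaf.
    """
    totals = defaultdict(int)
    for path, count in leaves.items():
        totals[path[:depth]] += count
    return dict(totals)


# Hypothetical respondents, invented for illustration only.
records = [
    {"ID": 1, "DOMAIN": 1, "PhD": 1, "GENDER": 0, "YEARSEXP": 10, "USERWIKI": 1},
    {"ID": 2, "DOMAIN": 1, "PhD": 1, "GENDER": 0, "YEARSEXP": 10, "USERWIKI": 1},
    {"ID": 3, "DOMAIN": 1, "PhD": 0, "GENDER": 1, "YEARSEXP": 3, "USERWIKI": 1},
    {"ID": 4, "DOMAIN": 2, "PhD": 1, "GENDER": 1, "YEARSEXP": 7, "USERWIKI": 0},
]
leaves = treemap_leaves(records)
domain_totals = rollup(leaves, 1)
```

The leaf counts determine both the area and the shade of each rectangle, and the roll-up gives the size of the enclosing rectangles at higher levels.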

The hierarchy that I've set for the visualisation (in Tableau) is as follows:

  • University
  • Position in UPF
  • Position in UOC
  • Domain
  • PHD

I added 3 different filters to this representation. The first is Categories, which allows users to see the typical profile of users based on each categorical grouping. The second is Likert Value, which captures the respondents' responses. The last is a filter on the years of experience of each faculty member, so that users can view the difference between members with a lot of experience and those with the least experience.

The size and the colour of each rectangle are based on the count of distinct ID values. I used a range of blues, from the lightest shade for the smallest area to the darkest shade for the largest area.

Visualisation

Tibco Spotfire: insert picture of profile treemap

Tableau: Dashboard

Observation & Insights


Visualisation 3: Are there differences between faculty members solely in UOC, and faculty members in UPF, with regards to their usage of Wikipedia? And if so, why?

Methodology

Visualisation

Tableau: Dashboard

Observation & Insights


Comparison of Software

Final Deliverable