ISSS608 2016-17 T1 Assign2 Lim Hui Ting Jaclyn

From Visual Analytics and Applications

Introduction

Most faculty members tend to avoid using Wikipedia as a source of reference and teaching material. The given dataset comes from a survey of faculty members about Wikipedia and online platforms. The theme that I have chosen is the usage of Wikipedia. As such, I would like to find out why faculty members use Wikipedia, what kind of faculty members tend to use Wikipedia, and whether UOC and UPF members differ in their usage and perception of Wikipedia. By answering these questions, I will be able to understand why faculty members use Wikipedia or, if they do not use it themselves, why they would encourage its use.

Questions for Investigation:

  1. What are the main reasons for faculty members to use Wikipedia?
  2. What is the profile of faculty members who are users of Wikipedia?
  3. Are there differences between faculty members solely in UOC, and in UPF, with regards to their usage of Wikipedia?

Data

The data was taken from Wiki4HE. It consists of the responses to survey questions given to university faculty members, in order to find out about their perceptions and practices regarding Wikipedia.

The attribute table can also be found at Wiki4HE.

AGE: numeric

GENDER: 0=Male; 1=Female

DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics

PhD: 0=No; 1=Yes

YEARSEXP (years of university teaching experience): numeric

UNIVERSITY: 1=UOC; 2=UPF

UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct

OTHER (main job in another university for part-time members): 1=Yes; 2=No

OTHER_POSITION (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct

USERWIKI (Wikipedia registered user): 0=No; 1=Yes

The 43 survey items are ranked on a Likert scale (1-5) ranging from strongly disagree / never (1) to strongly agree / always (5).

Data Cleaning

  1. Recoding data
     There were columns with “?” cells. Although these cells meant that there was no response, the “?” caused the data type of the column to be Character instead of Numeric. These cells were recoded from “?” to NULL. I also recoded the data in OTHER_POSITION from “2” to “0”, as these values represented faculty members with no other position.
  2. Inconsistent attributes with the given attribute information table
     The attribute table lists only 5 values for “Domain”. However, the column itself contains 6 different values, as well as null values. I assumed that “6” belonged to the category of others. Also, the column “OTHERSTATUS” contains 7 values instead of the 6 listed. Hence, I assumed that a value of “7” represented other faculty positions.
  3. Inconsistent data
     OTHER_POSITION only had the values ?, 1 and 2, unlike what is stated in the attribute table. Also, there is an additional column named “OTHERSTATUS” which is not mentioned in the attribute table. This column has the values ? and 0 to 7. As such, a probable guess is that OTHER_POSITION indicates whether a faculty member holds another position, and OTHERSTATUS refers to the position that the faculty member holds there.
  4. Inverse data values
     A majority of the questions were positively phrased, except for QU4, which was negatively phrased: “QU4: In my area of expertise, Wikipedia has a lower quality than other educational resources”. Hence, its values had to be reversed. Originally, “5” represented “Strongly Agree”, that is, agreement that Wikipedia has a lower quality than other educational resources. After recoding, that response becomes “1”, so a value of “1” now marks respondents who strongly agree that Wikipedia has a lower quality. The new value of “5” marks respondents who consider that Wikipedia is not of a lower quality than other educational resources.
  5. New columns
     As I found the columns “University”, “Other Position” and “Other status” quite confusing and difficult to analyse, I decided to create additional columns in JMP. I created “UOC”, which is 1 if the faculty member is from UOC and 0 otherwise, and “UPF”, which is 1 if the faculty member is from UPF and 0 otherwise. Also, “UOC_Position” contains a value from 1 to 6 if the faculty member is from UOC, and “UPF_Position” contains a value from 1 to 7 if the faculty member is from UPF.
  6. Grouping survey questions into different categories
    • Question Categories

    I categorised the questions according to their codes. For example:

    Perceived Usefulness

    PU1: The use of Wikipedia makes it easier for students to develop new skills

    PU2: The use of Wikipedia improves students' learning

    PU3: Wikipedia is useful for teaching

    In this case, PU1, PU2 and PU3 are placed in the group named “PU”. I categorised all of the other questions in the same way.

    • Group Categories

    Categories | Questions | Code
    Teaching Resource | QU1, QU2, QU3, QU4, VIS3, USE1, BI1, BI2, EXP1, EXP2 | TR
    Collaborative Platform | VIS1, VIS2, EXP4, EXP5, USE2 | CP
    Perception of Online Platforms | PF1, PF2, PF3, SA1, SA2, SA3 | OPP
    Perception of Wikipedia | ENJ1, ENJ2, PEU1, PEU2, PEU3, PU1, PU2, PU3 | WP
  7. Transpose data on Excel sheet
     In order to create a “response” column holding the scores and a “questions” column holding the question codes, I had to create a column of arbitrary ID numbers and use the Tableau add-in for Excel to pivot the data.
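
The cleaning steps above were carried out in JMP and Excel. Purely as an illustration, the same logic could be sketched in Python as follows. The helper names (to_number, reverse_likert, clean_row, to_long), the abridged question-to-category mapping and the two sample rows are hypothetical and not part of the original workflow. The sketch also assumes that the UOC and UPF indicator columns follow directly from the UNIVERSITY coding (1=UOC; 2=UPF) and that missing responses are dropped when reshaping.

```python
from typing import Dict, List, Optional

LIKERT_MAX = 5

# Abridged, illustrative mapping taken from the Group Categories table.
CATEGORY_BY_QUESTION = {"QU4": "TR", "USE1": "TR", "VIS1": "CP", "PU1": "WP"}


def to_number(cell: str) -> Optional[int]:
    """Parse a raw cell, treating "?" (no response) as missing (None)."""
    cell = cell.strip()
    return None if cell in ("?", "") else int(cell)


def reverse_likert(score: Optional[int]) -> Optional[int]:
    """Flip a 1-5 score (1<->5, 2<->4, 3 stays); missing stays missing."""
    return None if score is None else LIKERT_MAX + 1 - score


def clean_row(raw: Dict[str, str]) -> Dict[str, Optional[int]]:
    row = {name: to_number(value) for name, value in raw.items()}
    # "2" in OTHER_POSITION meant "no other position"; recode it to 0.
    if row.get("OTHER_POSITION") == 2:
        row["OTHER_POSITION"] = 0
    # QU4 is negatively phrased, so its scale is reversed.
    if "QU4" in row:
        row["QU4"] = reverse_likert(row["QU4"])
    # Assumption: indicator columns derive from UNIVERSITY (1=UOC, 2=UPF).
    row["UOC"] = int(row.get("UNIVERSITY") == 1)
    row["UPF"] = int(row.get("UNIVERSITY") == 2)
    return row


def to_long(rows: List[Dict[str, Optional[int]]]) -> List[dict]:
    """One record per (respondent, question), keyed by an arbitrary ID."""
    records = []
    for respondent_id, row in enumerate(rows, start=1):
        for question, category in CATEGORY_BY_QUESTION.items():
            # Assumption: unanswered questions are left out of the long table.
            if row.get(question) is not None:
                records.append({"ID": respondent_id, "Question": question,
                                "Category": category, "Response": row[question]})
    return records


# Hypothetical sample rows, not taken from the dataset.
raw_rows = [
    {"UNIVERSITY": "1", "OTHER_POSITION": "2", "QU4": "5", "USE1": "?", "VIS1": "2", "PU1": "4"},
    {"UNIVERSITY": "2", "OTHER_POSITION": "1", "QU4": "?", "USE1": "3", "VIS1": "1", "PU1": "?"},
]
cleaned = [clean_row(r) for r in raw_rows]
long_rows = to_long(cleaned)
print(cleaned[0])
print(long_rows)
```

The long table produced by to_long has the same shape as the pivoted sheet described in the last step: one row per respondent and question, with the score in a single “Response” column.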

Data Exploration

Iteration 1

In iteration 1, I initially found the mean and median values of each survey question and plotted them in order to compare them. However, the responses are on a Likert scale, which is ordinal, so comparing questions by their mean and median values is not very meaningful. A mean score of 2.5 for a question, for example, does not tell the user the actual percentage of responses at each level or the variation within that question.
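
To make the limitation concrete, here is a small Python sketch with two made-up sets of responses (not taken from the survey) that share the same mean but have completely different distributions. The function name likert_shares is hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def likert_shares(scores: List[int]) -> Dict[int, float]:
    """Percentage of responses at each Likert level from 1 to 5."""
    counts = Counter(scores)
    total = len(scores)
    return {level: 100 * counts.get(level, 0) / total for level in range(1, 6)}


# Hypothetical responses to two questions.
polarised = [1, 1, 1, 5, 5, 5]  # half strongly disagree, half strongly agree
neutral = [3, 3, 3, 3, 3, 3]    # everyone is neutral

print(mean(polarised), mean(neutral))  # both means are 3
print(likert_shares(polarised))
print(likert_shares(neutral))
```

Both questions have a mean of 3, yet one is split between the two extremes and the other is entirely neutral. Only the percentage breakdown reveals this.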

Iteration 2

As such, I had to re-look at my data and find alternative ways to analyse it. In this case, I created divergent bar charts with reference to Data Revelations. The step-by-step instructions can be found here and here. With the help of this online reference, I managed to create a basic Divergent Bar Chart for the survey questions. Below is a screenshot of these charts.

Jl a2 Questions Dashboard.png

The actual dashboard can also be found here.


Visualisation 1: What are the main reasons for faculty members to use Wikipedia?

Methodology

To answer this question, I built visualisations around the two categories that represent the two main reasons why faculty members would use Wikipedia: as a Teaching Resource and as a Collaborative Platform.

First, Divergent Bar Charts were used to display the questions related to using Wikipedia as a Teaching Resource or a Collaborative Platform. Divergent bar charts display the percentages of the Likert scale values of 1-5 on the same bar, so users are able to see the distribution of scores for each question. The average score was also included in the bar charts. Although the average score cannot be relied on by itself, it comes in handy when paired with a divergent bar chart.
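
The positioning of the segments in a divergent bar can be sketched in Python as below. This assumes one common convention, in which the neutral category (3) is centred on zero so that disagreement extends to the left and agreement to the right. The Tableau charts in this assignment were built by following the Data Revelations instructions and may differ in detail. The function name diverging_segments and the sample percentages are hypothetical.

```python
from typing import Dict, List, Tuple


def diverging_segments(shares: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (level, start, end) for each Likert level of one bar.

    `shares` maps each level 1-5 to its percentage of responses.
    The bar starts far enough left that the neutral segment straddles zero.
    """
    start = -(shares[1] + shares[2] + shares[3] / 2)
    segments = []
    for level in range(1, 6):
        end = start + shares[level]
        segments.append((level, start, end))
        start = end
    return segments


# Hypothetical percentages for one question.
example = {1: 10, 2: 20, 3: 40, 4: 20, 5: 10}
for level, start, end in diverging_segments(example):
    print(level, start, end)
```

With these sample values the bar runs from -50 to 50, and the neutral segment spans -20 to 20.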

The divergent bar charts were coloured on a spectrum of two colours, red to blue. The values of 5, which represent "Strongly Agree", can be identified by the portions of the bars that are in dark blue. The values of 1, which represent "Strongly Disagree", can be identified by the portions of the bars that are in dark red.

Other variables were added to the dashboard as well, such as Domain, Position, and UserWiki. These variables were represented in bar charts. The size of each bar represents the distinct count of IDs within each category of the variable. A filter function was added to each of these bar charts as well. As such, in the dashboard, the user will be able to see which category of each variable uses Wikipedia more as a teaching resource and/or a collaborative platform, as the divergent bar charts change according to the filtered variables.

A screenshot of the dashboard can be seen below. It was done using Tableau.

Second, Parallel Coordinates were used to allow for better comparison. In the following visualisation, which was done using Tibco Spotfire, I plotted the questions of both categories on a parallel coordinates plot and included an additional variable, "Domain". As such, one will be able to view how the distribution of the lines changes when the domain changes. In this visualisation, as seen in the screenshot, I highlighted the values that correspond to faculty members who are registered users of Wikipedia.

A screenshot of the parallel coordinate plot can be seen below.

Questions

Qn Code | Question - Teaching Resource
QU1 | Articles in Wikipedia are reliable
QU2 | Articles in Wikipedia are updated
QU3 | Articles in Wikipedia are comprehensive
QU4 | In my area of expertise, Wikipedia has a lower quality than other educational resources
VIS3 | I cite Wikipedia in my academic papers
USE1 | I use Wikipedia to develop my teaching materials
BI1 | In the future I will recommend the use of Wikipedia to my colleagues and students
BI2 | In the future I will use Wikipedia in my teaching activity
EXP1 | I consult Wikipedia for issues related to my field of expertise
EXP2 | I consult Wikipedia for other academic related issues

Qn Code | Question - Collaborative Platform
VIS1 | Wikipedia improves visibility of students' work
VIS2 | It is easy to have a record of the contributions made in Wikipedia
EXP4 | I contribute to Wikipedia (editions, revisions, articles improvement...)
EXP5 | I use wikis to work with my students
USE2 | I use Wikipedia as a platform to develop educational activities with students

Visualisation

Parallel Coordinates by Tibco Spotfire:

Jl a2 Usage byuserwiki.jpg

Divergent Bar Charts by Tableau:

Jl a2 Usage Dashboard.png

The actual Dashboard can be found here.

Observation & Insights


Visualisation 2: What is the profile of faculty members who are users of Wikipedia?

Methodology

For this visualisation, I decided to use a Treemap Representation to display the findings. Treemaps allow us to visualise and analyse hierarchical data. As I wanted to find out the profile of faculty members who are more likely to be users of Wikipedia, I could achieve an organised multivariate hierarchical visualisation by using a pivot-by-size layout. I made 2 different Treemaps, one using Tibco Spotfire and the other using Tableau. The two treemaps differ because of the visualisation options that each software package provides.

The hierarchy that I've set for the visualisation (in Tibco Spotfire) is as follows:

  • Domain: values of 1-6 refer to the different domain categories (refer to attribute table above)
  • PhD: 0 represents no PhD, 1 represents PhD
  • Gender: 0 represents Male, 1 represents Female
  • Years of Experience in bins

I also added a filter to ensure that only the registered users of Wikipedia were captured in the treemap visualisation. The size and the colour of each treemap cell are based on the count of distinct ID values. I used a range of blues, from the lightest shade, which represents the smallest area, to the darkest shade, which represents the largest area.
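
The quantity that drives the size and colour of each treemap cell, a distinct count of IDs per path through the hierarchy restricted to registered users, can be sketched in Python as follows. Spotfire computes this internally, so the function name treemap_sizes and the sample rows are hypothetical and serve only to illustrate the calculation. The binned years of experience are left out for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def treemap_sizes(rows: Iterable[dict], hierarchy: Sequence[str]) -> Dict[Tuple, int]:
    """Count distinct respondent IDs for each path through the hierarchy."""
    ids_by_path = defaultdict(set)
    for row in rows:
        if row["USERWIKI"] != 1:  # keep registered Wikipedia users only
            continue
        path = tuple(row[level] for level in hierarchy)
        ids_by_path[path].add(row["ID"])  # a set counts each ID once
    return {path: len(ids) for path, ids in ids_by_path.items()}


# Hypothetical rows in long format: a respondent can appear more than once.
rows = [
    {"ID": 1, "USERWIKI": 1, "DOMAIN": 1, "PhD": 1, "GENDER": 0},
    {"ID": 1, "USERWIKI": 1, "DOMAIN": 1, "PhD": 1, "GENDER": 0},
    {"ID": 2, "USERWIKI": 1, "DOMAIN": 1, "PhD": 1, "GENDER": 0},
    {"ID": 3, "USERWIKI": 0, "DOMAIN": 2, "PhD": 0, "GENDER": 1},
    {"ID": 4, "USERWIKI": 1, "DOMAIN": 2, "PhD": 0, "GENDER": 1},
]
sizes = treemap_sizes(rows, ["DOMAIN", "PhD", "GENDER"])
print(sizes)
```

Counting distinct IDs rather than rows matters once the data has been pivoted, because each respondent then contributes one row per question.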

The hierarchy that I've set for the visualisation (in Tableau) is as follows:

  • University
  • Position in UPF
  • Position in UOC
  • Domain
  • PHD

I added 3 different filters to this representation. The first is Categories, which allows users to see the typical profile of users based on each categorical grouping. The second is Likert Value, which captures the respondents' responses. The last is a filter on the years of experience of each faculty member, so that users can view the difference between members who have a lot of experience and those with the least experience.

The size and the colour of each treemap cell are based on the count of distinct ID values. I used a range of blues, from the lightest shade, which represents the smallest area, to the darkest shade, which represents the largest area.

Visualisation

Tibco Spotfire Treemap:

Jl a2 Profile UsersofWiki.jpg

Tableau Treemap:

Jl a2 Profile Tableau.png

The Dashboard for the visualisation above can be found here: User Profile Dashboard

Observation & Insights


Visualisation 3: Are there differences between faculty members solely in UOC, and in UPF, with regards to their usage of Wikipedia?

Methodology

I decided to compare the Wikipedia usage of faculty members who teach solely in UOC with that of UPF faculty members (some of whom may also be teaching in UOC), because of the differences in school environments. UOC is an internet-centred open university, whereas UPF is a public university. Both of them are based in Spain. As such, by finding out the differences, I would be able to see whether the environment that these faculty members come from affects the survey results.

To compare the differences, I used three different visualisation methods to be able to understand the survey results better.

First, using Tibco Spotfire, I created another Treemap Representation. As I wanted to find out the profile of faculty members of the different universities and their likelihood of using Wikipedia, I could achieve an organised multivariate hierarchical visualisation by using a pivot-by-size layout.

The hierarchy that I've set for the visualisation (in Tibco Spotfire) is as follows:

  • Userwiki: 0 represents not a registered Wikipedia user, 1 represents a registered Wikipedia user
  • UPF: 0 represents faculty members solely based in UOC, 1 represents faculty members in UPF
  • Domain: values of 1-6 refer to the different domain categories (refer to attribute table above)
  • PhD: 0 represents no PhD, 1 represents PhD
  • Years of Experience in bins

I also added a filter to ensure that only the registered users of Wikipedia were captured in the treemap visualisation. The size and the colour of each treemap cell are based on the count of distinct ID values. I used a range of blues, from the lightest shade, which represents the smallest area, to the darkest shade, which represents the largest area.

The next visual representation I used was Parallel Coordinates, using Tibco Spotfire. This was done for three categories: Perception, Usage, and University Environment. I used parallel coordinates as they allow the user to look at the variation of answers per question and to spot trends. The charts are coloured as well. Blue represents responses of faculty members who are only from UOC. Green represents responses of faculty members who are from UPF. From the charts below, we are able to see the differences in responses between the two universities. I also filtered out responses of faculty members who were not registered users of Wikipedia, as the key question relates to registered users of Wikipedia.

The last visualisation method I used was Divergent Bar Charts. There are 3 categories of Divergent Bar Charts in the dashboard: University Environment, Perception, and Usage. These divergent bar charts allow us to see the variation in individuals' responses.

The divergent bar charts were coloured on a spectrum of two colours, red to blue. The values of 5, which represent "Strongly Agree", can be identified by the portions of the bars that are in dark blue. The values of 1, which represent "Strongly Disagree", can be identified by the portions of the bars that are in dark red. Additional filters, such as "YearsExp" and "Age", were added to allow the user to filter the data and see the changes in responses in the divergent bar charts.

Visualisation

Tibco Spotfire Treemap:

Jl a2 University Profile.jpg

Tibco Spotfire Parallel Coordinates:

Jl a2 Perception.jpg
Jl a2 Usage.jpg
Jl a2 University Environment.jpg

Tableau Dashboard:

Jl a2 University Dashboard Tableau.png

The Dashboard for the visualisation above can be found here: Dashboard

Observation & Insights


Comparison of Software

Conclusion