ISSS608 2016-17 T1 Assign2 Abhinav Ghildiyal
Abstract
Wikipedia is a multilingual, web-based, free-content encyclopedia project supported by the Wikimedia Foundation and based on a model of openly editable content. Nowadays many people use Wikipedia because of its ease of use, usefulness, visibility, quality, social image, incentives and many other factors. Similarly, many faculty members across the world have used Wikipedia as a teaching tool in recent years. In this assignment we investigate the main factors that make Wikipedia a likable or an unlikable tool, based on survey data collected from the faculty members of two Spanish universities.
Theme of Interest
In the context of the Wiki4HE assignment, I am investigating the perception of faculty members towards the use of Wikipedia as a teaching tool and studying their attitude towards Wikipedia. I have specifically selected Profile and Sharing Attitude to answer the following questions:
- What are people's perceptions about contributing to Wikipedia?
- How many people participate in social networks, and what is their perception of social networking?
- What is their perception of publishing work on open platforms?
- How does people's perception change with age and PhD degree?
- How does perception vary by gender and domain?
- How does people's perception change when they are registered users of Wikipedia, across all age groups?
Data Preparation
The data for this assignment is taken from the UCI Machine Learning Repository. It comes from ongoing research on university faculty perceptions and practices of using Wikipedia as a teaching resource.
The wiki4HE data set is in CSV format. The first step is to bring the data into a readable form: the wiki4HE.csv file is semicolon-delimited, so Text to Columns is used to split it into a tabular format.
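As an alternative to the manual split, a semicolon-delimited file can also be read programmatically. The following is a minimal sketch in Python using only the standard library. The sample rows are made up for illustration, only a few of the column names are shown, and the helper name `read_semicolon_csv` is hypothetical.

```python
import csv
import io

# Made-up sample in the same shape as wiki4HE.csv (semicolon-delimited).
# The values are illustrative only, not taken from the real data set.
SAMPLE = """AGE;GENDER;DOMAIN;PhD;YEARSEXP
40;0;2;1;14
37;1;?;0;?
"""


def read_semicolon_csv(handle):
    """Return the rows of a semicolon-delimited file as a list of dicts."""
    return list(csv.DictReader(handle, delimiter=";"))


rows = read_semicolon_csv(io.StringIO(SAMPLE))
print(rows[0]["AGE"], rows[1]["DOMAIN"])
```

For the real file, the same helper could be called on an opened file object, for example `with open("wiki4HE.csv", newline="") as f: rows = read_semicolon_csv(f)`. All values are read as text, so "?" entries stay visible for the later cleaning steps.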
After loading the data in JMP, the initial and exploratory analysis consists of the following steps:
- A few variables need their modeling type changed: Gender, PhD and University need to be changed from Continuous to Nominal, while Years of Experience needs to be changed to Continuous.
- The responses are stored as Character - Nominal and should be changed to Numeric - Nominal.
- Missing values are checked using the Missing Data Pattern feature of JMP.
- Using Distribution, I check the kinds of values captured in each column and the statistics of each variable. While analyzing the distributions I saw that many columns have “?” as a value. Depending on the column, “?” is changed to “Other Domain” in Domain and to “Others” in the other columns.
- Next, the data is recoded in JMP based on the attribute information that we have:
a. AGE: numeric
b. GENDER: 0=Male; 1=Female
c. DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics
d. PhD: 0=No; 1=Yes
e. YEARSEXP (years of university teaching experience): numeric
f. UNIVERSITY: 1=UOC; 2=UPF
g. UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
h. OTHER (main job in another university for part-time members): 1=Yes; 2=No
i. OTHER_POSITION (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
j. USERWIKI (Wikipedia registered user): 0=No; 1=Yes
- There are 43 questions for which we have responses; they were asked of the faculty members and answered as ratings. These 43 questions are grouped under 13 main category questions, so I took the mean of the scores of the sub-questions in each category and created a new column for the main category question holding that mean value. The 43 question columns are thus reduced to 13 columns. Below is the formula that I have used to take the mean.
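For readers working outside JMP, the recoding and averaging steps above could be sketched roughly as follows in Python. This is not the JMP formula itself. The item names (`SA1` to `SA3`, `Pf1` to `Pf3`) and the function names are assumptions made for illustration, the sample row is made up, and skipping unanswered ("?") items when averaging is also an assumption about how missing responses are treated.

```python
from statistics import mean

# Code tables copied from the attribute list above.
GENDER = {"0": "Male", "1": "Female"}
DOMAIN = {
    "1": "Arts & Humanities",
    "2": "Sciences",
    "3": "Health Sciences",
    "4": "Engineering & Architecture",
    "5": "Law & Politics",
}

# Assumed item names: two of the 13 categories, three sub-questions each.
GROUPS = {"SA": ["SA1", "SA2", "SA3"], "Pf": ["Pf1", "Pf2", "Pf3"]}


def recode(row):
    """Map coded values to labels; '?' becomes 'Other Domain' or 'Others'."""
    out = dict(row)
    out["DOMAIN"] = DOMAIN.get(row["DOMAIN"], "Other Domain")
    out["GENDER"] = GENDER.get(row["GENDER"], "Others")
    return out


def category_means(row, groups=GROUPS):
    """Average the answered sub-questions per category (None if none answered)."""
    result = {}
    for name, items in groups.items():
        # Assumption: unanswered items ('?') are left out of the mean.
        scores = [int(row[item]) for item in items if row[item] != "?"]
        result[name] = mean(scores) if scores else None
    return result


# Made-up respondent, for illustration only.
example = {
    "GENDER": "1", "DOMAIN": "?",
    "SA1": "4", "SA2": "5", "SA3": "?",
    "Pf1": "2", "Pf2": "4", "Pf3": "3",
}
labelled = recode(example)
means = category_means(example)
print(labelled["GENDER"], labelled["DOMAIN"], means)
```

Applying `category_means` to every row with all 13 groups filled in would give the 13 summary columns described above.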