ISSS608 2016-17 T1 Assign2 Parikshit Ravindra MAYEE
Abstract
Surveys are generally conducted to uncover answers to questions that can help improve services. Survey results, with proper analysis, can support decision making. In this assignment, I analysed the survey results of faculty members from two Spanish universities on teaching uses of Wikipedia. My visual analysis of this survey has been posted to Tableau Public.
Theme
My theme for this assignment is to visualize the sentiments about Wikipedia as a teaching resource.
Questions
With this assignment I have tried to analyse and answer the following questions about the survey results.
1. Who are my survey respondents?
1.1. Which universities did the respondents belong to?
1.2. What’s the demographic profile of my respondents?
1.3. Have the participants used Wikipedia or are they new to the concept?
2. What is the overall sentiment of my respondents for each question?
3. Does the sentiment change with change in demographics and other factors?
3.1. How does the sentiment vary with the participant’s age, experience, education, position, gender, etc.?
3.2. Is there any difference in the sentiment between the universities? Does it change if the participants hold a PhD?
The visual analysis posted to Tableau Public is interactive and helps to answer more questions than those listed above.
Datasets
The following dataset was used for the analysis: wiki4HE Data Set
Data Preparation
SAS JMP Pro was used for initial data analysis and cleaning. The important observations, and the corresponding cleaning actions, are listed below.
Domain: Data points with values “1” to “5” were recoded as per the attribute information. Data points with value “6” were recoded to “Others”. Missing values (?) were recoded to “No Response”.
Gender, PhD, University: Responses were recoded to words as per the attribute information. These columns had no missing values.
UOC_Position: Responses were recoded to position titles as per the attribute information. For all respondents from UPF, the missing UOC_Position values were changed to “N/A”. After this change, no missing values remained.
USERWIKI: Responses were recoded to Yes/No as per the attribute information. Missing values (?) were recoded to “No Response”.
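The cleaning itself was done in SAS JMP Pro, so the following is only an illustrative Python (pandas) sketch of the same kind of recode, not the procedure actually used. The helper name `recode_with_missing` is hypothetical, and the 0/1 coding for USERWIKI is an assumption that should be checked against the attribute information.

```python
import pandas as pd

def recode_with_missing(series, mapping, missing_label="No Response"):
    """Map raw codes to labels.

    '?' and any other code that is not in `mapping` end up as `missing_label`.
    """
    return series.astype(str).str.strip().map(mapping).fillna(missing_label)

# Tiny made-up sample; the real data comes from the wiki4HE file.
raw = pd.DataFrame({"USERWIKI": ["1", "0", "?", "1"]})

# Assumed coding (0 = No, 1 = Yes); verify against the attribute information.
userwiki_labels = {"0": "No", "1": "Yes"}

raw["USERWIKI_label"] = recode_with_missing(raw["USERWIKI"], userwiki_labels)
print(raw)
```

The same helper could be reused for Domain by passing a mapping that includes “6” → “Others”.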
OTHER & OTHER_POSITION: The columns in the dataset do not match the details given for the attributes “OTHER” and “OTHER_POSITION” in the attribute information. It is possible that the descriptions have been interchanged. Also, the column name ‘OTHERSTATUS’ in the dataset does not match the attribute details.
To avoid confusion and to ease further analysis, I decided to rename and redefine the columns in the dataset as follows:
The OTHERSTATUS column in the dataset was renamed to OTHER_POSITION_Title. It represents the attribute ‘OTHER_POSITION’ (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
The OTHER_POSITION column name in the dataset was left unchanged. It represents the attribute ‘OTHER’ (main job in another university for part-time members): 1=Yes; 2=No
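As a rough illustration only (the renaming was done in SAS JMP Pro, not in code), the column renaming described above might look like this in pandas. The empty frame stands in for the real dataset.

```python
import pandas as pd

# Placeholder frame with only the two columns discussed above.
raw = pd.DataFrame(columns=["OTHER_POSITION", "OTHERSTATUS"])

# OTHERSTATUS becomes OTHER_POSITION_Title; OTHER_POSITION is left unchanged.
renamed = raw.rename(columns={"OTHERSTATUS": "OTHER_POSITION_Title"})
print(list(renamed.columns))
```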
Rationale for handling and recoding data points for OTHER_POSITION & OTHER_POSITION_Title:
OTHER_POSITION_Title is applicable only when the survey respondent has a job in another university. So, if the respondent provided an answer for OTHER_POSITION_Title and the corresponding OTHER_POSITION value was missing, the missing OTHER_POSITION value was recoded to “Yes”.
Next, if OTHER_POSITION was “No”, the corresponding missing value for OTHER_POSITION_Title was recoded to “N/A”.
Finally, all remaining missing values in the columns OTHER_POSITION & OTHER_POSITION_Title were recoded to “No Response”.
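The three rules above can be sketched in pandas as follows. This is a minimal illustration rather than the SAS JMP Pro steps actually used. It assumes the two columns have already been recoded to words and that missing values are stored as nulls. The function name `clean_other_position` and the sample rows are hypothetical.

```python
import pandas as pd

def clean_other_position(df):
    """Apply the three recoding rules, in order, to a copy of df."""
    out = df.copy()

    # Rule 1: a title was given but OTHER_POSITION is missing -> "Yes".
    has_title_only = out["OTHER_POSITION"].isna() & out["OTHER_POSITION_Title"].notna()
    out.loc[has_title_only, "OTHER_POSITION"] = "Yes"

    # Rule 2: OTHER_POSITION is "No" and the title is missing -> "N/A".
    no_other_job = out["OTHER_POSITION"].eq("No") & out["OTHER_POSITION_Title"].isna()
    out.loc[no_other_job, "OTHER_POSITION_Title"] = "N/A"

    # Rule 3: everything still missing in either column -> "No Response".
    cols = ["OTHER_POSITION", "OTHER_POSITION_Title"]
    out[cols] = out[cols].fillna("No Response")
    return out

# Made-up rows covering each rule.
sample = pd.DataFrame({
    "OTHER_POSITION": [None, "No", None, "Yes"],
    "OTHER_POSITION_Title": ["Lecturer", None, None, None],
})
cleaned = clean_other_position(sample)
print(cleaned)
```

The order matters: applying rule 3 first would overwrite the missing values that rules 1 and 2 rely on.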
Missing Values (?) for 5-Point Likert Scale Questions: I checked for records with no response (?) to the questions. No record was found where all questions were left unanswered. So all respondents have answered at least some part of the survey, and hence no record can be excluded from the analysis on these grounds.
I decided to replace all missing values (?) for the questions with a score of “3”, which represents neutral sentiment on the 5-point Likert scale for a given question.
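The check and the imputation above could be sketched in pandas like this. It is an illustration under stated assumptions, not the SAS JMP Pro workflow actually used. The function name `impute_likert_neutral` is hypothetical, and `PU1`/`PU2` are used here only as example question columns with made-up answers.

```python
import pandas as pd

def impute_likert_neutral(df, question_cols, neutral=3):
    """Return (imputed copy, mask of rows where every question was unanswered)."""
    out = df.copy()
    # "?" (or any other non-numeric entry) becomes NaN.
    answers = out[question_cols].apply(pd.to_numeric, errors="coerce")
    # Rows with no answers at all would be candidates for exclusion.
    all_blank = answers.isna().all(axis=1)
    # Remaining gaps get the neutral score.
    out[question_cols] = answers.fillna(neutral).astype(int)
    return out, all_blank

# Made-up answers for two example question columns.
sample = pd.DataFrame({"PU1": ["4", "?", "2"], "PU2": ["?", "5", "1"]})
imputed, all_blank = impute_likert_neutral(sample, ["PU1", "PU2"])
print(imputed)
print("Rows with no answers:", int(all_blank.sum()))
```

One consequence of this choice is that imputing “3” pulls the distribution of each question towards neutral, so questions with many missing answers will look more neutral than the actual responses were.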
Exploration & Analysis
Tools Utilized
Results
References