ISSS608 2016-17 T1 Assign2 Shishir Nehete
Revision as of 15:12, 26 September 2016
Contents
- 1 Abstract
- 2 Theme of Interest and Motivation
- 3 Tools Utilised
- 4 Interactive Result
- 5 Results
- 5.1 Analysis of Respondents who have taken part in the survey.
- 5.2 How do Wiki users across domain rate perceived usefulness of Wikipedia.
- 5.3 How do users across domains rate Wikipedia in terms of the quality.
- 5.4 How do users across domain and age group rate Visibility of Wikipedia.
- 5.5 How does Experience of the users matter in the investigation.
- 5.6 How comfortable are the users with Sharing their work on Wikipedia.
- 6 Citations
- 7 Comments
Abstract
As organizations rely more on technology for data collection and storage, the demand for extracting insights from this data keeps growing. Currently, most traditional business intelligence systems tend to be confined to univariate and bivariate data analysis.
The project focuses on applying interactive data exploration and analysis techniques to discover patterns in multivariate data and to explore the relationships within it.
The topic used for exploring these techniques is “University faculty perceptions and practices of using Wikipedia as a teaching resource”. This is ongoing research in which colleagues' perceptions and opinions of Wikipedia, and the perceived quality of information in Wikipedia, play a central role.
Theme of Interest and Motivation
The dataset used for this project is the wiki4HE Data Set (https://archive.ics.uci.edu/ml/datasets/wiki4HE).
Identifying a theme of interest
The dataset provides information about the survey respondents on multiple variables such as:
Age, Gender, Domain, PhD, Experience, University (Universitat Oberta de Catalunya, Universitat Pompeu Fabra), UOC_Position, Other, Other_Position, UserWiki
The survey consists of questions in the following categories, used to analyse the use of Wikipedia for education purposes.
- Perceived Usefulness
- Perceived Ease of Use
- Perceived Enjoyment
- Quality
- Visibility
- Social Image
- Sharing attitude
- Use behaviour
- Profile 2.0
- Job relevance
- Behavioural intention
- Incentives
- Experience
To define the scope of the assignment, I consider five of the categories listed above: Perceived Usefulness, Quality, Visibility, Experience and Sharing Attitude. Limiting the scope gives a confined field of analysis, which can later be extended to the other categories.
Data Preparation
1. Import the data into JMP Pro for data preparation.
- The data consists of 913 rows of user responses.
2. Check for missing data patterns.
- Initial analysis shows inconsistencies in the attribute values, and a number of attributes have missing values. The following steps describe how these are fixed, based on the data dictionary provided with the data set.
3. Check the attributes against the data set description.
- The following attributes are provided in the data dictionary.
AGE: numeric
GENDER: 0=Male; 1=Female
DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics
PhD: 0=No; 1=Yes
YEARSEXP (years of university teaching experience): numeric
UNIVERSITY: 1=UOC; 2=UPF
UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
OTHER (main job in another university for part-time members): 1=Yes; 2=No
OTHER_POSITION (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
USERWIKI (Wikipedia registered user): 0=No; 1=Yes
While comparing the attributes, the following observations are made:
- Age, Gender and University do not have any discrepancies. Yearsexp has missing values, described below.
- DOMAIN: This attribute has an extra value (6) and missing values, both of which need to be handled. The attribute values are therefore recoded as below:
1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics; 6=Others; ?=Unknown (7)
- Yearsexp: 23 records are missing a value for this attribute.
As this share is small (about 2.5% of the 913 records), these are recoded as ‘0’.
- UOC_POSITION (academic position of UOC members): This field is specific to university type 1 (UOC), so the missing values for the other university are recoded as NA.
1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct; ?=NA (7)
- OTHER (main job in another university for part-time members): This attribute is also specific to UOC, so it is missing for all UPF records. The missing values are recoded as NA.
1=Yes; 2=No; ?=NA (3)
- OTHER_POSITION (work as part-time in another university and UPF members): This attribute has one extra classification, which is recoded as Other; missing values are recoded as Unknown.
1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct; 7=Other; ?=Unknown (8)
- USERWIKI (Wikipedia registered user): This attribute indicates whether the user is a registered Wikipedia user. Four records are missing this value and are recoded as Unknown.
0=No; 1=Yes; ?=Unknown (2)
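The recoding in this assignment was done in JMP Pro. Purely as an illustration, the same idea can be sketched in Python as a lookup table per attribute. The names `DOMAIN_LABELS`, `USERWIKI_LABELS` and `recode` are hypothetical, and the sketch assumes that the raw file marks missing values with "?" and that any unrecognised code should also fall back to Unknown.

```python
# Hypothetical lookup tables mirroring the recoding described above.
# Assumption: "?" is the missing-value marker in the raw file.
DOMAIN_LABELS = {
    "1": "Arts & Humanities",
    "2": "Sciences",
    "3": "Health Sciences",
    "4": "Engineering & Architecture",
    "5": "Law & Politics",
    "6": "Others",
    "?": "Unknown",
}
USERWIKI_LABELS = {"0": "No", "1": "Yes", "?": "Unknown"}


def recode(value, labels, missing="?"):
    """Return the label for a raw code.

    Blank or unrecognised codes fall back to the label of the missing marker
    (an assumption made for this sketch).
    """
    key = value.strip()
    return labels.get(key, labels[missing])


# Made-up raw values, including the extra code 6, a "?" and an empty cell.
raw_domains = ["1", "6", "?", " 4 ", ""]
recoded_domains = [recode(v, DOMAIN_LABELS) for v in raw_domains]
print(recoded_domains)
```

Keeping one table per attribute makes the mapping easy to compare against the data dictionary, which is the check that step 3 performs.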
4. Change data types of the attributes.
- Gender: Numeric, Nominal
- PhD: Numeric, Nominal
- Yearsexp: Numeric, Continuous
- University: Numeric, Nominal
- All Question attributes: Numeric, Continuous
5. Create new columns to understand the attributes better.
- Gender
- Domain
- PhD
- University
- UOC_Position
- Other
- Other_Position
- UserWiki
6. Exclude and hide attributes that are out of the scope of the assignment.
7. Export the data in CSV format so that it can be used for further visualization in other tools. (<v2>)
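For readers without JMP, the missing-data check in step 2 and the export in step 7 can be approximated with the Python standard library. This is a minimal sketch under stated assumptions: the sample rows are made up, the semicolon delimiter and the "?" missing marker are assumptions about the raw file, and `missing_summary` is a hypothetical helper rather than part of any tool used in the assignment.

```python
import csv
import io

MISSING = "?"  # assumed missing-value marker

# Tiny made-up sample standing in for the survey file; values are illustrative only.
SAMPLE = """AGE;DOMAIN;YEARSEXP;USERWIKI
40;1;14;0
42;?;18;?
37;6;?;1
"""


def missing_summary(rows, marker=MISSING):
    """Count missing entries per column and their share of all rows."""
    counts = {}
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value.strip() == marker:
                counts[column] += 1
    return {c: (n, n / total if total else 0.0) for c, n in counts.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
summary = missing_summary(rows)
for column, (count, share) in summary.items():
    print(f"{column}: {count} missing ({share:.1%})")

# Step 7 equivalent: write the prepared rows out as comma-separated text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
exported = out.getvalue()
```

Applied to the figures reported above, the same share calculation gives 23 / 913, or about 2.5%, for Yearsexp.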
Define questions for investigation
Using the above variables, the questions for investigation can be framed as follows:
- Analysis of Respondents who have taken part in the survey.
- How do Wiki users across domains perceive the usefulness of Wikipedia?
- How do users across domains and age groups rate the Visibility of Wikipedia?
- How comfortable are users with Sharing their work on Wikipedia?
- How does the Experience of the users matter in the investigation?
- How do users across domains rate Wikipedia in terms of quality?
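Most of these questions reduce to comparing ratings between groups of respondents. The assignment answers them with interactive visualizations in Tableau and High-D; the Python sketch below only illustrates the kind of aggregation behind such a view. The records are made up, the item name `PU1` and the 1 to 5 rating scale are assumptions about the survey file, and `mean_rating_by_group` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Made-up records: a domain label and a rating (None when the answer is missing).
# Assumption: "PU1" is one of the Perceived Usefulness items, rated 1 to 5.
responses = [
    {"Domain": "Sciences", "PU1": 4},
    {"Domain": "Sciences", "PU1": 2},
    {"Domain": "Law & Politics", "PU1": 5},
    {"Domain": "Law & Politics", "PU1": None},
]


def mean_rating_by_group(records, group_key, item_key):
    """Average one survey item per group, skipping missing ratings."""
    grouped = defaultdict(list)
    for record in records:
        rating = record[item_key]
        if rating is not None:
            grouped[record[group_key]].append(rating)
    return {group: mean(values) for group, values in grouped.items()}


by_domain = mean_rating_by_group(responses, "Domain", "PU1")
print(by_domain)
```

Skipping missing ratings, rather than treating them as zero, keeps unanswered items from pulling a group's average down.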
Tools Utilised
- JMP – To explore and transform the data into a usable data set. Also used to check the distribution of the ratings for the selected questions within the scope of the assignment.
- Tableau – To create interactive data visualizations for finding insights and relationships between multiple variables.
- High-D – To create interactive visualization for analysing the quality criteria of the Wikipedia survey.
Interactive Result
Results
Analysis of Respondents who have taken part in the survey.
How do Wiki users across domain rate perceived usefulness of Wikipedia.
How do users across domains rate Wikipedia in terms of the quality.
How do users across domain and age group rate Visibility of Wikipedia.
How does Experience of the users matter in the investigation.
How comfortable are the users with Sharing their work on Wikipedia.
Citations
Meseguer, A., Aibar, E., Lladós, J., Minguillón, J., Lerga, M. (2015). “Factors that influence the teaching use of Wikipedia in Higher Education”. JASIST, Journal of the Association for Information Science and Technology. ISSN: 2330-1635. doi: 10.1002/asi.23488.
Comments