Abstract

As organizations collect and store more data, the demand for extracting insights from that data keeps growing. Most traditional business intelligence systems are still confined to univariate and bivariate analysis. This project applies interactive data exploration and analysis techniques to discover patterns in multivariate data and to explore the relationships within it. The topic used for exploring these techniques is “University faculty perceptions and practices of using Wikipedia as a teaching resource”. This is ongoing research in which both the perception of colleagues' opinions about Wikipedia and the perceived quality of the information in Wikipedia play a central role.

Theme of Interest and Motivation

The dataset used for this project is the wiki4HE Data Set (https://archive.ics.uci.edu/ml/datasets/wiki4HE).

Identifying a theme of interest

The dataset describes the survey respondents through multiple variables, such as:
Age, Gender, Domain, PhD, Experience, University (Universitat Oberta de Catalunya, Universitat Pompeu Fabra), UOC_Position, Other, Other_Position, UserWiki

The survey consists of questions in the following categories, which analyse the use of Wikipedia for educational purposes.

Perceived Usefulness
Perceived Ease of Use
Perceived Enjoyment
Quality
Visibility
Social Image
Sharing attitude
Use behaviour
Profile 2.0
Job relevance
Behavioural intention
Incentives
Experience

To define the scope of the assignment, I consider 5 of the categories listed above: Perceived Usefulness, Quality, Visibility, Experience and Sharing Attitude. Limiting the scope gives a confined field of analysis that can later be extended to the other categories.

Data Preparation

1. Import the data into JMP Pro for data preparation.

The data consists of 913 rows of survey responses.

2. Check for missing data patterns.

Initial analysis shows inconsistencies in the attribute values, and several attributes have missing values. The following steps describe how these were fixed, using the data dictionary provided with the data set.
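
The missing-data check can also be reproduced outside JMP. The following is a minimal Python sketch, not the JMP workflow used in this project. It assumes missing values are marked with "?" (as the recoding lists in this section suggest), and it runs on a tiny made-up sample; the semicolon delimiter and the function name count_missing are assumptions for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical three-row sample standing in for the survey file.
# The ';' delimiter and the '?' missing marker are assumptions.
SAMPLE = """AGE;DOMAIN;YEARSEXP;USERWIKI
40;2;14;0
42;?;?;1
37;6;8;?
"""

def count_missing(text, marker="?", delimiter=";"):
    """Count, per column, how many rows carry the missing marker."""
    missing = Counter()
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        for column, value in row.items():
            if value.strip() == marker:
                missing[column] += 1
    return missing

missing_counts = count_missing(SAMPLE)
for column, n in sorted(missing_counts.items()):
    print(f"{column}: {n} missing")
```

Columns that never carry the marker do not appear in the printed summary, which keeps the output focused on the attributes that need recoding.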

3. Check the attributes against the data set description.

The data dictionary provides the following attributes.

     AGE: numeric 
     GENDER: 0=Male; 1=Female 
     DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics 
     PhD: 0=No; 1=Yes 
     YEARSEXP (years of university teaching experience): numeric 
     UNIVERSITY: 1=UOC; 2=UPF 
     UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct 
     OTHER (main job in another university for part-time members): 1=Yes; 2=No 
     OTHER_POSITION (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct 
     USERWIKI (Wikipedia registered user): 0=No; 1=Yes

Comparing the attributes against the data dictionary gives the following observations:

Age, Gender, Yearsexp and University show no discrepancies with the data dictionary.
DOMAIN: This attribute has an extra value (6) and missing values, both of which need to be handled. The attribute values are therefore recoded as below:

     1=Arts & Humanities
     2=Sciences
     3=Health Sciences
     4=Engineering & Architecture
     5=Law & Politics
     6=Others
     ?=Unknown (7)

Yearsexp: 23 records are missing a value for this attribute.

     As this is a small share of the 913 records (2.5%), these are recoded as ‘0’.
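
The 2.5% figure follows from the 23 missing records out of 913 rows. A minimal Python sketch of the arithmetic and of the recoding rule is shown here; the "?" missing marker and the helper name recode_yearsexp are assumptions, and the three input values are made up.

```python
TOTAL_ROWS = 913        # rows reported after import
MISSING_YEARSEXP = 23   # records with no Yearsexp value

share = MISSING_YEARSEXP / TOTAL_ROWS
print(f"Missing share: {share:.1%}")  # prints "Missing share: 2.5%"

def recode_yearsexp(value, marker="?"):
    """Map the missing marker to 0; otherwise parse the value as a number."""
    return 0.0 if value.strip() == marker else float(value)

# Made-up values: two answered, one missing.
recoded = [recode_yearsexp(v) for v in ["14", "?", "8"]]
print(recoded)
```

After this recoding, a value of 0 can mean either no teaching experience or no answer, which is worth keeping in mind when reading averages of Yearsexp.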

UOC_POSITION (academic position of UOC members): This field is specific to university type 1 (UOC), so the missing values for the other university are recoded as NA.

     1=Professor
     2=Associate
     3=Assistant
     4=Lecturer
     5=Instructor
     6=Adjunct 
     ?=NA (7)

OTHER (main job in another university for part-time members): This attribute is also specific to UOC, as all the records for UPF are missing. The missing values are recoded as NA.

     1=Yes
     2=No
     ?=NA (3)

OTHER_POSITION (work as part-time in another university and UPF members): This attribute has one extra category, which is recoded as Other; missing values are recoded as NA.

     1=Professor
     2=Associate
     3=Assistant
     4=Lecturer
     5=Instructor
     6=Adjunct 
     7=Other
     ?=Unknown (8)

USERWIKI (Wikipedia registered user): This attribute indicates whether the respondent is a registered user of Wikipedia. The value is missing in 4 records, which are recoded as Unknown.

     0=No
     1=Yes
     ?=Unknown (2)
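
The recoding rules listed above share one pattern: each attribute gets its own numeric code for a missing value. A minimal Python sketch of that pattern follows. It is not the JMP recode used in this project; the "?" marker, the helper name recode_missing and the example respondent are assumptions, while the codes are taken from the lists above.

```python
# Code assigned to a missing value ('?') in each attribute, as listed above.
MISSING_CODE = {
    "DOMAIN": 7,
    "UOC_POSITION": 7,
    "OTHER": 3,
    "OTHER_POSITION": 8,
    "USERWIKI": 2,
}

def recode_missing(row, marker="?"):
    """Return a copy of row with the missing marker replaced by the attribute's code."""
    cleaned = dict(row)
    for column, code in MISSING_CODE.items():
        if cleaned.get(column, "").strip() == marker:
            cleaned[column] = str(code)
    return cleaned

# Hypothetical UPF respondent with no UOC-specific answers.
example = {"UNIVERSITY": "2", "DOMAIN": "?", "UOC_POSITION": "?",
           "OTHER": "?", "OTHER_POSITION": "7", "USERWIKI": "0"}
recoded_row = recode_missing(example)
print(recoded_row)
```

Keeping the codes in one lookup table makes it easy to see, and to change, which number stands for a missing value in each attribute.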

4. Change data types of the attributes.

Gender: Numeric, Nominal
PhD: Numeric, Nominal
Yearsexp: Numeric, Continuous
University: Numeric, Nominal
All Question attributes: Numeric, Continuous

5. Create new columns to understand the attributes better.

Gender
Domain
PhD
University
UOC_Position
Other
Other_Position
UserWiki
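
One way to read "new columns" is a readable label column next to each coded attribute. The following Python sketch illustrates that idea under this assumption; the "_LABEL" suffix and the helper name add_label_columns are hypothetical, and only four of the listed attributes are shown. The labels come from the data dictionary and the recoding lists above.

```python
# Code-to-label lookups taken from the data dictionary and the recoding lists.
LABELS = {
    "GENDER": {"0": "Male", "1": "Female"},
    "PhD": {"0": "No", "1": "Yes"},
    "UNIVERSITY": {"1": "UOC", "2": "UPF"},
    "USERWIKI": {"0": "No", "1": "Yes", "2": "Unknown"},
}

def add_label_columns(row, suffix="_LABEL"):
    """Return a copy of row with one readable label column per coded attribute."""
    labelled = dict(row)
    for column, lookup in LABELS.items():
        if column in row:
            # Fall back to the raw code if it is not in the lookup.
            labelled[column + suffix] = lookup.get(row[column], row[column])
    return labelled

labelled_row = add_label_columns(
    {"GENDER": "1", "PhD": "0", "UNIVERSITY": "2", "USERWIKI": "2"}
)
print(labelled_row)
```

The original coded columns are kept, so the numeric values remain available for analysis while the label columns serve the visualizations.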

6. Exclude and hide attributes that are out of the scope of the assignment.

7. Export the data in CSV format so that it can be used for further visualization in other tools.
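
Dropping out-of-scope attributes and exporting to CSV can be sketched in a few lines of Python. This is an illustration only, not the JMP export used in this project; the helper name export_rows, the SCRATCH column and the sample rows are made up.

```python
import csv
import io

def export_rows(rows, columns, out):
    """Write only the in-scope columns of each row to a CSV file object."""
    writer = csv.DictWriter(out, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Made-up rows; SCRATCH stands in for an out-of-scope attribute.
rows = [
    {"AGE": "40", "GENDER": "0", "SCRATCH": "x"},
    {"AGE": "42", "GENDER": "1", "SCRATCH": "y"},
]
buffer = io.StringIO()
export_rows(rows, ["AGE", "GENDER"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the same function can be given a file opened with open(path, "w", newline=""), where path is whatever output location is chosen.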

Tools Utilised

JMP – To explore and transform the data into a usable data set. Also used to check the distribution of the ratings for the selected questions in the scope of the assignment.
Tableau – To create interactive data visualizations for finding insights and relationships between multiple variables.
High-D – To create interactive visualization for analysing the quality criteria of the Wikipedia survey.

ISSS608 2016-17 T1 Assign2 Shishir Nehete

