ISSS608 2016-17 T1 Assign2 Mukund Krishna Ravi
Overview
In the digital economy, massive and complex data sets are captured and stored in organizational databases and data warehouses. By and large, these data contain a large number of variables describing a particular product, customer, or activity. Because of limitations in perceptual and screen space, the graphical techniques available in traditional business intelligence systems tend to be confined to univariate and bivariate displays such as bar charts, pie charts, and scatter plots. As a result, many important relationships in these data remain undiscovered. For instance, the wiki4HE data set contains many relationships between the survey responses and the different academic segments. These relationships are hidden and require more complex visualization techniques to uncover.
Theme of Interest and Motivation
This assignment draws on ongoing research into university faculty perceptions and practices of using Wikipedia as a teaching resource. Based on a Technology Acceptance Model, the relationships within the internal and external constructs of the model are analyzed. Both the perception of colleagues' opinions about Wikipedia and the perceived quality of the information in Wikipedia play a central role in the obtained model. In this assignment I have chosen to focus on a few areas of interest and on discovering intricate relationships in the data set that would not be visible with basic visualization techniques. The key question of the problem is the following:
How do various user segments and domains rate Wikipedia and the perceived quality of information in Wikipedia?
To understand this behavior we analyse the following criteria:
- How different domains rate perceived usefulness
- How different users rate perceived usefulness
- How different users rate the experience of Wikipedia
- How different users rate the quality of Wikipedia
Data Set
For this assignment, I have selected the wiki4HE data set, which comes from ongoing research on university faculty perceptions and practices of using Wikipedia as a teaching resource. Based on a Technology Acceptance Model, the relationships within the internal and external constructs of the model are analysed. Both the perception of university teaching staff's opinion about Wikipedia and the perceived quality of the information in Wikipedia play a central role in the obtained model. The original data set can be found on the UC Irvine Machine Learning Repository's website [1]. The original data set is formatted as a CSV file.
Data Preparation
Before the data can be analysed and explored, it has to be prepared. JMP Pro and Excel were used to clean and prepare the data set, which contains a total of 913 entries.
1.Step 1
In the first step we hide the columns that are irrelevant to the analysis. The data set was downloaded in .csv format and opened with Excel 2013, with all values delimited by commas. The data set was then separated into survey responses and non-survey responses. Except for age, all the values are converted to the ordinal type, because they are categorical and have a natural order (rating values from 1-6).
2.Step 2
In the second step we analyse all the non-survey values, mainly to eliminate values recorded as '?'. The DOMAIN field has 0.2% of its values as '?', UOC has 12.37%, and userwiki has 0.4%. As these values cannot be imputed without using statistical models, they have to be excluded from the analysis. To do so, we use the filter option in Excel to exclude all rows that contain a '?'. The total number of rows is thereby reduced to 796.
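The filtering in this step was done with Excel filters. As a rough illustration only, the same idea can be sketched in Python. The column names (DOMAIN, UOC_POSITION, USERWIKI, PU1) and the five sample rows below are assumptions made up for the example, not the actual file contents.

```python
import csv
import io

# Tiny made-up sample standing in for the real file; column names are assumed.
SAMPLE = """AGE,DOMAIN,UOC_POSITION,USERWIKI,PU1
40,1,2,0,4
35,?,3,1,5
52,6,?,0,2
29,3,6,?,3
61,2,6,1,?
"""

# Non-survey fields checked for the '?' marker in this step.
KEY_COLUMNS = ("DOMAIN", "UOC_POSITION", "USERWIKI")


def missing_share(rows, column, marker="?"):
    """Percentage of rows whose value in `column` equals the marker."""
    if not rows:
        return 0.0
    return 100.0 * sum(1 for row in rows if row[column] == marker) / len(rows)


def drop_rows_with_missing(rows, columns, marker="?"):
    """Keep only rows where none of the given columns holds the marker."""
    return [row for row in rows if all(row[c] != marker for c in columns)]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = drop_rows_with_missing(rows, KEY_COLUMNS)
print(len(rows), "rows before,", len(kept), "rows after filtering")
```

Note that the last sample row is kept: its '?' sits in a survey column, which this step does not filter on.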
3.Step 3
In the third step we analyse the distribution of each of the non-survey fields. There are still '?' values in the data set that need to be handled; these have been replaced with 0, using filters in Excel 2013. The 0 entries do not affect the distribution of any feature, because the value 0 carries no meaning in the analysis.
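The replacement of '?' with 0 can be sketched as follows. This is a minimal illustration that assumes the placeholder 0 is skipped whenever ratings are counted or averaged; the sample values are made up.

```python
from collections import Counter


def impute_missing(values, marker="?", placeholder=0):
    """Replace the missing-value marker with a placeholder; cast the rest to int."""
    return [placeholder if v == marker else int(v) for v in values]


def rating_distribution(ratings, placeholder=0):
    """Count each rating, skipping the placeholder so it does not form a category."""
    return Counter(r for r in ratings if r != placeholder)


raw = ["4", "?", "2", "5", "?", "4"]  # made-up responses to one question
ratings = impute_missing(raw)
distribution = rating_distribution(ratings)
print(ratings)
print(dict(distribution))
```

If the zeros were not skipped, they would lower any average computed from the column, so the placeholder is harmless only as long as it is excluded from summaries.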
4.Step 4
In the fourth step we use Excel to create an average rating for all the parameters. We also stack all the values by survey response value; this operation is performed in JMP. The description of each question is included in the stacked data set. In the domain field we also keep all values coded with the number 6, which account for more than 50% of the values. All the values in this domain have been re-coded to Unknown.
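The averaging, stacking, and re-coding were carried out in Excel and JMP. The sketch below only illustrates the shape of the result. It assumes the average is taken per respondent over a group of questions, that 0 marks a missing answer, and that the columns are named DOMAIN, Qu1, and Qu2; these names and the two sample rows are hypothetical.

```python
# Hypothetical column names and a two-question subset; the descriptions are the
# first two quality questions quoted in the analysis section of this report.
QUESTION_TEXT = {
    "Qu1": "Articles in Wikipedia are reliable",
    "Qu2": "Articles in Wikipedia are updated",
}


def recode_domain(code, unknown_code=6):
    """Label the domain coded 6 as 'Unknown'; leave other codes as strings."""
    return "Unknown" if code == unknown_code else str(code)


def average_rating(row, question_columns, placeholder=0):
    """Mean of the answered questions; None if nothing was answered."""
    answered = [row[q] for q in question_columns if row[q] != placeholder]
    return sum(answered) / len(answered) if answered else None


def stack_responses(rows, question_columns):
    """Turn one row per respondent into one row per (respondent, question)."""
    stacked = []
    for respondent_id, row in enumerate(rows):
        for q in question_columns:
            stacked.append({
                "respondent": respondent_id,
                "domain": recode_domain(row["DOMAIN"]),
                "question": q,
                "description": QUESTION_TEXT.get(q, ""),
                "rating": row[q],
            })
    return stacked


rows = [
    {"DOMAIN": 6, "Qu1": 4, "Qu2": 2},
    {"DOMAIN": 1, "Qu1": 0, "Qu2": 5},  # 0 marks a missing answer
]
questions = ["Qu1", "Qu2"]
averages = [average_rating(r, questions) for r in rows]
stacked = stack_responses(rows, questions)
print(averages)
print(len(stacked), "stacked records")
```

The stacked (long) layout is what lets a single Question Type filter drive a chart across all questions.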
Tools used
The tools that have been used in the analysis are:
- JMP Pro
- High-D
- Tableau
- TIBCO Spotfire
Analysis
In our analysis we first try to understand the overall distribution of, and the relationships between, parameters such as Domain, UOC, and age.
The following observations can be made from the heat map:
- Around 33% of all entries belong to the Unknown domain, and the majority of the faculty in this domain are adjunct. The remaining faculty are assistant, lecturer, associate, and professor. The age of the faculty ranges from 27 to 59.
- In the Arts and Humanities domain, the age of the faculty lies between 32 and 62. The major part of the sample is adjunct faculty, and the oldest member of this domain is also adjunct. Interestingly, the adjunct faculty seem to be the oldest among all the faculty types. A possible reason is that the adjunct faculty could be emeritus professors.
- In the Law and Politics domain, the age ranges from 30 to 59. The faculty comprise adjunct, associate, lecturer, assistant, and professor. The faculty in this domain seem to belong to a slightly younger age group.
- In the Engineering domain, the age of the faculty lies between 28 and 69. The faculty comprise adjunct, associate, assistant, and lecturer, and are mainly adjunct. Interestingly, there is a large number of younger faculty who are adjunct.
- In the Health Sciences domain, the age of the faculty lies between 28 and 62, but the majority of the faculty are relatively old, with a few exceptions. The majority are adjunct; the remaining faculty are associate, assistant, and lecturer.
- In the Sciences domain, the age of the faculty lies between 29 and 64. As in all the other domains, most of the faculty are adjunct.
Interactive Divergent stacked Bar chart
- The divergent stacked bar chart allows quick visualization of the different responses that respondents gave to each question. The filters on the right include domain, question type, age range, years of experience, and teaching position. The dashboard helps in understanding the respondents' sentiments on a topic of interest. For example, if I am concerned with the respondents' experience with Wikipedia and their use behaviour, I would filter for these questions using the Question Type filter to review the responses.
Please refer to the interactive dashboard at the following URL: https://public.tableau.com/profile/publish/DivergentStackedBarChart/Sheet1#!/publish-confirm
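The dashboard itself was built in Tableau. To show the arithmetic behind a divergent stacked bar, here is a minimal sketch that assumes a five-point scale with 3 as the neutral level and places the neutral segment across zero, so that negative responses extend to the left and positive ones to the right. The function names and sample ratings are made up for illustration.

```python
from collections import Counter

LEVELS = (1, 2, 3, 4, 5)  # assumed five-point scale with 3 as neutral


def level_percentages(ratings, levels=LEVELS):
    """Share of responses at each level, ignoring values outside the scale."""
    counts = Counter(r for r in ratings if r in levels)
    total = sum(counts.values())
    if total == 0:
        return {lvl: 0.0 for lvl in levels}
    return {lvl: 100.0 * counts[lvl] / total for lvl in levels}


def diverging_segments(percentages, levels=LEVELS):
    """Return (level, start, end) spans with the neutral level centred on 0."""
    neutral = levels[len(levels) // 2]
    negative = sum(percentages[lvl] for lvl in levels if lvl < neutral)
    start = -(negative + percentages[neutral] / 2.0)
    segments = []
    for lvl in levels:
        end = start + percentages[lvl]
        segments.append((lvl, start, end))
        start = end
    return segments


ratings = [1, 2, 2, 3, 3, 4, 4, 4, 5, 5]  # made-up answers to one question
percentages = level_percentages(ratings)
segments = diverging_segments(percentages)
for level, start, end in segments:
    print(level, round(start, 1), round(end, 1))
```

Each (start, end) pair is the horizontal extent of one coloured segment, so a bar that reaches further to the right indicates more agreement with the question.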
Adoption Status of Wikipedia among the respondents
- Before trying to understand the adoption status of Wikipedia among the respondents, we define the parameters to be considered in the analysis. The main parameters, or features, relevant to our analysis are perceived usefulness, quality, and experience. We notice that adoption of Wikipedia to teach students is not very prevalent among the respondents: more than 60% of them do not prefer Wikipedia and have given it a score of 2 or below. On the other hand, Wikipedia seems to offer good insights in many areas of importance, as most of the respondents seem to use it for personal purposes and for their areas of interest (most of the respondents have given it a score of 4 or 5).
- In terms of quality, most of the respondents seem to prefer Wikipedia or are neutral towards it. For all five of the questions ("Articles in Wikipedia are reliable", "Articles in Wikipedia are updated", "Articles in Wikipedia are comprehensive", "In my area of expertise, Wikipedia has a lower quality than other educational resources", and "I trust in the editing system of Wikipedia"), more than 70% of the respondents have given either positive or neutral feedback.
- Most of the faculty do not find Wikipedia a useful tool for teaching. Most of the respondents do not prefer Wikipedia for teaching (ratings 3 and 4 make up more than 50%). All the respondents feel that Wikipedia does not improve the overall student learning experience.
How do various user segments and domains rate Wikipedia and the perceived quality of information in Wikipedia
To evaluate this, four questions have been answered. Each of them has been evaluated separately using the Tree Map software.
The analysis explores four different questions: "How different domains rate perceived usefulness", "How different users rate perceived usefulness", "How different users rate the experience of Wikipedia", and "How different users rate the quality of Wikipedia". Each of the heat maps produces a unique insight. To produce the insights, the average score in each of the domains has been considered.
- Among all the domains, we notice that the Unknown domain does not find Wikipedia a useful tool. Apart from Law and Politics, most of the domains do not agree with the usefulness of Wikipedia. Of all the departments, the Unknown and Engineering departments consider Wikipedia completely useless, as most of them have given Wikipedia a rating of above 4. Due to the large size of the Unknown domain, we notice that most of the respondents in this domain find Wikipedia a very useful tool.
- Among all the faculty respondents, most of the adjunct faculty do not have a strong opinion on the quality of Wikipedia content. That said, a sizable number of adjunct faculty believe that Wikipedia is a useful tool. Almost all the faculty seem to trust the information published on Wikipedia. Only a small portion of the adjunct respondents have rated the quality of content on Wikipedia as 5.
- A sizable number of respondents, especially the adjunct faculty, do not find Wikipedia very useful. Among the assistant professors, only a handful find Wikipedia a useful tool for imparting knowledge. A similar trend is seen among lecturers and associates.
- The user experience seems to attract a neutral sentiment (a rating of 3). Most of the faculty seem to be okay with the Wikipedia experience, though a sizable number of adjunct faculty have had a positive experience with Wikipedia. We also notice that most of the faculty have not had bad experiences with Wikipedia.
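The observations in this section rest on the average score within each domain or faculty type. A minimal sketch of that aggregation is given below; it assumes stacked records with a "rating" field in which 0 marks a missing answer. The field names, group labels, and values are made up for illustration and are not results from the data set.

```python
from collections import defaultdict


def mean_score_by_group(records, group_key, score_key="rating", placeholder=0):
    """Average the scores within each group, skipping placeholder values."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of scores, count]
    for record in records:
        score = record[score_key]
        if score == placeholder:
            continue
        bucket = totals[record[group_key]]
        bucket[0] += score
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up stacked records; not taken from the survey.
records = [
    {"domain": "Unknown", "rating": 2},
    {"domain": "Unknown", "rating": 3},
    {"domain": "Engineering", "rating": 4},
    {"domain": "Engineering", "rating": 0},  # missing answer, skipped
    {"domain": "Law and Politics", "rating": 5},
    {"domain": "Law and Politics", "rating": 4},
]
means = mean_score_by_group(records, "domain")
for domain, mean in sorted(means.items()):
    print(domain, mean)
```

Passing a different `group_key`, for example a faculty position field, would give the per-user-segment averages in the same way.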
References
Meseguer, A., Aibar, E., Lladós, J., Minguillón, J., Lerga, M. (2015). "Factors that influence the teaching use of Wikipedia in Higher Education". JASIST, Journal of the Association for Information Science and Technology. ISSN: 2330-1635. doi: 10.1002/asi.23488.