ISSS608 2016-17 T1 Assign2 Agrim Gairola
Abstract
This assignment studies data from a survey conducted among the faculty of two Spanish universities on various aspects of Wikipedia. A set of 44 questions was put to 913 faculty members, covering 13 different aspects of perception. The task is to identify interesting patterns in how the respondents perceive Wikipedia.
Motivation
The assignment enables us to gather insights into how people perceive Wikipedia in terms of its use, image, ease of use and several other factors.
Tools Used
- Tableau version 10.0
- JMP Pro
- Treemaps HCI
- Microsoft Office
Data Preparation
The following steps were carried out to prepare the data for effective analysis:
Data Manipulation: A unique ID was assigned to each record for ease of analysis.
Data Type Conversion: On importing the data into JMP, age and work experience were kept as continuous variables. All remaining variables were converted to the nominal data type.
Missing data analysis: Missing data analysis was performed to identify the missing values and recode them suitably.
Assumption: Several ambiguous values were found throughout the data set. These values were recoded based on the following assumption:
All “?” values in survey items were taken as 2.5 so that they do not hamper the analysis when mean scores are compared.
Additional Columns for Categories: An additional column was created for each category to summarise the survey items under it. For example, a new column was created for Quality holding the mean of the values in QU1, QU2, QU3, QU4 and QU5, thus representing the overall quality score for ease of analysis.
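The recoding and category-column steps above can be sketched in a few lines of Python. This is only an illustration of the logic, not the JMP workflow actually used: the sample rows, the helper names `recode`, `count_missing` and `category_mean`, and the storage of ratings as strings are all assumptions made for the example.

```python
from statistics import mean

# Assumption taken from the text: "?" marks an unanswered item and is recoded to 2.5.
MISSING_MARKER = "?"
RECODE_VALUE = 2.5

# Hypothetical survey rows; the IDs and ratings are invented for illustration.
rows = [
    {"ID": 1, "QU1": "3", "QU2": "4", "QU3": "?", "QU4": "2", "QU5": "5"},
    {"ID": 2, "QU1": "5", "QU2": "?", "QU3": "?", "QU4": "4", "QU5": "3"},
]

def recode(value):
    """Return the rating as a float, replacing the missing marker with 2.5."""
    return RECODE_VALUE if value == MISSING_MARKER else float(value)

def count_missing(rows, item):
    """Count how many records have the missing marker for one survey item."""
    return sum(1 for row in rows if row[item] == MISSING_MARKER)

def category_mean(row, items):
    """Mean of the recoded ratings for the survey items of one category."""
    return mean(recode(row[item]) for item in items)

quality_items = ["QU1", "QU2", "QU3", "QU4", "QU5"]

# Missing data analysis: number of "?" values per Quality item.
missing_per_item = {item: count_missing(rows, item) for item in quality_items}

# Additional column: overall Quality score per record.
for row in rows:
    row["Quality"] = category_mean(row, quality_items)
```

The same pattern would apply to every other category by swapping in its list of item names.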
Demographics
In order to understand the data set accurately, let us first analyse the demographics.
Treemap: Below is a screenshot, along with a link to the video, of a treemap with several different hierarchies. This treemap shows the demographics of the data at a glance.
Distribution: On analysis of the distribution of the data, the following interesting patterns can be seen regarding the demographics of the participants:
Age: Most participants (80%) who took part in the survey were between the ages of 32 and 53.
Gender: The survey respondents comprised 58% males and 42% females.
Experience: 50% of participants have between 4 and 15 years of experience. This shows that the data set covers a wide range of experience among participants.
UOC Position: It is interesting to note that almost 72% of the faculty are adjunct staff.
Domain: For 39.5% of the participants the domain is recorded as 6, which has been assumed to mean “others”. A large number of participants belong to Arts and Humanities and to Science.
Registered User: Another interesting point is that the majority of Wikipedia users in the survey are unregistered.
Exploration and Analysis
Let us try to answer the following questions from the data set using visual analytics techniques.
Q1: Which survey categories are rated best and worst?
To answer this question, we plot a bar graph of the survey categories against their mean scores. Sharing Attitude obtains the highest mean score, while Use Behavior scores the lowest. From this we can infer that the survey participants generally perceive Wikipedia as an excellent platform for sharing information, owing to its open platform, the availability of academic journals and its online collaborative material. Use Behavior, on the other hand, is rated poorly, apparently because the participants do not use Wikipedia to create teaching material or develop educational activities.
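The comparison behind the bar graph amounts to computing one mean per category and sorting the results. A minimal Python sketch is given here, assuming per-respondent category scores are already available; the scores and the helper name `rank_categories` are hypothetical and do not reproduce the survey results.

```python
from statistics import mean

# Hypothetical per-respondent category scores (values invented for illustration).
category_scores = {
    "Sharing Attitude": [4.0, 4.5, 5.0],
    "Quality": [3.0, 3.5, 3.2],
    "Use Behavior": [2.0, 1.5, 2.5],
}

def rank_categories(scores):
    """Return (category, mean score) pairs sorted from highest to lowest mean."""
    means = {name: mean(values) for name, values in scores.items()}
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)

ranking = rank_categories(category_scores)
best, worst = ranking[0], ranking[-1]
```

The first and last entries of `ranking` correspond to the tallest and shortest bars in such a chart.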
Q2: How have the questions under the Sharing Attitude category been rated?
We can answer this question by taking a closer look at the Sharing Attitude category. For this, we analyse SA1, SA2 and SA3 and plot them as shown below.
On inspecting the outlier, we notice that it represents the rating of just one person (ID 40). It can therefore be ignored, since the opinion of one person could be biased and cannot be taken as a general trend. Hence it is safe to say that the general perception is that Wikipedia is an excellent source for sharing.
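One common way to flag such a point programmatically is the 1.5 × IQR rule used by box plots. The sketch below assumes that rule; the text does not state which criterion identified the outlier. Apart from ID 40, which the text names, the IDs are invented, all ratings are hypothetical, and `find_outliers` is an illustrative helper.

```python
from statistics import quantiles

# Hypothetical Sharing Attitude ratings keyed by respondent ID (invented values).
ratings = {12: 4, 17: 4, 23: 5, 31: 5, 38: 4, 40: 1, 52: 5, 67: 4}

def find_outliers(values_by_id, k=1.5):
    """Return the IDs whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values_by_id.values(), n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [rid for rid, value in values_by_id.items() if value < lower or value > upper]

outlier_ids = find_outliers(ratings)
```

With these made-up ratings only respondent 40 falls outside the fences, which mirrors the single-respondent outlier described above.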
Results
From the above graphs, the following conclusions can be made:
- The most expensive flats are located around the Central Business District. These high prices could be the reason for the drop in sales in the downtown core area in Q2 2016, as discussed in the previous section.
- Flats on higher stories command higher prices. Most of these expensive high-story flats are located in the downtown area.
- The highest prices are commanded by multi-generation housing, closely followed by executive housing and then 5-room housing.
SHARE OF PUBLIC HOUSING IN 2015
Approaches
- Share of flat sales by number of rooms: A large share of Singapore's population seems to believe that a 4-room HDB flat suits their needs best.
- Share of flat type by location: The highest number of sales appears to be in the extreme east (Tampines) and extreme west (Jurong West) of Singapore. On cross-referencing the figure below with the map in the previous section, we notice that sales are higher in areas farther away from the downtown area.
- Share of property by number of stories: Contrary to popular belief, flats on higher floors are not very popular. Most people prefer to buy flats between the 3rd and 12th stories. This could be because price rises directly with the story number.
Results
From the above graphs, the following conclusions can be made:
- 4-room flats are the most popular type of housing among Singaporeans.
- The highest sales take place in Jurong West and Tampines, closely followed by Woodlands. We can conclude that sales are higher in areas where prices are lower (referring to the map in section 2), i.e. away from the downtown area.
- Most Singaporeans prefer to buy flats between the 3rd and 12th stories.