Difference between revisions of "ISSS608 2016-17 T1 Assign2 Agrim Gairola"

(Created page with "ISSS608 2016-17 T1 Assign1_Agrim Gairola =Abstract= <br/>The assignment involves study of data based on a survey conducted among the faculty of two Spanish Universities on...")
 
 
(8 intermediate revisions by the same user not shown)

Latest revision as of 18:11, 26 September 2016

ISSS608 2016-17 T1 Assign1_Agrim Gairola

Abstract


This assignment studies data from a survey conducted among the faculty of two Spanish universities on various aspects of Wikipedia. A set of 44 questions was put to 913 faculty members, covering 13 different subjects of perception. The task is to identify interesting patterns in how the respondents perceive Wikipedia.

Motivation

The assignment enables us to gather insights into how people perceive Wikipedia, based on its use, image, ease of use and several other factors.


Tools Used

  • Tableau version 10.0
  • JMP Pro
  • Treemaps HCI
  • Microsoft Office


Data Preparation

The following steps were carried out to prepare the data for effective analysis:
Data Manipulation: A unique ID was assigned to each record for ease of analysis.
Data Type Conversion: On importing the data into JMP, age and work experience were kept as continuous variables. All remaining variables were converted to the nominal data type.
Missing data analysis: A missing data analysis was performed to identify missing values and recode them suitably.

1.jpg
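The three preparation steps above were carried out in JMP. As a rough illustration only, they can be sketched in plain Python. The column names (AGE, YEARSEXP, GENDER, QU1), the sample rows and the use of "?" as the missing-value marker are assumptions made for this sketch, not details confirmed by the report.

```python
# Hypothetical raw rows; column names and values are illustrative only.
raw_rows = [
    {"AGE": "40", "YEARSEXP": "14", "GENDER": "0", "QU1": "3"},
    {"AGE": "42", "YEARSEXP": "?", "GENDER": "1", "QU1": "?"},
    {"AGE": "37", "YEARSEXP": "4", "GENDER": "0", "QU1": "4"},
]

CONTINUOUS = {"AGE", "YEARSEXP"}  # kept numeric; every other column stays nominal
MISSING = "?"  # assumed marker for a missing value


def prepare(rows):
    """Add a unique ID, convert continuous columns, and count missing values."""
    prepared = []
    missing_counts = {}
    for record_id, row in enumerate(rows, start=1):
        clean = {"ID": record_id}
        for column, value in row.items():
            if value == MISSING:
                # Tally the gap per column and store it as None for later recoding.
                missing_counts[column] = missing_counts.get(column, 0) + 1
                clean[column] = None
            elif column in CONTINUOUS:
                clean[column] = float(value)
            else:
                clean[column] = value  # nominal: keep the raw label
        prepared.append(clean)
    return prepared, missing_counts


prepared, missing_counts = prepare(raw_rows)
```

The per-column counts in `missing_counts` play the role of the missing data report: they show which columns need a recoding decision before any means are compared.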

Assumption: Several ambiguous values were noted throughout the data set. These values were recoded based on the assumptions below:

2.jpg

All “?” values in survey items were recoded as 2.5 so that they do not hamper the comparison of mean scores.
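A minimal sketch of this recoding rule, assuming the answers arrive as strings; the sample answers and the function name `recode` are hypothetical.

```python
from statistics import mean

PLACEHOLDER = 2.5  # value the report assigns to "?" responses


def recode(responses):
    """Turn raw survey answers into numbers, replacing "?" with 2.5."""
    return [PLACEHOLDER if r == "?" else float(r) for r in responses]


# Hypothetical answers to a single survey item.
answers = ["4", "?", "5", "3", "?"]
scores = recode(answers)
item_mean = mean(scores)
```

Because every "?" contributes the same fixed value, items with many unanswered responses are pulled towards 2.5, which is worth keeping in mind when mean scores are compared.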

Additional Columns for Categories: An additional column was created for each category to summarise the survey items under it. For example, a new column was created for Quality holding the mean of the values in QU1, QU2, QU3, QU4 and QU5, so that it represents the overall quality score for ease of analysis.

Ag4.jpg
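The category column described above can be sketched as follows. Only the Quality mapping (QU1 to QU5) comes from the text; the respondent record and the helper name `add_category_scores` are hypothetical, and the answers are assumed to be already recoded to numbers.

```python
from statistics import mean

# Items belonging to each category; only Quality is taken from the report.
CATEGORY_ITEMS = {"Quality": ["QU1", "QU2", "QU3", "QU4", "QU5"]}


def add_category_scores(record, category_items=CATEGORY_ITEMS):
    """Return a copy of the record with one mean-score column per category."""
    scored = dict(record)
    for category, items in category_items.items():
        scored[category] = mean(record[item] for item in items)
    return scored


# Hypothetical respondent with already recoded numeric answers.
respondent = {"ID": 1, "QU1": 4, "QU2": 3, "QU3": 5, "QU4": 4, "QU5": 2.5}
scored = add_category_scores(respondent)
```

Other categories would be added to `CATEGORY_ITEMS` in the same way, one entry per category with its list of item columns.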


Demographics


In order to understand the data set accurately, let us first analyse the demographics.

Treemap: Below is a screenshot of a treemap with several different hierarchies, along with a link to a video of it. The treemap shows the demographics of the data at a glance.
https://www.youtube.com/watch?v=BnRFP_Xuwvg&feature=youtu.be

Ag5.jpg


Distribution: Analysing the distribution of the data reveals the following patterns in the demographics of the participants:
Age: Most participants (80%) were between 32 and 53 years old.

Ag6.jpg

Gender: The respondents comprised 58% males and 42% females.

Ag7.jpg

Experience: 50% of the participants have between 4 and 15 years of experience. This shows that the data set covers a wide range of experience among participants.

Ag8.jpg

UOC Position: It is interesting to note that almost 72% of the faculty are adjunct staff.

Ag9.jpg

Domain: For 39.5% of the participants the domain is recorded as 6, which has been assumed to mean “others”. A large number of participants belong to Arts and Humanities and to Science.

Ag10.jpg

Registered User: Another interesting point is that the majority of the participants are not registered Wikipedia users.

Ag11.jpg
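The percentages quoted in the distributions above are shares of respondents per category. A minimal sketch of that calculation, using hypothetical labels rather than the survey data:

```python
from collections import Counter


def percentage_share(values):
    """Return the percentage of records falling into each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {label: 100 * count / total for label, count in counts.items()}


# Hypothetical gender labels for ten respondents.
genders = ["male"] * 6 + ["female"] * 4
shares = percentage_share(genders)
```

The same helper applies to any nominal column, such as position, domain or registration status.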

Exploration and Analysis


Let us try to answer the following questions from the data set using visual analytics techniques.
Q1: Which are the best rated and worst rated survey items?
To answer this question, we plot a bar graph of the mean score for each survey category. Sharing Attitude obtains the highest mean score, while Use Behaviour obtains the lowest. From this we can infer that the survey participants generally perceive Wikipedia as an excellent platform for sharing information, owing to its open platform, availability of academic journals and online collaborative material. Use Behaviour, on the other hand, is rated poorly, apparently because the participants are not using Wikipedia to create teaching material or develop educational activities.

Ag12.jpg
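The ranking behind the bar graph can be sketched as a mean per category followed by a maximum and a minimum. The records below are hypothetical and only three categories are shown; the real chart was produced with the tools listed earlier, not with this code.

```python
from statistics import mean

# Hypothetical per-respondent category scores.
records = [
    {"Sharing Attitude": 4.5, "Quality": 3.0, "Use Behaviour": 2.0},
    {"Sharing Attitude": 4.0, "Quality": 3.5, "Use Behaviour": 1.5},
    {"Sharing Attitude": 5.0, "Quality": 2.5, "Use Behaviour": 2.5},
]


def category_means(rows):
    """Average each category column over all respondents."""
    categories = rows[0].keys()
    return {c: mean(row[c] for row in rows) for c in categories}


means = category_means(records)
best = max(means, key=means.get)   # category with the highest mean score
worst = min(means, key=means.get)  # category with the lowest mean score
```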


Q2: How have the questions under the category Sharing Attitude been rated?
We can answer this question by looking more closely at the Sharing Attitude category. For this, we analyse SA1, SA2 and SA3 and plot them as shown below.

Ag13.jpg


On inspecting the outlier, we notice that it represents the rating of just one person (ID 40). It can therefore be ignored, since the opinion of one person could be biased and cannot be taken as a general trend. Hence it is safe to say that the general perception is that Wikipedia is an excellent source for sharing.

Ag14.jpg
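One way to spot such cases systematically is to flag every group whose score rests on a single respondent. The sketch below assumes the groups are defined by position and PhD status; those field names and the sample rows are hypothetical.

```python
from collections import defaultdict


def single_respondent_groups(rows, group_keys):
    """Return each group that contains exactly one respondent, with that respondent's ID."""
    members = defaultdict(list)
    for row in rows:
        group = tuple(row[key] for key in group_keys)
        members[group].append(row["ID"])
    return {group: ids[0] for group, ids in members.items() if len(ids) == 1}


# Hypothetical respondents grouped by position and PhD status.
rows = [
    {"ID": 1, "position": "Adjunct", "phd": True},
    {"ID": 2, "position": "Adjunct", "phd": True},
    {"ID": 3, "position": "Instructor", "phd": False},
]
flagged = single_respondent_groups(rows, ["position", "phd"])
```

Groups returned by this check are candidates for the kind of exclusion described above, since a single rating cannot represent a general trend.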



Q3: Is there a difference in the perception of registered users and unregistered users?
The line plot below compares the ratings given by registered and unregistered Wikipedia users. It indicates a clear difference of opinion between the two groups, especially for the categories Behavioural Intention, Experience, Profile 2.0, Use Behaviour and Visibility.

Ag15.jpg
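The comparison in the line plot amounts to a mean per group for each category. A minimal sketch for one category, with hypothetical rows and a hypothetical boolean `registered` field:

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, group_key, score_key):
    """Average one score column separately for each value of the grouping column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_key]].append(row[score_key])
    return {group: mean(scores) for group, scores in grouped.items()}


# Hypothetical Visibility scores for registered and unregistered users.
rows = [
    {"registered": True, "Visibility": 4.0},
    {"registered": True, "Visibility": 3.0},
    {"registered": False, "Visibility": 2.0},
    {"registered": False, "Visibility": 3.0},
    {"registered": False, "Visibility": 1.0},
]
group_means = mean_by_group(rows, "registered", "Visibility")
gap = group_means[True] - group_means[False]
```

Repeating the call for each category column gives the pairs of points that the line plot connects. A gap in means alone does not show that the difference is statistically significant.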


Results

From the above graphs, the following conclusions can be made:

  • Sharing Attitude is the best rated category of questions, whereas Use Behaviour is the most poorly rated category.
  • Instructors and associates who do not have a PhD scored 5 for SA1, SA2 and SA3, indicating that non-PhD instructors and associate professors use Wikipedia to publish, share and collaborate with other members of the group.
  • The majority of the survey participants are unregistered users. This could lead to inaccurate responses, as unregistered users might not be aware of the full use of Wikipedia.
  • There is an apparent disparity between the opinions of registered and unregistered users in various categories of questions.



Interactive File

https://public.tableau.com/profile/publish/ASsignment2/Dashboard1#!/publish-confirm