Difference between revisions of "ISSS608 2016-17 T1 Assign2 Franky Eddy"

From Visual Analytics and Applications

Latest revision as of 00:39, 10 October 2016

Abstract

Nowadays, the internet is one of the most widely used technologies for exploring and finding useful information, and Wikipedia is one of the most commonly used sources for both studying and teaching. Using survey data from faculty members of two Spanish universities on teaching uses of Wikipedia, this page explores the following questions:

  • How do respondents in different age groups rate their experience of using Wikipedia?
  • What is the rating (Likert value) of each question or statement used in the survey?
  • How do respondents of different genders, universities, domains, and age groups rate their experience of using Wikipedia?
  • How do respondents from different universities and domains, and Wikipedia users compared with non-users, rate their contribution to Wikipedia?

Data visualization is used to answer these questions. The main insights are:

  • Most respondents aged 20-30 tend to rate their experience of using Wikipedia higher (Strongly Agree and Agree) than other age groups do
  • Respondents aged 60 and above answer "Neutral" about their experience of using Wikipedia more often than other age groups
  • Respondents use Wikipedia as a reference for academic matters but do not cite it in their academic papers
  • Respondents from UPF are at both extremes (very high and very low) of the Likert value ratings given in the survey
  • Wikipedia users tend to give higher ratings in the survey than non-users


Overview of Data

The dataset used is the survey of faculty members from two Spanish universities on teaching uses of Wikipedia.

Approaches

The step-by-step approach is described below.

Step 1: Identify a theme of interest

The wiki dataset consists of answers to a survey conducted for research on university faculty perceptions and practices of using Wikipedia as a teaching resource. The theme of interest explored here is the relationship between the attributes of the respondents and the ratings they give in the survey.

Step 2: Define questions for investigation

Four questions are investigated based on the theme of interest defined above:

  • How do respondents in different age groups rate their experience of using Wikipedia?
  • What is the rating (Likert value) of each question or statement used in the survey?
  • How do respondents of different genders, universities, domains, and age groups rate their experience of using Wikipedia?
  • How do respondents from different universities and domains, and Wikipedia users compared with non-users, rate their contribution to Wikipedia?

Step 3: Find appropriate data attributes

After defining the questions, the next step is to find the appropriate data attributes. The attributes used are University, Domain, Gender, and UserWiki. They are used to analyse the survey results and to see whether any insights can be obtained.


Data Preparation

The data needs to be prepared before it can be analysed. First, all "?" values are recoded to blank values. Then other variables such as Gender, Domain, PhD, YearsExp, University, UOC_Position, Other Position, OtherStatus, and UserWiki are also recoded, as shown in the figure below.

Recode Franky.png
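
The recoding of "?" to blank was done in JMP Pro and Excel. As a rough illustration only, the same step can be sketched in Python. The column names and sample rows below are hypothetical and are not taken from the actual dataset.

```python
def recode_missing(rows, missing_token="?"):
    """Return copies of the rows with the missing-value token replaced by ""."""
    return [
        {key: ("" if value == missing_token else value) for key, value in row.items()}
        for row in rows
    ]


# Hypothetical sample rows; the real survey has many more columns.
sample_rows = [
    {"Gender": "0", "UserWiki": "1", "PU1": "4", "PU2": "?"},
    {"Gender": "1", "UserWiki": "?", "PU1": "3", "PU2": "5"},
]

cleaned_rows = recode_missing(sample_rows)
```

Leaving the cell blank rather than using a placeholder number keeps missing answers out of any averages computed later.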

After recoding the variable values, a new column named "ID" is created so that each respondent has an identifier. This helps in visualizing the data.
Next, the dataset is reshaped so that each answer to a question has its own row. The reshaped data can be seen in the figure below.

Reshape Franky.png

After reshaping the data, the data is now ready to be used for analysis.
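
The ID and reshape steps were done in Excel. A minimal Python sketch of the same idea, assuming a wide table with one row per respondent, is shown here. The function name, the "Question" and "Response" column names, and the sample rows are assumptions made for illustration.

```python
def reshape_long(rows, attribute_columns, question_columns):
    """Turn one row per respondent into one row per respondent and question."""
    long_rows = []
    for respondent_id, row in enumerate(rows, start=1):
        for question in question_columns:
            record = {"ID": respondent_id}
            # Repeat the respondent attributes on every output row.
            record.update({column: row[column] for column in attribute_columns})
            record["Question"] = question
            record["Response"] = row[question]
            long_rows.append(record)
    return long_rows


# Hypothetical wide rows: two respondents, three question columns.
wide_rows = [
    {"University": "UOC", "UserWiki": "Yes", "PU1": "4", "PU2": "3", "PU3": "5"},
    {"University": "UPF", "UserWiki": "No", "PU1": "2", "PU2": "", "PU3": "1"},
]

long_rows = reshape_long(wide_rows, ["University", "UserWiki"], ["PU1", "PU2", "PU3"])
```

With two respondents and three questions, the result has six rows, which is the layout that survey visualizations in Tableau usually expect.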

Lastly, each question code (e.g. PU1, PU2, PU3) is replaced with the actual question text by changing its alias, so that the visualization is more meaningful.
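
In Tableau this is done by editing aliases. The same idea can be sketched as a simple lookup in Python. The alias texts below are placeholders, since the actual wording of each question is not reproduced on this page.

```python
# Placeholder alias texts; replace them with the real question wording.
QUESTION_ALIASES = {
    "PU1": "Question text for PU1 (placeholder)",
    "PU2": "Question text for PU2 (placeholder)",
    "PU3": "Question text for PU3 (placeholder)",
}


def apply_alias(question_code, aliases=QUESTION_ALIASES):
    """Return the readable label for a question code, or the code itself if unknown."""
    return aliases.get(question_code, question_code)


labels = [apply_alias(code) for code in ["PU1", "PU3", "XX9"]]
```

Falling back to the code itself means an unmapped question still shows up in a chart instead of disappearing.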

Analysis

After preparing the data, the next step is the analysis, which is done to answer the questions defined above.

Results

The results of the analysis are shown in the charts below.

Age Group Responses Franky.png


Several interesting observations can be made from this graph. First, most respondents aged 20-30 (about 75%) rate the statement "I consult Wikipedia for personal issues" highly (Strongly Agree and Agree), more than any other age group. Second, respondents aged 60 and above answer "Neutral" to this statement more often than other age groups. The overall distribution of responses to this question is skewed towards "Agree". Respondents aged between 30 and 60 have relatively similar distributions, with most of them answering "Agree".
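
The percentages behind a stacked bar chart of this kind are the share of each Likert response within each age group. A minimal sketch of that calculation follows. The age group labels and the sample responses are hypothetical and do not reproduce the survey figures.

```python
from collections import Counter, defaultdict


def response_shares(records):
    """Return, per age group, the percentage of answers at each Likert value."""
    counts = defaultdict(Counter)
    for age_group, response in records:
        counts[age_group][response] += 1
    shares = {}
    for age_group, counter in counts.items():
        total = sum(counter.values())
        shares[age_group] = {
            response: 100.0 * count / total for response, count in counter.items()
        }
    return shares


# Hypothetical (age group, Likert value) pairs for a single statement.
sample_records = [
    ("20-30", 4), ("20-30", 5), ("20-30", 4), ("20-30", 2),
    ("60+", 3), ("60+", 3), ("60+", 3), ("60+", 4),
]

shares = response_shares(sample_records)
```

Each group's percentages add up to 100, so groups of different sizes can be compared side by side.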


Likert Franky.png

This chart shows that the statement "I consult Wikipedia for personal issues" has the highest average rating (3.651) of the six statements, while citing Wikipedia in academic papers has the lowest (2.027). The statement "I consult Wikipedia for academic related issues" also has a relatively high rating (3.492). This suggests that most respondents use Wikipedia as a reference for academic matters but usually choose not to cite it in their academic papers.
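
The rating of each statement is the average Likert value over all respondents who answered it. A small sketch of that calculation on reshaped data is given here. The "Question" and "Response" field names and the sample values are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_likert(long_rows):
    """Average the Likert value per question, ignoring blank responses."""
    values = defaultdict(list)
    for row in long_rows:
        if row["Response"] != "":
            values[row["Question"]].append(int(row["Response"]))
    return {question: mean(scores) for question, scores in values.items()}


# Hypothetical reshaped rows; a blank response marks a missing answer.
sample_long_rows = [
    {"ID": 1, "Question": "PU1", "Response": "4"},
    {"ID": 2, "Question": "PU1", "Response": "3"},
    {"ID": 1, "Question": "PU2", "Response": "5"},
    {"ID": 2, "Question": "PU2", "Response": ""},
]

averages = average_likert(sample_long_rows)
```

Blank responses are skipped so that a missing answer does not pull an average down.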



Treemap Franky.png

The treemap shows that most respondents are from UOC and from the "Others" domain. It also shows that respondents from UPF are at both extremes (very high and very low) of the Likert value rating. This is visible in the bottom right of the treemap, where cells with the same university and domain include both a very high (darker color) and a very low (lighter color) Likert value rating.

Trellis Franky.png

The trellis chart above shows a distinct difference in Likert value rating between Wikipedia users and non-users. Wikipedia users tend to rate higher (average Likert value > 2) than non-users (average Likert value <= 1.5). This is probably because Wikipedia users are more familiar with it and therefore understand its advantages better, which leads to higher ratings in the survey. Another interesting observation from this graph is that respondents from UOC are at both extremes (highest and lowest) of the average Likert value.
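
Each panel of a trellis chart like this corresponds to an average taken over a combination of attributes. The sketch below groups reshaped rows by university and Wikipedia-user status. The field names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(long_rows, group_keys):
    """Average the Likert value for each combination of the given attributes."""
    grouped = defaultdict(list)
    for row in long_rows:
        if row["Response"] == "":
            continue  # skip missing answers
        key = tuple(row[name] for name in group_keys)
        grouped[key].append(int(row["Response"]))
    return {key: mean(scores) for key, scores in grouped.items()}


# Hypothetical rows; the values do not reproduce the survey results.
sample_group_rows = [
    {"University": "UOC", "UserWiki": "Yes", "Response": "3"},
    {"University": "UOC", "UserWiki": "Yes", "Response": "4"},
    {"University": "UOC", "UserWiki": "No", "Response": "1"},
    {"University": "UOC", "UserWiki": "No", "Response": "2"},
    {"University": "UPF", "UserWiki": "Yes", "Response": "3"},
    {"University": "UPF", "UserWiki": "No", "Response": ""},
]

group_averages = average_by_group(sample_group_rows, ["University", "UserWiki"])
```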

Question Group.png

This chart shows that most respondents gave high ratings (average Likert value = 4.235) to the "Sharing Attitude" questions, while they gave relatively low ratings (average Likert value <= 2.5) to the "Profile" and "Use Behavior" questions. This indicates that most respondents recognize the importance of sharing and publishing academic content and research results on online platforms such as Wikipedia, but they are unlikely to participate actively or to recommend Wikipedia to their students and colleagues.
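
Averages per question group can be derived from the question codes if each code starts with a prefix that identifies its group. The sketch below assumes such prefixes. The mapping from prefix to group name and the sample values are assumptions, not confirmed details of the dataset.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed mapping from question-code prefix to question group.
GROUP_LABELS = {"SA": "Sharing Attitude", "Pf": "Profile", "Use": "Use Behavior"}


def question_group(code):
    """Map a question code such as 'SA1' to its group via its letter prefix."""
    prefix = re.match(r"[A-Za-z]+", code).group(0)
    return GROUP_LABELS.get(prefix, prefix)


def average_by_question_group(responses):
    """Average Likert values per question group from (code, value) pairs."""
    grouped = defaultdict(list)
    for code, value in responses:
        grouped[question_group(code)].append(value)
    return {group: mean(values) for group, values in grouped.items()}


# Hypothetical (question code, Likert value) pairs.
sample_responses = [("SA1", 5), ("SA2", 4), ("Use1", 2), ("Use2", 3), ("Pf1", 2)]

group_means = average_by_question_group(sample_responses)
```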

Interactive Visualization

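The dashboards are available on Tableau Public:
Dashboard 1: https://public.tableau.com/profile/franky.eddy#!/vizhome/Dashboard1Updated_Franky/Dashboard1
Dashboard 2: https://public.tableau.com/profile/franky.eddy#!/vizhome/Dashboard2_Franky/Dashboard2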


Tools Utilized

The tools used for this analysis are Tableau 10.0, JMP Pro 12, Microsoft Excel, and Tableau Public.

  • Tableau 10.0 is used to visualize the data.
  • JMP Pro 12 is used to clean and prepare the data before further analysis.
  • Microsoft Excel is used to clean and reshape the data so that it is easier to visualize.
  • Tableau Public is used to publish the charts as interactive visualizations.


Charts used: Bar Chart, Stacked Bar Chart, Treemap, Trellis Chart


References

Below are some of the references that are used as a guide:
http://www.datarevelations.com/using-tableau-to-visualize-survey-data-part-1.html
http://www.datarevelations.com/using-tableau-to-visualize-survey-data-part-2.html


Comments