ISSS608 2016-17 T1 Assign1 Parikshit Ravindra MAYEE

From Visual Analytics and Applications

Abstract


The Housing & Development Board (HDB), Singapore's public housing authority, has built more than 1 million flats across the island and houses more than 80% of Singapore's population. Resale of these properties makes up a large share of yearly property transactions and therefore warrants a thorough analysis of past and present trends. In this project I focus on analysing resale HDB properties for 2015 and comparing their trends with the first half of 2016.

Problem and Motivation


More often than not, our assumptions about trends are based on unverified facts and untested theories. Biased opinions on public portals only add to this confusion. To get a true picture of the situation, analysing reliable data is the only real option. With this project I intend to showcase the trends, distributions and correlations between factors such as property type, property area, resale price and number of property transactions, to name a few. I use the datasets available from https://data.gov.sg/.

Approaches

Datasets

After exploring the https://data.gov.sg/ site, I decided to use the following datasets:

  i. Resale Flat Prices (Based on Registration Date), From March 2012 Onwards
 ii. Housing And Development Board Resale Price Index (1Q2009 = 100), Quarterly
iii. HDB Branches (SHP)

I decided not to use the Resale Transactions by Flat Type (based on registered cases) dataset because I was unable to establish why the count of transactions captured in the Resale Flat Prices dataset differs from the total transactions documented in that dataset.

Data Preparation

After loading the datasets in Tableau, I verified the data types against the metadata files supplied with each dataset. One important change was creating a calculated field (Date) from the 'Month' column in the Resale Flat Prices dataset and from the 'Quarter' column in the Resale Price Index dataset. The calculation formulas had to differ because Month and Quarter are structured differently in the original datasets.
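The same two conversions can be sketched outside Tableau. This is only an illustration, not the calculated fields actually used: it assumes the 'Month' values look like `2015-07` and the 'Quarter' values look like `2015-Q3`, and the function names are hypothetical.

```python
from datetime import date


def month_to_date(month: str) -> date:
    """Convert a 'YYYY-MM' string (assumed format) to the first day of that month."""
    year, mon = month.split("-")
    return date(int(year), int(mon), 1)


def quarter_to_date(quarter: str) -> date:
    """Convert a 'YYYY-Qn' string (assumed format) to the first day of that quarter."""
    year, q = quarter.split("-Q")
    # Quarter 1 starts in month 1, quarter 2 in month 4, and so on.
    return date(int(year), 3 * (int(q) - 1) + 1, 1)


print(month_to_date("2015-07"))    # 2015-07-01
print(quarter_to_date("2015-Q3"))  # 2015-07-01
```

Mapping both fields to a real date lets the two datasets share a common time axis, which is why two different formulas are needed for the two source formats.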

For the HDB Branches dataset, I used SAS JMP Pro to open the file and convert it to .xlsx so that it could be loaded in Tableau.

I created further calculated fields to streamline the analysis.

Exploration & Analysis

I started with a Box & Whisker analysis to look at the spread of median resale prices across towns. This showed that two towns, Central Area and Bukit Timah, had exceptionally high median resale prices. I plotted another Box & Whisker to check the spread across all quarters of 2015. This confirmed that properties in Central Area and Bukit Timah were resold at exceptionally high prices compared with other towns. I then plotted a histogram, which confirmed that the median resale price distribution for 2015 is skewed. My observations are attached below.

001-Box&Whisker.png
011-Box&Whisker - 2015 Qtr spread of meadian resale prices across towns.png
04- Histogram - 2015 Median Resale pricer distribution.png
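For readers who want to reproduce the numbers behind a Box & Whisker plot, the sketch below computes quartiles, whiskers and outliers for one list of prices. It assumes the common 1.5 x IQR whisker rule and the "inclusive" quartile method from Python's `statistics` module; the charting tool may use a different quartile convention, and the sample values are made up.

```python
from statistics import quantiles


def box_stats(values):
    """Return quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme points that are still inside the fences.
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    outliers = [v for v in data if v < lo_fence or v > hi_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": outliers,
    }


# Made-up values: one extreme point stands out, much as a high-priced town would.
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```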


Next, I ran a Pareto analysis on the number of properties resold in 2015 by Flat Model. This showed that 81.92% of all HDB flats resold in 2015 belonged to 4 Flat Models: Model A, Improved, New Generation and Premium Apartment. These 4 Flat Models constitute 22.2% of all HDB Flat Models in the data. My observation is attached below.

Pareto 1.png
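The cumulative percentages behind a Pareto chart can be sketched as follows. The function name and the sample list are hypothetical; the real analysis was done in Tableau on the full 2015 data.

```python
from collections import Counter


def pareto_table(flat_models):
    """Return (model, count, cumulative % of transactions), largest count first."""
    counts = Counter(flat_models)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for model, n in counts.most_common():
        cumulative += n
        rows.append((model, n, 100.0 * cumulative / total))
    return rows


# Made-up sample of flat models, one entry per resale transaction.
sample = ["Model A"] * 5 + ["Improved"] * 3 + ["New Generation"] + ["Standard"]
for model, n, cum in pareto_table(sample):
    print(f"{model:15s} {n:3d} {cum:6.1f}%")
```

Reading down the cumulative column shows how few categories are needed to cover most transactions, which is the point of the Pareto view.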


Next, I analysed flat area and median resale price together with the number of flats resold. I plotted a histogram of flat area (sqm) with a bin size of 10. This showed that flats with an area between 60 and 70 sqm had the most transactions in 2015. I then overlaid the median resale price as a line chart on the histogram. It showed a steady increase in median resale price as flat area increases. I noted a few exceptions for flats with an area of 90-100, 160-170 and 180-190 sqm, which showed a drop in median resale price relative to the preceding bin. My observation is attached below.

021-Histogram & Line - Property Area & Median resale price.png
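The combined histogram and line chart amounts to grouping transactions into 10 sqm bins and computing a count and a median price per bin. A minimal sketch, assuming left-inclusive bins (so 70 sqm falls in the 70-80 bin) and using made-up records:

```python
from collections import defaultdict
from statistics import median


def bin_summary(records, bin_width=10):
    """records: iterable of (floor_area_sqm, resale_price) pairs.

    Returns {bin_start: (transaction_count, median_price)} ordered by bin start.
    """
    bins = defaultdict(list)
    for area, price in records:
        start = int(area // bin_width) * bin_width
        bins[start].append(price)
    return {start: (len(prices), median(prices)) for start, prices in sorted(bins.items())}


# Made-up transactions: three flats in the 60-70 bin, one in the 90-100 bin.
sample = [(67, 300000), (65, 320000), (92, 400000), (60, 310000)]
print(bin_summary(sample))
```

Comparing each bin's median with the preceding bin's median is how drops such as those noted above would show up.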


Along the same lines, I analysed the change in median resale price and number of resold flats with respect to the age of the flat (using the lease start date). Flats with a lease start date between 1980 and 1985 had the most transactions in 2015. The median resale price also showed multiple deviations relative to the preceding bin. My observation is attached below.

021-Histogram & Line - Property Age & Median resale price.png


To analyse this further, I used a scatter plot to observe the association of property age with median resale price and with the number of resold properties. As expected, the plot showed a negative correlation between median resale price and the age of the flat. It showed a low correlation between flat age and the number of resale transactions, which surprised me.

Scatter Plot - Property Age wrt - Resold propertied & median resale price.png
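The strength of the associations read off a scatter plot can be quantified with a correlation coefficient. The sketch below computes Pearson's r by hand. The choice of Pearson's r is an assumption made for illustration, since the analysis above relied on visual inspection, and the sample numbers are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented data: price falls as age rises, so r comes out negative.
ages = [5, 15, 25, 35]
median_prices = [520000, 450000, 400000, 360000]
print(round(pearson(ages, median_prices), 3))
```

A value near -1 or +1 indicates a strong linear association, while a value near 0 corresponds to the weak relationship seen between flat age and the number of transactions.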


I then analysed the changes in median resale prices across the quarters of 2015. Central Area was the most volatile in terms of change in median resale price in Q1 and Q4. Interestingly, I observed similar behaviour for 2016 Q1.

Choropleth- %Change in Median Resale prices across quarters updated.png


Next, I used the Resale Price Index (RPI) dataset to analyse the changes in 2015. This showed a quarter-on-quarter drop in RPI from Q1 through Q3. Q4, however, showed some improvement with a rise in RPI. Since the RPI was dropping for most of the year, this might lead to the premature conclusion that overall 2015 performance was poor. I therefore expanded the range of my analysis to the past 10 years. With this I observed that 2015 had the lowest RPI % drop of the last 3 years.

Resale Price index.png
Price Index over 10 years.png
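The quarter-on-quarter percentage change used above is a simple calculation. A minimal sketch follows, with index values that are invented for illustration and are not the published RPI figures:

```python
def pct_change(values):
    """Percentage change of each value relative to the one before it."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(values, values[1:])]


# Invented index values for four consecutive quarters.
index_by_quarter = [100.0, 98.0, 97.0, 97.5]
for change in pct_change(index_by_quarter):
    print(f"{change:+.2f}%")
```

Applying the same calculation over a longer series is what makes it possible to compare one year's drop against earlier years rather than judging it in isolation.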


I used the HDB Branches dataset to plot the geographic locations of HDB offices across Singapore.

Geographical 2- HDB Locations.png


For the 2015 vs 2016 analysis I plotted multiple graphs, shown below. 2015 and 2016 showed similar trends in the quarter-on-quarter % change in number of resold properties for Q1 and Q2. The quarter-wise analysis of median resale price showed that the median resale price increased in 2016 Q1 compared with 2015 Q1. However, the 2016 Q2 median resale price was lower than that of 2015 Q2, even though both quarters had a similarly high upper whisker value.

2015vs16 - 1 - % change in - of prperties sold.png
2015vs16 - qtr wise - median resale price.png
2015vs16 - 2 - Spread of Median Resale price.png
2015vs16 - 3 - cost-sqm.png






Infographic



Infographic MiniAssignment1.png


Tools Utilized


1. Microsoft Excel : Used for basic operations on the data files
2. SAS JMP Pro : Used to load SHP files and convert them to .xlsx files
3. Tableau : Used for exploratory data analysis and to generate graphical representations
4. Microsoft PowerPoint : Used to prepare the final infographic poster

Results

The output of this project is a set of analyses and observations on multiple variables such as HDB property type (resold), HDB property location, median resale price, number of transactions, flat area, flat model and property age. These can be used to identify candidates for hypothesis testing, which can in turn lead to actionable insights. For example, the exceptionally high median resale prices in Central Area and Bukit Timah should be studied further.
The Price Index example in this project shows the importance of taking a wider view before drawing conclusions. The range of data used for analysis has a major impact on the observations and hence on the inferences drawn from them.


References

https://data.gov.sg/
http://www.singstat.gov.sg/
http://www.straitstimes.com/singapore/housing/hdb-resale-price-index-up-by-02-in-q4-of-2015-flash-estimates
http://www.hdb.gov.sg/