Difference between revisions of "Group21 Application"

From Visual Analytics and Applications
 
|}
 
|}
 
<br>
 
<br>
 
= Data Cleaning & Preparation =
 
The data for 12 months (June 2017 to March 2018), which covers the resale prices, nature and characteristics of HDB flats, is taken for analysis.
 
<br>
 
<b> 1. Checking for Missing Values: </b> Excel was used to check the dataset for missing-value patterns. The check showed that the dataset has no missing values.
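
The check itself was done in Excel. As a rough illustration of the same idea, the following Python sketch counts missing entries per column. The column names and the markers treated as missing (empty string, "NA", None) are assumptions made for this example, not details taken from the dataset.

```python
from collections import Counter


def count_missing(rows, missing_markers=("", "NA", None)):
    """Count missing entries per column across a list of row dicts."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            # Strip whitespace so that blank-looking cells count as missing.
            cleaned = value.strip() if isinstance(value, str) else value
            # Adding 0 keeps columns with no missing values in the result.
            counts[column] += 1 if cleaned in missing_markers else 0
    return dict(counts)


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"town": "BEDOK", "resale_price": "350000"},
    {"town": "", "resale_price": "410000"},
    {"town": "YISHUN", "resale_price": "NA"},
]
print(count_missing(sample_rows))
```

A result in which every count is zero corresponds to the "no missing values" outcome reported here.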
 
<br>
 
<b> 2. Adding New Columns: </b> The year, month and quarter fields were extracted from the “Month” column of the dataset. Using data from the Singapore government website, the Planning region for each area was added to the dataset.
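
A minimal Python sketch of this step is given below, assuming the “Month” column holds strings of the form YYYY-MM. That format and the small town-to-region lookup are assumptions for illustration; the full mapping would come from the government source mentioned above.

```python
from datetime import datetime

# Illustrative subset of a town-to-planning-region lookup (assumed, not the full table).
PLANNING_REGION = {
    "BEDOK": "East Region",
    "JURONG WEST": "West Region",
    "ANG MO KIO": "North-East Region",
}


def split_month(month_value):
    """Return (year, month, quarter) from a 'YYYY-MM' string."""
    parsed = datetime.strptime(month_value, "%Y-%m")
    quarter = (parsed.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return parsed.year, parsed.month, f"Q{quarter}"


def add_columns(row):
    """Return a copy of the row with year, month number, quarter and region added."""
    year, month, quarter = split_month(row["month"])
    region = PLANNING_REGION.get(row["town"])  # None if the town is not in the lookup
    return {**row, "year": year, "month_no": month, "quarter": quarter, "region": region}


print(add_columns({"month": "2017-06", "town": "BEDOK"}))
```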
 
<br>
 
<b> 3. Preparing the Geolocation Data: </b>
 
<br> 1. The concatenate function in Excel was used to derive the address of each block in the dataset. The block number and street name were concatenated, and “Singapore” was added to the end of the entry to give the full address of each block.
 
<br> 2. Since the postal code was not available in the dataset, the coordinates were derived from the address of each block. To do this, the geocode function in the “ggmap” package in R was used to obtain the latitude and longitude of each block from its address. The API could only process 2500 entries at a time, so the dataset was split into monthly subsets and the code was run on each subset.
 
<br> 3. After running the code, it was observed that several rows had “NA” values in the latitude and longitude columns. These rows were extracted into a separate Excel sheet and the code was rerun on them. After several reruns, some “NA” values still remained. The latitude and longitude coordinates for these blocks were derived manually.
 
<br> 4. For a few blocks, the generated latitude and longitude values were reversed. These values were swapped manually to get the correct coordinates for each block.
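
The bookkeeping around the geocoding steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Excel and R workflow that was actually used: the address separator, the fixed-size batching (the project split the data by month instead), and the rule for detecting reversed coordinates are all choices made for this example. The geocoding call itself is left out.

```python
def build_address(block, street_name):
    """Join block number and street name and append 'Singapore' (separator assumed)."""
    return f"{block} {street_name}, Singapore"


def batches(items, size=2500):
    """Yield consecutive chunks of at most `size` items, to respect a per-run limit."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def needs_retry(rows):
    """Return the rows whose latitude or longitude is still missing (None)."""
    return [r for r in rows if r.get("lat") is None or r.get("lon") is None]


def fix_swapped(lat, lon):
    """Swap the pair when the 'latitude' is outside the valid range of -90 to 90.

    Singapore lies near latitude 1.3 and longitude 103.8, so a reversed pair
    shows up as a latitude of about 103.8, which is not a valid latitude.
    """
    if abs(lat) > 90 and abs(lon) <= 90:
        return lon, lat
    return lat, lon


print(build_address("123", "ANG MO KIO AVE 3"))
print([len(chunk) for chunk in batches(list(range(6000)))])
print(fix_swapped(103.85, 1.37))
```

An automated range check like `fix_swapped` could replace the manual swap in step 4, while rows returned by `needs_retry` would still need a rerun or manual lookup as in step 3.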
 
 
= Visualizations =
 
<b> 1. Geofacet Plot: </b>
 
The Geofacet plot shows the resale prices of HDB flats by town, following the new town planning. There are 26 towns in total, and each town is shown as one facet. The visualization shows the month-wise trend of resale prices. Each facet is placed on the plot according to the location of its town on the Singapore map. From this plot, the user can examine the fluctuations in resale prices within a town and compare them with those of neighbouring towns. The geofacet and ggplot2 packages in R were used to create this plot. This visualization lets the user observe the resale price trend in a layout that follows the original geographic topology as closely as possible. From the Geofacet plot, we observe that Bukit Timah, Central Area and Marine Parade show the highest volatility in prices.
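
The data behind each facet is a month-wise price series per town. The Python sketch below shows one way such a series could be prepared and how volatility could be compared across towns. The use of the mean as the monthly summary and of the coefficient of variation as the volatility measure are assumptions for illustration; the plot itself was built with geofacet and ggplot2 in R, and the observation above was made visually.

```python
from collections import defaultdict
from statistics import mean, pstdev


def monthly_mean_by_town(records):
    """Average resale price for each (town, month) pair.

    `records` is an iterable of (town, month, price) tuples.
    """
    grouped = defaultdict(list)
    for town, month, price in records:
        grouped[(town, month)].append(price)
    return {key: mean(prices) for key, prices in grouped.items()}


def volatility_by_town(monthly_means):
    """Coefficient of variation of the monthly means, per town (assumed measure)."""
    per_town = defaultdict(list)
    for (town, _month), value in monthly_means.items():
        per_town[town].append(value)
    return {town: pstdev(values) / mean(values) for town, values in per_town.items()}


# Hypothetical records, for illustration only.
records = [
    ("TOWN A", "2017-06", 400000), ("TOWN A", "2017-06", 500000),
    ("TOWN A", "2017-07", 450000),
    ("TOWN B", "2017-06", 300000), ("TOWN B", "2017-07", 600000),
]
print(volatility_by_town(monthly_mean_by_town(records)))
```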
 
 
<b> 2. Interactive Tree Map: </b>
 
The interactive tree map allows the user to view the resale prices and floor area at different levels, giving the user the chance to look at the resale price trends in depth. The user can view the treemap at the Planning region level and then drill down to the Town level and the Street level. The visualization also has a data table that summarizes the values of important parameters. The size of each treemap tile is based on the number of transactions and the color is based on the resale price. This visualization is designed to let the user drill down and compare resale price trends within a level, i.e., compare towns with other towns, or streets with other streets within a town. The visualization was built using the d3treeR package in R. Other packages such as DT and dplyr are used to render the data table and add interactivity to the treemap.
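
The values behind each treemap tile can be thought of as a grouped summary at the chosen drill-down level. The Python sketch below illustrates that aggregation only. The field names and the use of the mean price for the color value are assumptions; in the application the summary is produced in R (dplyr) and rendered with d3treeR.

```python
from collections import defaultdict


def summarise(transactions, level_keys):
    """Group transactions by the given keys and return tile size and color values.

    Size is the number of transactions; color is the mean resale price (assumed).
    """
    groups = defaultdict(list)
    for t in transactions:
        key = tuple(t[k] for k in level_keys)
        groups[key].append(t["resale_price"])
    return {
        key: {"transactions": len(prices), "mean_price": sum(prices) / len(prices)}
        for key, prices in groups.items()
    }


# Hypothetical transactions, for illustration only.
transactions = [
    {"region": "East", "town": "BEDOK", "street": "STREET 1", "resale_price": 300000},
    {"region": "East", "town": "BEDOK", "street": "STREET 1", "resale_price": 340000},
    {"region": "East", "town": "TAMPINES", "street": "STREET 2", "resale_price": 500000},
]
print(summarise(transactions, ["region"]))                    # Planning region level
print(summarise(transactions, ["region", "town"]))            # Town level
print(summarise(transactions, ["region", "town", "street"]))  # Street level
```

Passing a longer list of keys corresponds to drilling one level deeper in the treemap.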
 
 
<b> 3. Detailed Analysis of HDB Blocks: </b>
 
Based on input from the user, this tab renders a box plot and a leaflet map. The user can select a Region, Town, flat model and flat type. The Box Plot (Fig 6.3.1) shows Resale Price by Flat Type and can be used for basic statistical analysis: the user can read off the minimum and maximum resale prices and identify outliers in the dataset. The leaflet map shows the locations of the HDB blocks that match the user input and lets the user view the transaction details of each block on click. The visualization was created using the leaflet, ggplot2, DT, plotly, dplyr, viridisLite and other packages in R. This visualization helps the user get a transaction-level summary of HDB blocks based on the user input. The user can then use it to judge which HDB flat could be a good investment based on the price trends.
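
The figures a reader takes from a box plot (minimum, quartiles, maximum and outliers) can also be computed directly. The Python sketch below uses the common 1.5 × IQR rule to flag outliers. Both that rule and the quantile method are assumptions for illustration and may differ slightly from what the plotting library computes.

```python
from statistics import quantiles


def box_plot_summary(prices):
    """Return min, quartiles, max and 1.5 * IQR outliers for a list of prices."""
    data = sorted(prices)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Points beyond the fences are the ones a box plot draws individually.
    outliers = [p for p in data if p < lower_fence or p > upper_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }


# Hypothetical resale prices in thousands of dollars, for illustration only.
print(box_plot_summary([200, 210, 220, 230, 240, 250, 260, 270, 600]))
```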
 
 
= Future Scope =
 
Given time constraints, the current application only shows the location of HDB flats based on the user input and the resale prices of the flats at different levels. However, we believe that the dashboard can be enhanced to include the following:
 
<br> 1. Price Estimator: Using regression models and forecasting methods, the dashboard can be extended with a price estimator that shows the user future price trends and the estimated future price of an HDB flat based on the current market trend.
 
<br> 2. User Interface with more options: The data can also be prepared to include amenities and other key locations such as MRT stations, schools and shopping malls, so that users can select their desired HDB flat based on its proximity to these locations.
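
As a rough indication of what the price estimator in item 1 could start from, the Python sketch below fits a straight-line trend to monthly average prices by ordinary least squares and extrapolates it. It is a minimal sketch with made-up numbers; a real estimator would likely use more predictors (flat type, floor area, town) and a proper forecasting method.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Estimated price at month index x under the fitted linear trend."""
    return slope * x + intercept


# Hypothetical monthly average prices (in thousands) for month indices 0 to 3.
months = [0, 1, 2, 3]
avg_prices = [400, 410, 420, 430]
slope, intercept = fit_line(months, avg_prices)
print(predict(slope, intercept, 4))  # extrapolated estimate for the next month
```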
 
 
 
= Installation Guide =
 

Revision as of 19:34, 13 August 2018