GeoEstate PROPOSAL
Latest revision as of 23:53, 14 April 2019
Project Description |
Our project aims to provide an easy way for end users to calculate the predicted resale prices of apartments, condominiums and executive condominiums, using inputs such as the postal code, floor area and type of apartment. To achieve this, we use three regression models: the geographically weighted regression model, the spatial autoregression model and the multiple linear regression model. Users can read about the methodology for each model and select the one that they feel best fits their situation, or simply check which model fits the data points best by comparing the R-squared values.
Furthermore, we aim to support extensive exploratory data analysis, allowing users to see how variables in the property market, such as the age and size (in square feet) of a property, correlate with resale prices. Through this, we aim to educate interested consumers and real estate agents alike on what truly matters when determining the resale price of a property.
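The R-squared comparison between models mentioned above can be sketched as follows. This is only an illustration: the prices and the per-model predictions are made-up numbers, and the helper name `r_squared` is hypothetical rather than taken from the project.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical resale prices (in thousands) and predictions from two models.
actual = [900.0, 1200.0, 1500.0, 1800.0]
predictions = {
    "MLR": [1000.0, 1100.0, 1600.0, 1700.0],
    "GWR": [920.0, 1180.0, 1510.0, 1790.0],
}

# Score every model and pick the one with the highest R-squared.
scores = {name: r_squared(actual, pred) for name, pred in predictions.items()}
best_model = max(scores, key=scores.get)
print(scores, best_model)
```

A higher R-squared only says that a model fits the observed transactions better; it does not by itself guarantee better predictions for unseen properties.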
Project Motivation |
How do you know if you are getting a reasonable price for your apartment? Because real estate agents have vested interests, people who want to be educated consumers may find that taking an agent's word for the price of a property is not enough. Today, websites like PropertyGuru appear to give some sense of which prices are competitive. However, this may be misleading, as it is only a snapshot in time.
What if you were able to predict the price of the property you want to sell, or conversely, the dream property you wish to purchase, using masses of data accumulated over past years?
Through our application, we aim to educate consumers on the value of property using rigorous statistical methods.
Storyboard |
Data sources |
Data | Source | Data Type/Method |
---|---|---|
2014 Master Plan Planning Subzone (Web) | Data.gov.sg | SHP |
URA Private Residential Property Transactions | Ura.gov.sg | CSV |
Pre-School Locations | Data.gov.sg | KML Converted to Shapefile |
Primary/Secondary School Locations | Data.gov.sg | CSV Data was geocoded using OneMap API |
MRT/LRT Station Locations | LTA Datamall (Direct Download) | SHP |
Supermarket Locations | Data.gov.sg | KML Converted to Shapefile |
Shopping Mall Locations | Wikipedia | Text Data was converted to Shapefile after geocoding using OneMap API |
Park Locations | Data.gov.sg | KML Converted to Shapefile |
Sports Facilities Locations | Data.gov.sg | KML Converted to Shapefile |
Hawker Centre Locations |
Public Food Centres: |
1: KML - Converted to Shapefile |
Data Transformation |
No. | Required Data | Action |
---|---|---|
1. | Count of facilities | As property prices are affected by the points of interest located in their proximity, we needed a method to count these facilities around the property being queried. To achieve this, we created a buffer of radius 1 km around the property using the gBuffer method from the rgeos package in R. We then used the over method from the sp package to count how many points of interest fell within the buffer. |
2. | Tenure duration left | Another variable that we felt might affect the overall prediction was the effective remaining tenure: a property with less than 20 years left would be worth much less than a property with 50 years left. The data provided by REALIS included only the tenure and the building completion year. From these, we determined whether the property had a 60-, 99-, 100- or 999-year tenure, then subtracted its age (2015 - completion year) to get the effective tenure value. |
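The two transformations in the table can be sketched in a few lines. This is a simplified stand-in, not the project's R code: it replaces the gBuffer and over calls with a great-circle distance check on latitude/longitude pairs, and the coordinates, function names and the 99-year example are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_facilities(prop, facilities, radius_km=1.0):
    """Count facilities whose distance to the property is within the radius."""
    return sum(
        1 for lat, lon in facilities
        if haversine_km(prop[0], prop[1], lat, lon) <= radius_km
    )


def effective_tenure(tenure_years, completion_year, reference_year=2015):
    """Tenure length minus the building's age at the reference year."""
    return tenure_years - (reference_year - completion_year)


# Hypothetical property and facility coordinates (latitude, longitude).
prop = (1.3000, 103.8000)
facilities = [
    (1.3050, 103.8000),  # roughly 0.56 km away
    (1.3000, 103.8200),  # roughly 2.2 km away
    (1.3000, 103.8000),  # same location as the property
]
nearby_count = count_facilities(prop, facilities)
remaining = effective_tenure(99, 1995)
print(nearby_count, remaining)
```

In a projected coordinate system such as the one a buffer operation would normally use, the distance check reduces to a plain Euclidean comparison.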
Literature Review |
1. A Spatial Analysis of House Prices in the Kingdom of Fife, Scotland
(By: Julia Zmölnig, Melanie N Tomintz, Stewart A Fotheringham)
Aim of Study: to analyse the spatial variations in house price adjustments due to economic conditions, and to quantify and describe patterns in the variations of house prices in the study area of Fife, Scotland
Methodology:
Spatial Interpolation Technique - using points with known values to estimate values at other, unknown points. Three main methods were used:
- Diffusion Interpolation with Boundaries
- Inverse-distance weighting
- Deterministic ordinary Kriging (Most accurate)
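As an illustration of the interpolation idea listed above, inverse-distance weighting can be sketched as below. This is a generic textbook form under assumed inputs (planar coordinates, a power of 2, made-up prices), not the implementation used in the reviewed study.

```python
import math


def idw_estimate(known_points, target, power=2.0):
    """Estimate the value at target from (x, y, value) samples.

    Each sample is weighted by 1 / distance**power, so closer samples
    contribute more to the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known_points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a sample: return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical house prices observed at two locations on a line.
samples = [(0.0, 0.0, 100.0), (2.0, 0.0, 200.0)]
midpoint_estimate = idw_estimate(samples, (1.0, 0.0))
near_first_estimate = idw_estimate(samples, (0.5, 0.0))
print(midpoint_estimate, near_first_estimate)
```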
Learning Points:
- House price hot spots migrate from year to year, so multiple models are required if the study spans multiple years
- An economic downturn actually leads to an increase in property prices, despite more supply from unemployed people
Areas for Improvement:
- The data lacked information such as the size and type of the properties; although this could be approximated via interpolation, it still hurts the accuracy of the model
- A different model, such as Geographically Weighted Regression (GWR), could be used to identify spatial patterns apparent in the study area.
2. Statistical analysis of the relationship between public transport accessibility and flat prices in Riga
(By: Dmitry Pavlyuk)
Aim of Study: to examine the relationship between public transport accessibility and residential land value in Riga, Latvia
Methodology:
- Geographically Weighted Regression (GWR)
- Global Regression Model
Learning Points:
- Within the city centre, accessibility has no significant relationship with flat prices, as the city centre is already rich in transport routes and new routes have a diminishing impact
- For the population with higher income, higher public transport accessibility may lead to lower property prices
- Overall, GWR performed significantly better than global regression
- Variables that have no significant relationship in one model might be significant in another. For example, the influence of the first floor on the price was insignificant in the global regression model, but it showed a local dependency in GWR.
Areas for Improvement:
- Transport, which was the main focus of the study, had a limited overall impact
- Possibility of using Manhattan distance to compute the actual distance travelled rather than straight line distance
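The difference between the two distance measures in the last point can be shown with a short sketch. The coordinates are hypothetical planar offsets in metres, and Manhattan distance is only a rough proxy for travel along a grid-like street network.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two planar points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def manhattan_distance(a, b):
    """Distance travelled along axis-aligned streets between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Hypothetical flat and bus stop positions, in metres.
flat = (0.0, 0.0)
bus_stop = (300.0, 400.0)
straight_line = euclidean_distance(flat, bus_stop)
grid = manhattan_distance(flat, bus_stop)
print(straight_line, grid)
```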
Methods used |
1. Multiple linear regression (MLR)
MLR, or the hedonic price model, assumes that a property is valued by the sum of its characteristics, such as its size, neighborhood, accessibility and proximity to facilities. With this generalization, the pricing model can be simplified to the expression below, written here in standard regression notation:
<math>P_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i</math>
where P_i is the price of property i, x_ik is its k-th characteristic, β_k is the coefficient of that characteristic and ε_i is the error term.
We included MLR in our project not only because it has traditionally been used by the real estate sites we are evaluating against, but also because we wanted to compare the performance of a non-spatial predictor with that of spatial models such as GWR and SAR.
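To make the "sum of characteristics" idea concrete, the sketch below evaluates a hedonic price for one property. The intercept, coefficients and characteristic names are invented for illustration and are not estimates from the project's data.

```python
def hedonic_price(intercept, coefficients, characteristics):
    """Intercept plus the sum of coefficient * characteristic value."""
    return intercept + sum(
        coefficients[name] * value for name, value in characteristics.items()
    )


# Hypothetical coefficients (price contribution per unit of each characteristic).
coefficients = {
    "area_sqm": 10000.0,
    "facilities_within_1km": 5000.0,
    "effective_tenure_years": 2000.0,
}
# Hypothetical property being priced.
characteristics = {
    "area_sqm": 100.0,
    "facilities_within_1km": 12.0,
    "effective_tenure_years": 79.0,
}
price = hedonic_price(50000.0, coefficients, characteristics)
print(price)
```

In practice the coefficients would be estimated from past transactions by least squares rather than set by hand.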
2. Geographically Weighted Regression (GWR)
GWR is similar to MLR with respect to the regression formula. However, when estimating the coefficients of the model for a flat i, we assume that flats located near flat i have a greater influence on the coefficients than flats located far from it. In regression terms, flats located near point i have larger weights. The bi-square function is usually used to calculate the weight of point j in the model for point i; in its standard form:
<math>w_{ij} = \begin{cases} \left[1 - (d_{ij}/b)^2\right]^2 & \text{if } d_{ij} < b \\ 0 & \text{otherwise} \end{cases}</math>
where d_ij is the distance from point i to point j and b is the bandwidth. The bandwidth controls how quickly the weight decreases with distance. There are two main approaches to bandwidth selection: fixed and adaptive. A fixed bandwidth is selected once for all data points, whereas an adaptive bandwidth can change from point to point depending on data density. The adaptive bandwidth is illustrated in the diagram below.
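The bi-square weighting and the two bandwidth approaches described above can be sketched as follows. The distances are made-up values, and taking the adaptive bandwidth as the distance to the k-th nearest neighbour is one common convention assumed here, not necessarily the one used in the project.

```python
def bisquare_weight(distance, bandwidth):
    """Bi-square kernel: (1 - (d/b)^2)^2 inside the bandwidth, 0 outside."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def adaptive_bandwidth(distances, k):
    """Bandwidth set to the distance of the k-th nearest neighbour."""
    return sorted(distances)[k - 1]


# Hypothetical distances (km) from flat i to five flats, including itself.
distances = [0.0, 0.5, 1.0, 2.0, 4.0]

# Fixed bandwidth: the same value is used for every regression point.
fixed_weights = [bisquare_weight(d, 2.0) for d in distances]

# Adaptive bandwidth: shrinks where data are dense, grows where they are sparse.
local_bandwidth = adaptive_bandwidth(distances, k=3)
adaptive_weights = [bisquare_weight(d, local_bandwidth) for d in distances]
print(fixed_weights, adaptive_weights)
```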
3. Spatial Autoregression (SAR)
As defined in Anselin (2001), spatial autocorrelation, the effect that a spatial autoregression (SAR) model captures, is the "coincidence of value similarity with locational similarity". As a result, properties in close proximity to one another tend to have similar transaction prices.
First, homeowners in nearby locations tend to follow their neighbors' improvement activities, which results in similar dwelling sizes, vintages, designs and other structural characteristics.
Second, spatial autocorrelation arises from the locational amenities shared by houses in the same neighborhood (Basu and Thibodeau, 1998; Militino et al, 2004), such as schools, hawker centers and shopping malls. Last, real estate agents tend to evaluate the value of houses by referring to neighborhood conditions, an activity which also results in similar housing values in nearby locations.
Overall, the formula can be generalized to the following, written here in standard spatial lag notation:
<math>y = \rho W y + X \beta + \varepsilon</math>
where y is the vector of transaction prices, W is the spatial weights matrix, ρ is the spatial autoregressive coefficient, X holds the property characteristics and ε is the error term. The term Wy represents the influence of the neighboring properties on the final transaction price of the target property.
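The Wy term can be illustrated with a small sketch. It assumes a binary neighbor matrix that is row-standardized, so that each entry of Wy is the average price of a property's neighbors; the three properties and their prices are hypothetical.

```python
def row_standardize(weights):
    """Scale each row of a weights matrix so that it sums to 1."""
    standardized = []
    for row in weights:
        total = sum(row)
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized


def spatial_lag(weights, values):
    """Compute Wy: the weighted sum of neighbouring values for each property."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Hypothetical neighbour matrix: property 1 is adjacent to properties 0 and 2.
neighbours = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
# Hypothetical transaction prices (in thousands).
prices = [1000.0, 1200.0, 1600.0]

W = row_standardize(neighbours)
lagged_prices = spatial_lag(W, prices)
print(lagged_prices)
```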
Tools & Technology |
Project Timeline |
Challenges |
No. | Key Challenges | Mitigation |
---|---|---|
1. | Unfamiliarity with R, its packages and R Shiny | |
2. | Limited OneMap API calls for a standard account | |