Difference between revisions of "REO Project Proposal"
Gtong.2014 (talk | contribs) |
Revision as of 22:24, 15 April 2018
Project Motivation
Despite being a perfect avenue to collect usage data on both commercial users and house seekers, REO fails to fully utilise the vast amount of data it has collected. This is attributed to the lack of cleaning or preparation of the data for further analysis. The problem is exacerbated by the fact that REO has been operating since 2014, which means it has 4 years’ worth of unexplored data. Meanwhile, REO faces strong competition from both similar sites with greater scale and smaller sites with a strong niche, e.g. those specialising in new condominiums.
Recent news articles have explored how the property market is going to get ‘hot’ due to the increase in demand. According to URA’s Real Estate Information System (REALIS), the property price index of residential property has increased for the first time since Q4 2013, and the planned supply of property four years from now would increase as well. Given the increase in both demand and supply, REO needs to capitalise on the growing market for its own growth.
Project Objectives
Business Objectives
- As REO’s revenue model is reliant on subscription fees and the premium features used, the focus of this project is on enhancing engagement with the commercial users by identifying user segments and developing segment targeting strategies. With better engagement, REO hopes to reduce its attrition rate and increase its potential user base.
Technical Objectives
To use data analytical tools and statistical methods to study the data and derive insights that may help to achieve the business objective.
- To understand the data domains
- To understand the usage rate and activities performed on the portals by the commercial users
- To identify patterns or trends in the commercial users’ behaviour
- To identify customer segments through clustering analysis
Project Data
REO provided 6 separate sets of data. The datasheets are Users, Subscription, Sessions, Enquiries, Cobroke and Listings.
Data Dictionary
Project Methodology
Combining of sheets
The current data are in six distinct sheets. The variables in each sheet will be matched to their respective unique customer ID and combined into a single spreadsheet.
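The combining step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the team's actual procedure: the column names (`user_id`, `agency`, `duration_min`) and the sample rows are hypothetical, and only two of the sheets are shown.

```python
from collections import defaultdict

# Hypothetical extracts from two sheets; all column names are assumed.
users = [
    {"user_id": 1, "agency": "A"},
    {"user_id": 2, "agency": "B"},
]
sessions = [
    {"user_id": 1, "duration_min": 12},
    {"user_id": 1, "duration_min": 8},
    {"user_id": 3, "duration_min": 5},  # no matching row in the Users sheet
]


def combine_by_user(users, sessions):
    """Return one row per user, with session count and total duration."""
    totals = defaultdict(lambda: {"n_sessions": 0, "total_min": 0})
    for row in sessions:
        agg = totals[row["user_id"]]
        agg["n_sessions"] += 1
        agg["total_min"] += row["duration_min"]

    combined = []
    for user in users:
        # Users with no logged sessions receive zero counts.
        agg = totals.get(user["user_id"], {"n_sessions": 0, "total_min": 0})
        combined.append({**user, **agg})
    return combined


combined = combine_by_user(users, sessions)
```

Session rows whose `user_id` has no match in the Users sheet are dropped here; whether to keep them is a decision for the data-cleaning step.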
Data-cleaning
- There are entries with missing values. REO will be consulted to confirm whether these entries represent a true zero or whether the values are indeed missing. Depending on the client’s input, these values will be imputed with either 0 or the median value.
- In the sessions dataset, there are unique user_ids which were logged but have no corresponding entry in the subscription dataset. Input from REO will allow the team to determine the follow-up action.
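Both cleaning checks above can be sketched in a few lines. This is a hedged illustration under stated assumptions: missing values are represented as `None`, and the function names and sample numbers are hypothetical rather than taken from REO's data.

```python
from statistics import median


def impute(values, strategy="median"):
    """Replace None with 0 or with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "zero":
        fill = 0
    elif strategy == "median":
        fill = median(observed)  # raises StatisticsError if nothing is observed
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def orphan_ids(session_ids, subscription_ids):
    """User IDs logged in sessions that have no subscription record."""
    return sorted(set(session_ids) - set(subscription_ids))


enquiries = [4, None, 10, 1, None]      # hypothetical enquiry counts
as_zero = impute(enquiries, "zero")     # treat missing as a true zero
as_median = impute(enquiries, "median") # treat missing as unknown
orphans = orphan_ids([1, 2, 2, 3, 5], [1, 2, 4])
```

Which strategy applies to which column depends on REO's answer, so the choice is left as a parameter.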
Data Transformation
- The descriptive statistics suggest that the variables are right-skewed, as their arithmetic means are larger than their medians. Based on the analysis requirements, the team may trim the outliers and/or standardise the values.
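A minimal sketch of the skew check, trimming and standardisation follows. The trimming rule (dropping a fixed fraction of the largest values) and z-score standardisation are assumptions chosen for illustration; the proposal does not specify which rules will be used, and the sample values are hypothetical.

```python
from statistics import mean, median, pstdev


def is_right_skewed(values):
    """Heuristic from the text: mean larger than median suggests right skew."""
    return mean(values) > median(values)


def trim_top(values, fraction=0.1):
    """Drop the largest `fraction` of values (assumed trimming rule)."""
    n_drop = round(len(values) * fraction)
    return sorted(values)[: len(values) - n_drop]


def standardise(values):
    """Convert values to z-scores using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


listings = [1, 2, 2, 3, 3, 4, 5, 6, 7, 60]  # hypothetical listing counts
skewed = is_right_skewed(listings)
trimmed = trim_top(listings, fraction=0.1)
z_scores = standardise(trimmed)
```

The mean-versus-median comparison is only a quick indicator; a histogram or a skewness statistic would give a fuller picture.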
Independent t-test
An independent t-test will be conducted to test for any difference between REO’s two groups of commercial users: subscribers (paid users) and non-subscribers (free users).
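The test statistic can be sketched as below. Note that this uses Welch's variant of the independent t-test, which does not assume equal variances between the two groups; that choice is an assumption made here, not something the proposal specifies. The session counts are hypothetical, and the p-value is not computed, since it requires the t-distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t-statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


paid = [12, 15, 14, 10, 13]  # hypothetical sessions per subscriber
free = [8, 9, 7, 10, 6]      # hypothetical sessions per non-subscriber
t_stat, dof = welch_t(paid, free)
```

A large absolute t-statistic relative to the critical value at `dof` degrees of freedom would indicate a difference between the two groups.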
Others
Based on the findings from the exploratory research, other methods may be attempted. For example, a regression analysis may be conducted to identify the leading factors affecting users’ lead generation. A clustering analysis can also be conducted to identify whether there are any meaningful groups in the users’ behaviour.
Scope of Work
Our dataset contains only records for 2017 Q3 & Q4, which limits any time-series analysis.
The dataset contains only the behaviour of commercial users on the portal.
Phase 0: Context learning
- Understanding the business model of REO, industry and competitors
- Verifying facts/assumptions
- Mapping out the user’s process on the portal
- Defining what constitutes a successful ‘conversion’ for the portal
- Understanding the variables of the dataset
Phase 1: Data Preparation
- Refer to “Project Methodology”
Phase 2: Data Exploration
- Studying the distribution of the variables
- Identifying and treating outliers
- Comparing data among different groups
- Classifying variable data for further analysis
Phase 3: Data Analysis
- Conduct data segmentation using Clustering Analysis
- Use of K-Means and Latent Class Analysis
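As an illustration of the K-Means part of Phase 3, the sketch below implements the basic assign-and-update loop (Lloyd's algorithm) in plain Python. It is a simplified sketch under stated assumptions: the first k points serve as initial centroids (real analyses usually use random restarts or k-means++ initialisation), the two features per user are hypothetical, and Latent Class Analysis is not covered.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Simplifying assumption: the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical (listings posted, enquiries received) per user.
users = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(users, k=2)
```

Because K-Means relies on Euclidean distance, features on different scales would normally be standardised before clustering.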