REO Project Proposal


Project Motivation

Recent news articles have explored how the property market is going to get ‘hot’ due to an increase in demand (Today, 2017; The Straits Times, 2017). According to URA’s Real Estate Information System (REALIS), the price index of residential property has increased for the first time since the fourth quarter of 2013, and the planned supply of property four years from now is expected to increase as well. Given the increase in both demand and supply, REO needs to capitalise on the growing market to drive its own growth.

Project Objectives

Business Objectives

  • To improve REO’s financial performance by increasing and maintaining the number of paid subscribers. Subscribers will be incentivised to maintain a business relationship with REO if they receive more quality leads from REO than from competitors’ sites.


Technical Objectives
To use data analytics tools and statistical methods to study the data and derive insights that may help to achieve the business objective.

  • To understand the data domains
  • To understand the usage rate and activities performed on the portals by the commercial users
  • To identify patterns or trends in the commercial users’ behaviour
  • To identify the contributing factors that generate more leads
  • To provide visual representation of the results

Project Data

REO provided six separate datasets: Users, Subscription, Sessions, Enquiries, Cobroke and Listings.
[Figure: data dictionary metadata (Data Dictionary Metadata.png)]

Project Methodology

Combining of sheets
The current data are in six distinct sheets. The variables in each sheet will be matched to their respective unique customer ID and combined into a single spreadsheet.
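The combining step can be sketched as a left join on the customer ID. This is only an illustration using the Python standard library: the rows and the field names `agency`, `session_minutes`, `enquiry_id`, `n_sessions` and `n_enquiries` are hypothetical, and only `user_id` and the sheet names come from the data described above.

```python
from collections import defaultdict

# Hypothetical rows; only the sheet names and user_id come from the proposal.
users = [
    {"user_id": 1, "agency": "A"},
    {"user_id": 2, "agency": "B"},
]
sessions = [
    {"user_id": 1, "session_minutes": 12},
    {"user_id": 1, "session_minutes": 30},
    {"user_id": 2, "session_minutes": 5},
]
enquiries = [
    {"user_id": 1, "enquiry_id": 101},
]


def count_by_user(rows):
    """Count how many rows each user_id has in an event-level sheet."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["user_id"]] += 1
    return counts


def combine(users, sessions, enquiries):
    """Build one row per user, left-joining aggregated activity counts."""
    session_counts = count_by_user(sessions)
    enquiry_counts = count_by_user(enquiries)
    combined = []
    for user in users:
        uid = user["user_id"]
        combined.append({
            **user,
            "n_sessions": session_counts.get(uid, 0),
            "n_enquiries": enquiry_counts.get(uid, 0),
        })
    return combined


combined = combine(users, sessions, enquiries)
```

Event-level sheets such as Sessions have many rows per user, so they are aggregated to one value per user before joining. Otherwise the combined sheet would repeat each user once per event.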

Data-cleaning

  • Some entries have missing values. REO will be consulted to confirm whether these entries represent a true zero or whether the values are genuinely missing. Depending on the client’s input, the missing values will be replaced with either 0 or the median value.
  • In the sessions dataset, some logged user_ids have no corresponding entry in the subscription dataset. Input from REO will allow the team to determine the follow-up action.
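Both cleaning steps can be sketched in a few lines. This is a minimal illustration, assuming missing values are represented as `None`; the column `listing_views` and the sample IDs are hypothetical, and the choice between the two fill strategies still depends on REO’s input.

```python
from statistics import median


def impute(values, strategy):
    """Replace None with 0 or with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "zero":
        fill = 0
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def orphan_user_ids(session_ids, subscription_ids):
    """Return user_ids seen in sessions with no subscription record."""
    return sorted(set(session_ids) - set(subscription_ids))


listing_views = [4, None, 10, 6, None]  # hypothetical column
filled_zero = impute(listing_views, "zero")
filled_median = impute(listing_views, "median")
orphans = orphan_user_ids([1, 2, 2, 5, 7], [1, 2, 3])
```

Listing the orphaned user_ids first, rather than dropping them, gives REO something concrete to review before a follow-up action is chosen.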

Data Transformation

  • The descriptive statistics suggest that the data are right-skewed, as the arithmetic means are larger than the medians. Based on the analysis requirements, the team may trim the outliers and/or standardise the values.
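The skew check and the two candidate transformations can be sketched as follows. The 95th-percentile cut-off and the sample values are assumptions made for illustration; the proposal does not fix a trimming rule.

```python
from statistics import fmean, median, pstdev, quantiles


def is_right_skewed(values):
    """Rough check used in the text: mean larger than median."""
    return fmean(values) > median(values)


def trim_upper(values, pct=95):
    """Drop values above the given percentile (an assumed cut-off)."""
    cutoff = quantiles(values, n=100, method="inclusive")[pct - 1]
    return [v for v in values if v <= cutoff]


def standardise(values):
    """Convert values to z-scores (mean 0, population std 1)."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


sessions_per_user = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical values
skewed = is_right_skewed(sessions_per_user)
trimmed = trim_upper(sessions_per_user)
z_scores = standardise(trimmed)
```

Trimming before standardising keeps a single extreme value from inflating the standard deviation used for the z-scores.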

Independent t-test
An independent t-test will be conducted to test for any difference between the two groups of REO’s commercial users: subscribers (paid users) and non-subscribers (free users).
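As a rough sketch of the comparison, the snippet below computes the test statistic and degrees of freedom for Welch’s variant of the independent t-test, which does not assume equal variances in the two groups. The choice of Welch’s variant is an assumption, not something the proposal specifies, and the sample values are hypothetical. The p-value would then be read from a t distribution with the computed degrees of freedom.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group.
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (fmean(group_a) - fmean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


paid_sessions = [10, 12, 14, 16, 18]  # hypothetical subscribers
free_sessions = [4, 6, 8, 10, 12]     # hypothetical non-subscribers
t_stat, dof = welch_t(paid_sessions, free_sessions)
```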

Others
Based on the findings from the exploratory research, other methods may be attempted. For example, a regression analysis may be conducted to identify the leading factors affecting users’ lead generation. A clustering analysis may also be conducted to identify whether there are any meaningful groups in the users’ behaviour.
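To illustrate the regression idea in its simplest form, the sketch below fits a least-squares line with a single predictor. The variables `listings_posted` and `leads_received` and their values are hypothetical; a real analysis with several factors would need a multiple regression from a statistics library.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data: listings posted vs. leads received per user.
listings_posted = [1, 2, 3, 4]
leads_received = [3, 5, 7, 9]
slope, intercept = fit_line(listings_posted, leads_received)
```

The slope estimates how many additional leads are associated with one more listing in this toy data; it describes an association and does not by itself show that listings cause leads.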


Scope of Work

Our dataset contains records only for 2017 Q3 and Q4, which limits any time-series analysis.
The dataset contains only the behaviour of commercial users on the portal.

Phase 0: Context learning

  • Understanding the business model of REO, industry and competitors
  • Verifying facts/assumptions
  • Mapping out the user’s journey through the portal
  • Defining what constitutes a successful ‘conversion’ for the portal
  • Understanding the variables of the dataset

Phase 1: Data Preparation

  • Refer to “Project Methodology”

Phase 2: Data Description

  • Studying the distribution of the variables
  • Identifying and treating outliers
  • Comparing data among different groups
  • Classifying variable data for further analysis

Phase 3: Model Building

  • Developing models such as multivariate regression to learn the impact of drivers
  • Building a predictive model using a classification algorithm, e.g. a k-nearest neighbour (k-NN) classifier
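A k-NN classifier of the kind mentioned above can be sketched in a few lines. The two features, the labels and the value k=3 are assumptions chosen for illustration, not choices made in the proposal.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k nearest training points."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical features: (sessions per week, listings posted).
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = ["free", "free", "free", "paid", "paid", "paid"]

low_activity = knn_predict(train_X, train_y, (2, 2))
high_activity = knn_predict(train_X, train_y, (8, 7))
```

Because k-NN relies on Euclidean distance, features measured on very different scales should be standardised first so that no single feature dominates the distance.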

Phase 4: Results Visualisation

  • Preparing the results to be displayed on Google Data Studio