ANLY482 AY2016-17 T2 Group4: Project Overview
Contents
- 1 Motivation & Business Problem
- 2 Project Objectives
- 3 Constraints
- 4 Project Details
- 5 System Architecture
- 6 Predictive Variables (Seller Attributes)
- 7 Response Variables (Seller Performance Metrics)
- 8 Data Source
- 9 Methodology
- 10 Data Collection
- 11 Data Exploration and Cleaning
- 12 Data Modelling
- 13 Data Visualization
Motivation & Business Problem
INTENT
Improve the transparency of information that customers and sellers can use to assess a seller’s performance.
PROBLEM
- Customers are unable to identify which sellers are the best to purchase products from.
- Sellers who want to manage and improve their performance lack a clear definition of the seller characteristics that matter to customers.
Project Objectives
We will identify the key features that allow sellers to measure and manage their performance on Lazada’s platform. These features will also be surfaced to customers to help them identify the best sellers to purchase from.
Constraints
Production-ready: the data pipeline must run within 3 hours on a machine with 16 GB of RAM.
Project Details
System Architecture
Predictive Variables (Seller Attributes)
Shipping Time
Pricing
Return Rate
Seller-Initiated Cancellation Rate
Seller Category (e.g. home & living, fashion, multi-category sellers)
Size of Seller
Seller’s Years of Experience on Lazada
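Several of the seller attributes above could be derived from order-level records. The following is a minimal sketch that computes three of them: average shipping time, return rate and seller-initiated cancellation rate. It relies on assumed definitions, since the project does not spell these out. The `Order` record and its field names are hypothetical and do not reflect Lazada's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


# Hypothetical order record; field names are illustrative assumptions only.
@dataclass
class Order:
    seller_id: str
    order_date: date
    ship_date: Optional[date]  # None if the order never shipped
    returned: bool
    cancelled_by_seller: bool


def seller_attributes(orders, seller_id):
    """Compute a few seller attributes under assumed definitions:
    - avg_shipping_days: mean days from order to shipment (shipped orders only)
    - return_rate: returned orders / all of the seller's orders
    - cancellation_rate: seller-cancelled orders / all of the seller's orders
    """
    mine = [o for o in orders if o.seller_id == seller_id]
    if not mine:
        raise ValueError(f"no orders for seller {seller_id!r}")
    shipping_days = [
        (o.ship_date - o.order_date).days for o in mine if o.ship_date is not None
    ]
    return {
        "avg_shipping_days": mean(shipping_days) if shipping_days else None,
        "return_rate": sum(o.returned for o in mine) / len(mine),
        "cancellation_rate": sum(o.cancelled_by_seller for o in mine) / len(mine),
    }
```

Whether the denominator of each rate should be all orders or only shipped orders is a modelling decision. The choice shown here is only one option.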
Response Variables (Seller Performance Metrics)
Product Popularity Ratio (PPR), the total purchases made per sales item:
PPR = Total Purchases / Distinct Count of Products
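The PPR formula can be computed per seller with a single pass over purchase records. This is a sketch that assumes each record is one `(seller_id, product_id)` pair representing a single purchase. That record layout is hypothetical, and "total purchases" could instead be defined as a sum of quantities.

```python
from collections import Counter, defaultdict


def product_popularity_ratio(purchases):
    """PPR per seller = total purchases / distinct count of products.

    `purchases` is an iterable of (seller_id, product_id) pairs, each
    counted as one purchase (a hypothetical record layout).
    """
    totals = Counter()             # seller_id -> number of purchases
    products = defaultdict(set)    # seller_id -> distinct product_ids
    for seller_id, product_id in purchases:
        totals[seller_id] += 1
        products[seller_id].add(product_id)
    # Every seller in `totals` has at least one product, so no division by zero.
    return {s: totals[s] / len(products[s]) for s in totals}
```

For example, a seller with three purchases spread across two distinct products has a PPR of 1.5.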
Data Source
Sensitive Data (Not to be revealed)
Methodology
Data Collection
We will build a data extraction pipeline that pulls data from Lazada’s database and Google Analytics. The main challenge is extracting high-quality data from relevant, up-to-date sources.
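One extraction step could stream order data from the database and join it with web-analytics data on a seller key. The following is a minimal sketch under several assumptions. An in-memory SQLite database stands in for Lazada's database, and a CSV export stands in for Google Analytics data; no Google Analytics API is used. The `orders` table, its columns and the `seller_id,sessions` CSV columns are all hypothetical. Rows are read in batches with `fetchmany` so that memory use stays bounded, in the spirit of the 16 GB constraint.

```python
import csv
import io
import sqlite3
from collections import Counter

BATCH_SIZE = 10_000  # hypothetical batch size; tune to the memory budget


def extract_order_counts(conn, batch_size=BATCH_SIZE):
    """Stream the (hypothetical) orders table in batches and count orders per seller."""
    counts = Counter()
    cur = conn.execute("SELECT seller_id FROM orders")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        counts.update(seller_id for (seller_id,) in rows)
    return counts


def load_sessions(csv_file):
    """Read a (hypothetical) analytics export with seller_id,sessions columns."""
    return {row["seller_id"]: int(row["sessions"]) for row in csv.DictReader(csv_file)}


def build_seller_table(conn, csv_file):
    """Join both sources on seller_id; sellers missing from either side get 0."""
    orders = extract_order_counts(conn)
    sessions = load_sessions(csv_file)
    return {
        s: {"orders": orders.get(s, 0), "sessions": sessions.get(s, 0)}
        for s in sorted(set(orders) | set(sessions))
    }
```

Counting in Python illustrates batching. On a real database, pushing the aggregation into SQL (`GROUP BY seller_id`) would usually be cheaper.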
Data Exploration and Cleaning
We will carry out exploratory analysis on the collected data. The findings will be used to refine the business questions, which in turn shape further exploration. We will repeat this cycle, cleaning and munging the data as needed, until we arrive at business questions that accurately capture business needs given the available data and exploratory findings.
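A cleaning pass in this iterative loop might apply simple rules and report how many rows each rule affected, so the counts can inform the next round of exploration. The following is a sketch. The three rules and the dictionary field names (`order_id`, `seller_id`, `order_date`, `ship_date`) are illustrative assumptions, not the project's actual cleaning rules.

```python
from datetime import date


def clean_orders(rows):
    """Apply a few illustrative cleaning rules (assumptions, not project rules):
    1. drop rows with no seller_id,
    2. drop repeated order_ids (keep the first occurrence),
    3. treat a ship_date earlier than order_date as missing.
    Returns the cleaned rows and a count of how many rows each rule touched.
    """
    seen = set()
    cleaned = []
    report = {"missing_seller": 0, "duplicate": 0, "bad_ship_date": 0}
    for row in rows:
        if not row.get("seller_id"):
            report["missing_seller"] += 1
            continue
        if row["order_id"] in seen:
            report["duplicate"] += 1
            continue
        seen.add(row["order_id"])
        row = dict(row)  # copy so the caller's data is not mutated
        ship = row.get("ship_date")
        if ship is not None and ship < row["order_date"]:
            row["ship_date"] = None
            report["bad_ship_date"] += 1
        cleaned.append(row)
    return cleaned, report
```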
Data Modelling
After a thorough exploratory phase, we will train and test machine learning models to answer predictive and prescriptive business questions. This will include techniques such as clustering to segment user behaviours and regression to estimate the impact of various seller attributes on customer experience (CX) metrics. Methods such as random forests and regularization may also be used to reduce the risk of overfitting and improve the models’ accuracy on test data.
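To illustrate how regularization fits into the regression step, here is a small sketch of ridge (L2-regularized) linear regression in plain Python. It would regress a performance metric such as PPR on numeric seller attributes. Ridge is only one common form of regularization; the project does not specify which it will use. Features and target are centred so the intercept is not penalized, and the coefficients solve (XᵀX + λI)w = Xᵀy on the centred data.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; try a larger lam")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge regression: minimise ||y - b0 - Xw||^2 + lam * ||w||^2.

    X is a list of rows (one per seller), y the target metric per seller.
    Returns (intercept b0, coefficient list w).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with the ridge penalty added to the diagonal.
    A = [[sum(Xc[i][r] * Xc[i][c] for i in range(n)) + (lam if r == c else 0.0)
          for c in range(p)] for r in range(p)]
    b = [sum(Xc[i][r] * yc[i] for i in range(n)) for r in range(p)]
    w = solve(A, b)
    b0 = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return b0, w
```

With `lam=0` this reduces to ordinary least squares, and larger `lam` shrinks the coefficients towards zero. Features are usually standardized first so that the penalty treats them equally. In practice a library implementation such as scikit-learn's `Ridge` would be the natural choice.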
Data Visualization
The analysis will be documented visually in Jupyter Notebook or interactive dashboard tools, and later demonstrated and presented to business users such as Lazada suppliers and internal teams. Presentation techniques such as storyboarding and the Pyramid Principle (Barbara Minto) may also be used to ensure that findings are presented in a way that matches business needs.