Difference between revisions of "ANLY482 AY2016-17 T1 Group4: Project Overview"

From Analytics Practicum
Jump to navigation Jump to search
Line 1: Line 1:
 
<!--Logo-->
 
<!--Logo-->
[[Image: TeamInsured_Logo.png|center|1000x150px]]<br>
+
[[Image: lazadash.png|center|1000x150px]]<br>
 
<!--Header Start-->
 
<!--Header Start-->
 
{|style="background-color:#4c5974; color:#F5F5F5; padding: 10 0 10 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
{|style="background-color:#4c5974; color:#F5F5F5; padding: 10 0 10 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
Line 31: Line 31:
 
<!--Body Start-->
 
<!--Body Start-->
  
==Background==
+
==Motivation & Business Problem==
At the insurance company, serving happiness to customers has always been the main focus. As such, we at Team Insured have been tasked to assist their customer base, specifically for the Life Insurance side of the business.  
+
Lazada is still small compared to major competitors (e.g. Aliexpress / Amazon / Tao Bao) and it still needs to continue expanding and strengthening its user base before it can start to focus on realizing its profit goals. Despite its small size, it has significant market penetrations across many different countries in South-East Asia. That said, as a relative newcomer in the industry, it needs to prepare to for the serious market competition that Amazon might bring in the future. One way to provide a barrier to entry towards Amazon is to understand its buyers and sellers better and offer more effective strategies to enhance their experiences.
<br><br>
 
The three main things that the insurance company requires our help for to do analysis on:
 
* Average Product Holdings per Customer
 
* Customer Segmentation
 
* Agent Segmentation.  
 
 
<br>
 
<br>
Deeper analysis such as Need Based Analysis has also been requested. These are the three general objectives that are to be pursued for this project, so as to help the insurance company understand their customers better.
+
=== Current Issues ===
 +
 
 +
==== '''Difficulty navigating product searches catalog''' ====
 +
Lazada currently has millions of products listed where many versions of the same product are being sold. Buyers may find their online experience hindered by many conflicting factors in making a decision, such as lower prices, shorter time to delivery, and higher transaction reputation. To help buyers gain quick access to the best products in the catalogue/search results, intelligent ranking orders of all available product items needs to be generated to help buyers make preferred decisions and sellers get better sales.
 +
 +
==== '''Difficulty measuring features which lead to higher conversion rates and better customer experience''' ====
 +
One of the key drivers of a customer’s experience throughout the purchase journey (from online browsing to receiving a product) is the product’s quality. However, product quality is hard to determine due to no fixed way of measuring it. A methodology is required to assess what attributes contribute most to customers’ impression of product quality. On top of that, there are other useful features that have yet to be explored, including seller rating.
 +
 
 +
 
 +
=== Project objectives ===
 +
As such, the objective of our team is to model possible predictors for conversion rate and customer experience through measures of product quality and seller rating features.This will be used to derive recommendations in Lazada's on-site ranking system to enhance customer experience through quick access to the best products in customer’s catalogue.
 
<br>
 
<br>
  
==Data Source==
+
== Project Details ==
 +
===System Architecture ===
 +
[[Image: SA.png | center]]
 +
<br>
  
<br>
+
===Product & Seller Features (Predictive Variables)===
===Nature of Dataset===
+
Product Quality (online)
 +
- What factors have an effect on conversion and customer experience?
 +
- How should we adjust our website to improve online engagement, and NPS?
 +
<br/>
 +
Product Quality (delivery)
 +
- What factors have an effect on conversion and customer experience?
 +
- What can we do to improve customer NPS after they have received their product?
 +
<br/>
 +
Seller Rating
 +
- What factors have an effect on conversion and customer experience?
 +
- Which factors should we prioritize when building a seller dashboard?
 +
<br/>
  
The team was able to obtain ten different datasets from the main Database, of which eight were usable and relevant to our area of analysis (Actual table names have been masked for privacy's sake.)
+
===Conversion Rate & Customer Experience Metrics===
 +
====Conversion Rate====
 +
Conversion Rate is the percentage of visits which results in e-commerce transactions (sales). It helps to calculate how many visitors (shoppers) are actually turning into buyers (customers). High or higher conversion rate is the ultimate target for the business.
 +
<br/>
  
*The first dataset “Client”, the primary Client dataset, contains approximately 510,000 rows of client records, with 105 unique columns, which came in a “.TXT” formatted flat file.  
+
====Customer Experience====
 +
Customer Experience is the entire interactions that a customer has with the brand, organization, or e-commerce platform in this case, and an organization should aim to deliver most enjoyable and usable experience to existing and potential customers.
 +
<br/>
 +
Formulas:
 +
<br/>
 +
'''Entire Website:''' E-commerce Conversion Rate = (Total E-commerce transactions / total visits on a website) * 100
 +
<br/>
 +
'''By Product:'''  E-commerce Conversion Rate for a product = (Product Transaction / Product page visits) * 100
 +
<br/>
 +
For this project, we can get Conversion Rate from data collected, or get calculated Conversion Rate from Google Analytics platform since Lazada Group holds GA account.
 +
<br/>
 +
Another important aspect of Conversion Rate is the factors that have high correlation with high Conversion Rate (eg) Shorter payment process, coupon. This will be examined by correlation and/or regression analysis from softwares like SPSS.
 +
<br/>
  
*The second dataset, “Contract”, that was provided was the contract header dataset that held each clients’ contract data, containing approximately 1.4 million rows of records, with 234 unique columns, which came in a “.TXT” formatted flat file.
 
  
*The third dataset, “Life”, that was provided was the Life Insurance data that was linked to each contract, containing approximately 414,000 rows of records, with 27 unique columns, which came in a “.TXT” formatted flat file.
+
Formulas:
 +
<br/>
 +
'''Net Promoter Score (NPS): % of Promoters - % of Detractors'''
 +
<br/>
 +
Promoters (score 9-10): loyal customers who keep buying and referring to others
 +
<br/>
 +
Detractors (score 0-6): dissatisfied customers who might spread negative comments and word-of-mouth
  
*The fourth dataset, “Coverage”, that was provided was the coverage data that was linked to each contract, containing approximately 2.86 million rows of records, with 102 unique columns, which came in a “.TXT” formatted flat file.
 
  
*The fifth dataset, “Agent”, that was provided was the agent data linked to each client, containing approximately 10800 rows of records, with 51 unique columns, which came in a “.TXT” formatted flat file.
+
===Data Source===
  
*The sixth dataset, “Payer”, that was provided was the payee data linked to each contract, containing approximately 1.14 million rows of records, with 51 unique columns, which came in a “.TXT” formatted flat file.
+
<br>
 +
===Nature of Dataset===
  
*The seventh dataset, “ClientRel”, that was provided was a table linking agent data, client data and life insurance data together, containing approximately 2.482 million rows of records, with 11 unique columns, which came in a “.TXT” formatted flat file.  
+
To be obtained on approval (11th January).  
  
*The eighth dataset, “ClientExtra”, that was provided was the extra client data linked to each client, containing approximately 396,000 rows of records, with 21 unique columns, which came in a “.TXT” formatted flat file.
 
  
===ER Diagram===
 
[[Image: Teaminsured_erdiagram.png | center]]
 
<br>
 
  
 
==Methodology==
 
==Methodology==
 +
 +
[[Image: LazadaMethodology.png|center|1000x150px]]
  
 
<br>
 
<br>
  
===Cleaning and Merging===
+
=== Data Collection ===
* The flat files were processed with Windows Powershell Scripts, to removed any accidental line breaks and escape characters.
+
* The files were then manually validated against a data dictionary (Word Document) and then loaded into SAS Enterprise Guide
+
This will be done to form the pipeline of data extraction from Lazada database and Google Analytics. The challenge is to properly pull out quality data from the relevant and updated sources.
* After that the files were merged inside SAS EG and Exported as SAS7BAT files, to be analysed by the team.
 
  
===Exploratory Data Analysis===
+
=== Data Exploration and Cleaning===
* We used SAS Entreprise Guide for our Exploratory Data Analysis.
+
 +
Manage exploratory analysis of these data. These analysis will be used to improve on business questions which also affect the exploratory analysis. This process will be done repeatedly with necessary data cleaning and munging until we find business questions which accurately express business needs given the data and exploratory analysis made.
 +
Data Modelling
 +
 +
After a proper exploratory phase of the analysis, we will train and test machine learning models to to answer predictive and prescriptive business questions. This will include processes such as clustering to segment user behaviours, regression to include impacts of various seller attributes to CX Metrics, etc. Various statistical learning models such as Random Forest and Regularization might also be used to reduce risk of overfitting and increase testing accuracy of models.
 +
 +
===Data Visualization===
 +
 +
These data analysis will be documented visually Jupyter Notebook or interactive dashboard tools which are later demonstrated and presented to business users such as Lazada suppliers and internal teams. Insights presentation techniques such as Storyboarding and Pyramid technique (Barbara Pinto) might also be used to ensure proper presentation to match findings and business needs.
  
 
<!--Body End-->
 
<!--Body End-->

Revision as of 18:50, 31 December 2016


TeamInsured Home.png   HOME

 

TeamInsured About Icon.png   PROJECT OVERVIEW

 

TeamInsured Findings.png   PROJECT FINDINGS

 

TeamInsured PM.png   PROJECT MANAGEMENT

 

TeamInsured Documentation.png   DOCUMENTATION


Motivation & Business Problem

Lazada is still small compared to major competitors (e.g. Aliexpress / Amazon / Tao Bao) and it still needs to continue expanding and strengthening its user base before it can start to focus on realizing its profit goals. Despite its small size, it has significant market penetrations across many different countries in South-East Asia. That said, as a relative newcomer in the industry, it needs to prepare to for the serious market competition that Amazon might bring in the future. One way to provide a barrier to entry towards Amazon is to understand its buyers and sellers better and offer more effective strategies to enhance their experiences.

Current Issues

Difficulty navigating product searches catalog

Lazada currently has millions of products listed where many versions of the same product are being sold. Buyers may find their online experience hindered by many conflicting factors in making a decision, such as lower prices, shorter time to delivery, and higher transaction reputation. To help buyers gain quick access to the best products in the catalogue/search results, intelligent ranking orders of all available product items needs to be generated to help buyers make preferred decisions and sellers get better sales.

Difficulty measuring features which lead to higher conversion rates and better customer experience

One of the key drivers of a customer’s experience throughout the purchase journey (from online browsing to receiving a product) is the product’s quality. However, product quality is hard to determine due to no fixed way of measuring it. A methodology is required to assess what attributes contribute most to customers’ impression of product quality. On top of that, there are other useful features that have yet to be explored, including seller rating.


Project objectives

As such, the objective of our team is to model possible predictors for conversion rate and customer experience through measures of product quality and seller rating features.This will be used to derive recommendations in Lazada's on-site ranking system to enhance customer experience through quick access to the best products in customer’s catalogue.

Project Details

System Architecture


Product & Seller Features (Predictive Variables)

Product Quality (online) - What factors have an effect on conversion and customer experience? - How should we adjust our website to improve online engagement, and NPS?
Product Quality (delivery) - What factors have an effect on conversion and customer experience? - What can we do to improve customer NPS after they have received their product?
Seller Rating - What factors have an effect on conversion and customer experience? - Which factors should we prioritize when building a seller dashboard?

Conversion Rate & Customer Experience Metrics

Conversion Rate

Conversion Rate is the percentage of visits which results in e-commerce transactions (sales). It helps to calculate how many visitors (shoppers) are actually turning into buyers (customers). High or higher conversion rate is the ultimate target for the business.

Customer Experience

Customer Experience is the entire interactions that a customer has with the brand, organization, or e-commerce platform in this case, and an organization should aim to deliver most enjoyable and usable experience to existing and potential customers.
Formulas:
Entire Website: E-commerce Conversion Rate = (Total E-commerce transactions / total visits on a website) * 100
By Product: E-commerce Conversion Rate for a product = (Product Transaction / Product page visits) * 100
For this project, we can get Conversion Rate from data collected, or get calculated Conversion Rate from Google Analytics platform since Lazada Group holds GA account.
Another important aspect of Conversion Rate is the factors that have high correlation with high Conversion Rate (eg) Shorter payment process, coupon. This will be examined by correlation and/or regression analysis from softwares like SPSS.


Formulas:
Net Promoter Score (NPS): % of Promoters - % of Detractors
Promoters (score 9-10): loyal customers who keep buying and referring to others
Detractors (score 0-6): dissatisfied customers who might spread negative comments and word-of-mouth


Data Source


Nature of Dataset

To be obtained on approval (11th January).


Methodology

LazadaMethodology.png


Data Collection

This will be done to form the pipeline of data extraction from Lazada database and Google Analytics. The challenge is to properly pull out quality data from the relevant and updated sources.

Data Exploration and Cleaning

Manage exploratory analysis of these data. These analysis will be used to improve on business questions which also affect the exploratory analysis. This process will be done repeatedly with necessary data cleaning and munging until we find business questions which accurately express business needs given the data and exploratory analysis made. Data Modelling

After a proper exploratory phase of the analysis, we will train and test machine learning models to to answer predictive and prescriptive business questions. This will include processes such as clustering to segment user behaviours, regression to include impacts of various seller attributes to CX Metrics, etc. Various statistical learning models such as Random Forest and Regularization might also be used to reduce risk of overfitting and increase testing accuracy of models.

Data Visualization

These data analysis will be documented visually Jupyter Notebook or interactive dashboard tools which are later demonstrated and presented to business users such as Lazada suppliers and internal teams. Insights presentation techniques such as Storyboarding and Pyramid technique (Barbara Pinto) might also be used to ensure proper presentation to match findings and business needs.