Difference between revisions of "ANLY482 AY2016-17 T1 Group4: Project Overview"

From Analytics Practicum
Jump to navigation Jump to search
(Undo revision 11324 by Ranon.sim.2013 (talk))
(Undo revision 11317 by Ranon.sim.2013 (talk))
 
Line 1: Line 1:
 
<!--Logo-->
 
<!--Logo-->
[[Image: lazadash.png|center|1000x150px]]<br>
+
[[Image: TeamInsured_Logo.png|center|1000x150px]]<br>
 
<!--Header Start-->
 
<!--Header Start-->
 
{|style="background-color:#4c5974; color:#F5F5F5; padding: 10 0 10 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
{|style="background-color:#4c5974; color:#F5F5F5; padding: 10 0 10 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
Line 31: Line 31:
 
<!--Body Start-->
 
<!--Body Start-->
  
==Motivation & Business Problem==
+
==Background==
Lazada is still small compared to major competitors (e.g. Aliexpress / Amazon / Tao Bao) and it still needs to continue expanding and strengthening its user base before it can start to focus on realizing its profit goals. Despite its small size, it has significant market penetrations across many different countries in South-East Asia. That said, as a relative newcomer in the industry, it needs to prepare to for the serious market competition that Amazon might bring in the future. One way to provide a barrier to entry towards Amazon is to understand its buyers and sellers better and offer more effective strategies to enhance their experiences.
+
At the insurance company, serving happiness to customers has always been the main focus. As such, we at Team Insured have been tasked to assist their customer base, specifically for the Life Insurance side of the business.
 +
<br><br>
 +
The three main things that the insurance company requires our help for to do analysis on:
 +
* Average Product Holdings per Customer
 +
* Customer Segmentation
 +
* Agent Segmentation.  
 +
<br>
 +
Deeper analysis such as Need Based Analysis has also been requested. These are the three general objectives that are to be pursued for this project, so as to help the insurance company understand their customers better.
 
<br>
 
<br>
=== Current Issues ===
 
 
==== '''Difficulty navigating product searches catalog''' ====
 
Lazada currently has millions of products listed where many versions of the same product are being sold. Buyers may find their online experience hindered by many conflicting factors in making a decision, such as lower prices, shorter time to delivery, and higher transaction reputation. To help buyers gain quick access to the best products in the catalogue/search results, intelligent ranking orders of all available product items needs to be generated to help buyers make preferred decisions and sellers get better sales.
 
 
==== '''Difficulty measuring features which lead to higher conversion rates and better customer experience''' ====
 
One of the key drivers of a customer’s experience throughout the purchase journey (from online browsing to receiving a product) is the product’s quality. However, product quality is hard to determine due to no fixed way of measuring it. A methodology is required to assess what attributes contribute most to customers’ impression of product quality. On top of that, there are other useful features that have yet to be explored, including seller rating.
 
  
 +
==Data Source==
  
== Project Objectives ==
 
As such, the objective of our team is to model possible predictors for conversion rate and customer experience through measures of product quality and seller rating features.This will be used to derive recommendations in Lazada's on-site ranking system to enhance customer experience through quick access to the best products in customer’s catalogue.
 
<br>
 
 
== Project Details ==
 
===System Architecture ===
 
[[Image: LazadaSA.png | center]]
 
 
<br>
 
<br>
 +
===Nature of Dataset===
  
===Product & Seller Features (Predictive Variables)===
+
The team was able to obtain ten different datasets from the main Database, of which eight were usable and relevant to our area of analysis (Actual table names have been masked for privacy's sake.)
Product Quality (online)
 
- What factors have an effect on conversion and customer experience?
 
- How should we adjust our website to improve online engagement, and NPS?
 
<br/>
 
Product Quality (delivery)
 
- What factors have an effect on conversion and customer experience?
 
- What can we do to improve customer NPS after they have received their product?
 
<br/>
 
Seller Rating
 
- What factors have an effect on conversion and customer experience?
 
- Which factors should we prioritize when building a seller dashboard?
 
<br/>
 
  
===Conversion Rate & Customer Experience Metrics===
+
*The first dataset “Client”, the primary Client dataset, contains approximately 510,000 rows of client records, with 105 unique columns, which came in a “.TXT” formatted flat file.  
====Conversion Rate====
 
Conversion Rate is the percentage of visits which results in e-commerce transactions (sales). It helps to calculate how many visitors (shoppers) are actually turning into buyers (customers). High or higher conversion rate is the ultimate target for the business.
 
<br/>
 
  
====Customer Experience====
+
*The second dataset, “Contract”, that was provided was the contract header dataset that held each clients’ contract data, containing approximately 1.4 million rows of records, with 234 unique columns, which came in a “.TXT” formatted flat file.  
Customer Experience is the entire interactions that a customer has with the brand, organization, or e-commerce platform in this case, and an organization should aim to deliver most enjoyable and usable experience to existing and potential customers.
 
<br/><br/>
 
Formulas:
 
<br/>
 
'''Entire Website:''' E-commerce Conversion Rate = (Total E-commerce transactions / total visits on a website) * 100
 
<br/>
 
'''By Product:'''  E-commerce Conversion Rate for a product = (Product Transaction / Product page visits) * 100
 
<br/><br/>
 
For this project, we can get Conversion Rate from data collected, or get calculated Conversion Rate from Google Analytics platform since Lazada Group holds GA account.
 
<br/><br/>
 
Another important aspect of Conversion Rate is the factors that have high correlation with high Conversion Rate (eg) Shorter payment process, coupon. This will be examined by correlation and/or regression analysis from softwares like SPSS.
 
<br/>
 
  
 +
*The third dataset, “Life”, that was provided was the Life Insurance data that was linked to each contract, containing approximately 414,000 rows of records, with 27 unique columns, which came in a “.TXT” formatted flat file.
  
Formulas:
+
*The fourth dataset, “Coverage”, that was provided was the coverage data that was linked to each contract, containing approximately 2.86 million rows of records, with 102 unique columns, which came in a “.TXT” formatted flat file.
<br/>
 
'''Net Promoter Score (NPS): % of Promoters - % of Detractors'''
 
<br/>
 
Promoters (score 9-10): loyal customers who keep buying and referring to others
 
<br/>
 
Detractors (score 0-6): dissatisfied customers who might spread negative comments and word-of-mouth
 
  
===Data Source===
+
*The fifth dataset, “Agent”, that was provided was the agent data linked to each client, containing approximately 10800 rows of records, with 51 unique columns, which came in a “.TXT” formatted flat file.
  
<br>
+
*The sixth dataset, “Payer”, that was provided was the payee data linked to each contract, containing approximately 1.14 million rows of records, with 51 unique columns, which came in a “.TXT” formatted flat file.
===Nature of Dataset===
 
  
To be obtained on approval (11th January).  
+
*The seventh dataset, “ClientRel”, that was provided was a table linking agent data, client data and life insurance data together, containing approximately 2.482 million rows of records, with 11 unique columns, which came in a “.TXT” formatted flat file.  
  
 +
*The eighth dataset, “ClientExtra”, that was provided was the extra client data linked to each client, containing approximately 396,000 rows of records, with 21 unique columns, which came in a “.TXT” formatted flat file.
  
 +
===ER Diagram===
 +
[[Image: Teaminsured_erdiagram.png | center]]
 +
<br>
  
 
==Methodology==
 
==Methodology==
 
[[Image: LazadaMethodology.png|center|1000x150px]]
 
  
 
<br>
 
<br>
  
=== Data Collection ===
+
===Cleaning and Merging===
+
* The flat files were processed with Windows Powershell Scripts, to removed any accidental line breaks and escape characters.
This will be done to form the pipeline of data extraction from Lazada database and Google Analytics. The challenge is to properly pull out quality data from the relevant and updated sources.
+
* The files were then manually validated against a data dictionary (Word Document) and then loaded into SAS Enterprise Guide
 +
* After that the files were merged inside SAS EG and Exported as SAS7BAT files, to be analysed by the team.
  
=== Data Exploration and Cleaning===
+
===Exploratory Data Analysis===
+
* We used SAS Entreprise Guide for our Exploratory Data Analysis.
Manage exploratory analysis of these data. These analysis will be used to improve on business questions which also affect the exploratory analysis. This process will be done repeatedly with necessary data cleaning and munging until we find business questions which accurately express business needs given the data and exploratory analysis made.
 
Data Modelling
 
 
After a proper exploratory phase of the analysis, we will train and test machine learning models to to answer predictive and prescriptive business questions. This will include processes such as clustering to segment user behaviours, regression to include impacts of various seller attributes to CX Metrics, etc. Various statistical learning models such as Random Forest and Regularization might also be used to reduce risk of overfitting and increase testing accuracy of models.
 
 
===Data Visualization===
 
 
These data analysis will be documented visually Jupyter Notebook or interactive dashboard tools which are later demonstrated and presented to business users such as Lazada suppliers and internal teams. Insights presentation techniques such as Storyboarding and Pyramid technique (Barbara Pinto) might also be used to ensure proper presentation to match findings and business needs.
 
  
 
<!--Body End-->
 
<!--Body End-->

Latest revision as of 19:01, 31 December 2016

TeamInsured Logo.png


TeamInsured Home.png   HOME

 

TeamInsured About Icon.png   PROJECT OVERVIEW

 

TeamInsured Findings.png   PROJECT FINDINGS

 

TeamInsured PM.png   PROJECT MANAGEMENT

 

TeamInsured Documentation.png   DOCUMENTATION


Background

At the insurance company, serving happiness to customers has always been the main focus. As such, we at Team Insured have been tasked to assist their customer base, specifically for the Life Insurance side of the business.

The three main things that the insurance company requires our help for to do analysis on:

  • Average Product Holdings per Customer
  • Customer Segmentation
  • Agent Segmentation.


Deeper analysis such as Need Based Analysis has also been requested. These are the three general objectives that are to be pursued for this project, so as to help the insurance company understand their customers better.

Data Source


Nature of Dataset

The team was able to obtain ten different datasets from the main Database, of which eight were usable and relevant to our area of analysis (Actual table names have been masked for privacy's sake.)

  • The first dataset “Client”, the primary Client dataset, contains approximately 510,000 rows of client records, with 105 unique columns, which came in a “.TXT” formatted flat file.
  • The second dataset, “Contract”, that was provided was the contract header dataset that held each clients’ contract data, containing approximately 1.4 million rows of records, with 234 unique columns, which came in a “.TXT” formatted flat file.
  • The third dataset, “Life”, that was provided was the Life Insurance data that was linked to each contract, containing approximately 414,000 rows of records, with 27 unique columns, which came in a “.TXT” formatted flat file.
  • The fourth dataset, “Coverage”, that was provided was the coverage data that was linked to each contract, containing approximately 2.86 million rows of records, with 102 unique columns, which came in a “.TXT” formatted flat file.
  • The fifth dataset, “Agent”, that was provided was the agent data linked to each client, containing approximately 10800 rows of records, with 51 unique columns, which came in a “.TXT” formatted flat file.
  • The sixth dataset, “Payer”, that was provided was the payee data linked to each contract, containing approximately 1.14 million rows of records, with 51 unique columns, which came in a “.TXT” formatted flat file.
  • The seventh dataset, “ClientRel”, that was provided was a table linking agent data, client data and life insurance data together, containing approximately 2.482 million rows of records, with 11 unique columns, which came in a “.TXT” formatted flat file.
  • The eighth dataset, “ClientExtra”, that was provided was the extra client data linked to each client, containing approximately 396,000 rows of records, with 21 unique columns, which came in a “.TXT” formatted flat file.

ER Diagram

Teaminsured erdiagram.png


Methodology


Cleaning and Merging

  • The flat files were processed with Windows Powershell Scripts, to removed any accidental line breaks and escape characters.
  • The files were then manually validated against a data dictionary (Word Document) and then loaded into SAS Enterprise Guide
  • After that the files were merged inside SAS EG and Exported as SAS7BAT files, to be analysed by the team.

Exploratory Data Analysis

  • We used SAS Entreprise Guide for our Exploratory Data Analysis.