Difference between revisions of "Analysis of User and Merchant Dropoff for Sugar App Data Source"

From Analytics Practicum
Jump to navigation Jump to search
Line 47: Line 47:
 
The users and merchants also belong to different regions, namely Singapore, Jakarta, Hong Kong, and we have to extract and focus on the Singapore region users and merchants.
 
The users and merchants also belong to different regions, namely Singapore, Jakarta, Hong Kong, and we have to extract and focus on the Singapore region users and merchants.
  
#User Data
+
==<div style="background: #95A5A6; line-height: 0.3em; font-family:helvetica;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Data Preparation</strong></font></div></div>==
#*ID
+
 
#*Latitude
+
As the data for Sugar is massive, cleaning the data took several stages.
#*Longitude
+
 
#*Timezone
+
[[File:SugarBranch.png|600px]]
#*Country
+
 
#*Username
+
First, the '''Users''' table was cleaned to only contain Singapore users with the following steps:
#*Email
+
# Clean out all Users that are not from Singapore by deleting Country_id != 1 
#*Creation Date
+
# Clean out user_id:99280 (invalid mobile number(10000) and balance of 1200)
#*Updated At Data
+
 
#Merchant Brand Data:  
+
Secondly, '''Branch''' table was cleaned to only contain Singapore non-Sugar Businesses, and eliminate duplicates :
#*ID
+
 
#*Manager ID
+
# Clean out all Merchants that are not from Singapore by deleting [Country_id] != 1
#*Name
+
# Clean out all Merchants that are Sugar’s administrative accounts
#*Description
+
# Clean out all Merchants that are accounts created for specific campaigns (e.g. Blackout Friday, SugarGSS etc)
#*Creation Date
+
# Clean out Merchant duplicates
#*Updated At Date
+
# Corrected [Area] field for Merchants that have invalid/blank fields
#Merchant Branch Data:
+
# Corrected [Lng,Lat] coordinates for following branches that have incorrect coordinates (i.e. [Country_id] == 1 but coordinates correspond to another country)
#*ID
+
# Cleaned out all Merchants that have invalid [Created_At] dates
#*Brand ID
+
# Cleaned out all Merchants whose [Created_At] dates != 2014 OR 2015
#*Latitude
+
 
#*Longitude
+
Next, the '''Users''' table and '''Branch''' table was inner joined with the '''Order''' table to eliminate all non-Singaporean users and businesses. We further cleaned the data using latitude and longitude to eliminate all orders made outside Singapore.
#*Country
+
 
#*Name
+
[[File:SugarMerchant.png|600px]]
#*Address
+
 
#*Description
+
Lastly, the '''Branch''' table was inner joined with the '''Orders''' table to form '''Merchant''' table (for merchant analysis). It is further cleaned to include only paid orders and decomposed further for both the funnel plot analysis and time-series data mining analysis.
#*Redemption Type
+
 
#*Redemption Time
+
# Clean out all unpaid orders (i.e. [Status] != 1,2 and 4)
#*Rating
+
# Clean out all orders where [Price] == 0
#*Enabled
+
# Clean out all orders where it is gifted during a Sugar event (i.e. [Is_Reward] == 1)
#*Creation Date
+
# Clean out all orders that are lucky draws
#*Updated At Date
+
# Decomposed [Redeemed_Date] into Hour, Day of Week and Month
#Item Data
 
#*ID
 
#*Brand ID
 
#*Name
 
#*Description
 
#*Creation Date
 
#*Updated At Date
 
#*Market Price
 
#Campaign Data (Each campaign is an offer on the app)
 
#*ID
 
#*Brand ID
 
#*Item ID
 
#*Branch ID
 
#*Is all branch
 
#*Category Name
 
#*Start Time
 
#*End Time
 
#*Redemption Start Time
 
#*Redemption End Time
 
#*Redemption After Buy
 
#*Start Price
 
#*Floor Price
 
#*Unlock Price
 
#*Current Price
 
#*Bargain Range
 
#*Stock
 
#*Left
 
#*Click Count
 
#*Weight
 
#*Needs Booking
 
#*Allows Take Out
 
#*Tips(description)
 
#*Enabled
 
#*Creation Date
 
#*Updated At Date
 
#Campaign Skim Data
 
#*ID
 
#*Campaign ID
 
#*Branch ID
 
#*User ID
 
#*Amount
 
#*Hour
 
#*Coordinates
 
#Campaign Click Data
 
#*ID
 
#*Campaign ID
 
#*Branch ID
 
#*User ID
 
#*Amount
 
#*Hour
 
#*Coordinates
 
#Campaign Buy Data
 
#*ID
 
#*Campaign ID
 
#*Branch ID
 
#*User ID
 
#*Amount
 
#*Hour
 
#*Coordinates
 
#Campaign View Data
 
#*ID
 
#*Campaign ID
 
#*Branch ID
 
#*User ID
 
#*Amount
 
#*Hour
 
#*Coordinates
 
#Campaign Spread Data
 
#*ID
 
#*Campaign ID
 
#*Branch ID
 
#*User ID
 
#*Amount
 
#*Hour
 
#*Coordinates
 
#Order Data (when a user buys something from a campaign)
 
#*ID
 
#*Creator ID
 
#*Owner ID
 
#*Campaign ID
 
#*Branch ID
 
#*Price
 
#*Status
 
#*Trade Number
 
#*Redemption Date
 
#*Expiration Date
 
#*Is Commented
 
#*Is Deleted
 
#*Is Rewarded
 
#*Longitude
 
#*Latitude
 
#*Creation Date
 
#*Updated At Date
 
#*Refunded At Date
 

Revision as of 12:09, 27 February 2016

Home

 

Project Overview

 

Findings

 

Project Documentation

 

Project Management

Background Data Source Methodology

Data Source

We will be using Sequel Pro to connect to Sugar’s SQL database in order to extract any necessary data. We will also be logging on to their Flurry or Localytics dashboard to extract any further information that we may need.

JMP Pro will be used to perform the survival analysis.

Details of Data Source

We have 3 years of data from Sugar’s database and dashboards, from 2013 to 2015. From our dataset, we have identified certain variables that we will using in our analysis.

However, before embarking on our analysis, we have to do extensive data cleaning. For example, we have 45,000 users but a number of them are throwaway test accounts, or some are dead on arrival with no orders to their name.

The users and merchants also belong to different regions, namely Singapore, Jakarta, Hong Kong, and we have to extract and focus on the Singapore region users and merchants.

Data Preparation

As the data for Sugar is massive, cleaning the data took several stages.

SugarBranch.png

First, the Users table was cleaned to only contain Singapore users with the following steps:

  1. Clean out all Users that are not from Singapore by deleting Country_id != 1
  2. Clean out user_id:99280 (invalid mobile number(10000) and balance of 1200)

Secondly, Branch table was cleaned to only contain Singapore non-Sugar Businesses, and eliminate duplicates :

  1. Clean out all Merchants that are not from Singapore by deleting [Country_id] != 1
  2. Clean out all Merchants that are Sugar’s administrative accounts
  3. Clean out all Merchants that are accounts created for specific campaigns (e.g. Blackout Friday, SugarGSS etc)
  4. Clean out Merchant duplicates
  5. Corrected [Area] field for Merchants that have invalid/blank fields
  6. Corrected [Lng,Lat] coordinates for following branches that have incorrect coordinates (i.e. [Country_id] == 1 but coordinates correspond to another country)
  7. Cleaned out all Merchants that have invalid [Created_At] dates
  8. Cleaned out all Merchants whose [Created_At] dates != 2014 OR 2015

Next, the Users table and Branch table was inner joined with the Order table to eliminate all non-Singaporean users and businesses. We further cleaned the data using latitude and longitude to eliminate all orders made outside Singapore.

SugarMerchant.png

Lastly, the Branch table was inner joined with the Orders table to form Merchant table (for merchant analysis). It is further cleaned to include only paid orders and decomposed further for both the funnel plot analysis and time-series data mining analysis.

  1. Clean out all unpaid orders (i.e. [Status] != 1,2 and 4)
  2. Clean out all orders where [Price] == 0
  3. Clean out all orders where it is gifted during a Sugar event (i.e. [Is_Reward] == 1)
  4. Clean out all orders that are lucky draws
  5. Decomposed [Redeemed_Date] into Hour, Day of Week and Month