Analysis of User and Merchant Dropoff for Sugar App Data Source

From Analytics Practicum
Jump to navigation Jump to search

Home

 

Project Overview

 

Findings

 

Project Documentation

 

Project Management

Background Data Source Methodology

Data Source

We will be using Sequel Pro to connect to Sugar’s SQL database in order to extract any necessary data. We will also be logging on to their Flurry or Localytics dashboard to extract any further information that we may need.

JMP Pro will be used to perform the survival analysis.

Details of Data Source

We have 4 years of data from Sugar’s database and dashboards, from 2013 to 2016. However, for our analysis we have chosen to focus on only 2 full years of data as to ensure uniformity and consistency in our analysis, especially with regards to temporal. From our dataset, we have identified certain variables that we will use in our analysis.

However, before embarking on our analysis, we have to do extensive data cleaning. For example, we have 45,000 users but a number of them are throwaway test accounts, or some are dead on arrival with no orders to their name.

The users and merchants also belong to different regions, namely Singapore, Jakarta, Hong Kong, and we have to extract and focus on the Singapore region users and merchants.

Data Preparation

As the data for Sugar is massive, cleaning the data took several stages.

SugarBranch.png

First, the Users table was cleaned to only contain Singapore users with the following steps:

  1. Clean out all Users that are not from Singapore by deleting Country_id != 1
  2. Clean out user_id:99280 (invalid mobile number(10000) and balance of 1200)

Secondly, Branch table was cleaned to only contain Singapore non-Sugar Businesses, and eliminate duplicates :

  1. Clean out all Merchants that are not from Singapore by deleting [Country_id] != 1
  2. Clean out all Merchants that are Sugar’s administrative accounts
  3. Clean out all Merchants that are accounts created for specific campaigns (e.g. Blackout Friday, SugarGSS etc)
  4. Clean out Merchant duplicates
  5. Corrected [Area] field for Merchants that have invalid/blank fields
  6. Corrected [Lng,Lat] coordinates for following branches that have incorrect coordinates (i.e. [Country_id] == 1 but coordinates correspond to another country)
  7. Cleaned out all Merchants that have invalid [Created_At] dates
  8. Cleaned out all Merchants whose [Created_At] dates != 2014 OR 2015

Next, the Users table and Branch table was inner joined with the Order table to eliminate all non-Singaporean users and businesses. We further cleaned the data using latitude and longitude to eliminate all orders made outside Singapore.

SugarMerchant.png

Lastly, the Branch table was inner joined with the Orders table to form Merchant table (for merchant analysis). It is further cleaned to include only paid orders and decomposed further for both the funnel plot analysis and time-series data mining analysis.

  1. Clean out all unpaid orders (i.e. [Status] != 1,2 and 4)
  2. Clean out all orders where [Price] == 0
  3. Clean out all orders where it is gifted during a Sugar event (i.e. [Is_Reward] == 1)
  4. Clean out all orders that are lucky draws
  5. Decomposed [Redeemed_Date] into Hour, Day of Week and Month