Analysis of User and Merchant Dropoff for Sugar App Methodology

From Analytics Practicum
Revision as of 12:31, 27 February 2016


Introduction

The key aim of this project is to tackle the three problems above and thereby increase the voucher redemption rate, user retention rate and merchant retention rate. For this study, we will focus only on Singapore users and merchants.

As our study differs in several ways from those in the existing literature, we face three main limitations, and we will adjust the pre-existing methods of survival analysis accordingly.

Limitations

Our first limitation is that, unlike subscription-based services, Sugar provides the app to users for free. Users can use the app indefinitely or choose to uninstall it; however, Sugar currently has no way to track uninstallations, and therefore cannot tell when a user has truly dropped off.

Our second limitation is that Sugar operates in a two-sided market, in which users and merchants affect each other: dropouts on the user side can cause dropouts on the merchant side, and vice versa. Our survival analysis may therefore be confounded by these network effects.

Our third limitation is that Sugar is an e-commerce app that takes users through a sales funnel. A user's journey has a few main stages, and users can drop off at any point:

Installation > Skimming > First Purchase > Redemption > Second Purchase


Sugar’s aim will be to move as many users as possible from the start to the end of the funnel in order to earn profits.

The analysis is therefore not as straightforward as predicting churn for a fixed subscription, and it requires multiple survival curves to build a complete picture of the user's journey.
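Because uninstallations are unobservable, users who have not yet reached the next funnel stage by the data cutoff must be treated as right-censored. JMP Pro provides Kaplan-Meier survival estimation built in; the Python sketch below is only illustrative of the calculation we would apply to each funnel transition, using made-up toy data.

```python
# Minimal Kaplan-Meier estimator for one funnel transition
# (e.g. days from first installation to first skimming).
# Users who never reach the next stage by the data cutoff are
# right-censored, since Sugar cannot observe uninstallations.

def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each observed event time.

    durations: days until the event or until censoring
    observed:  True if the user reached the next funnel stage,
               False if the user was censored at that duration
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        n_at_t = 0
        # Count events and censorings tied at time t.
        while i < len(pairs) and pairs[i][0] == t:
            n_at_t += 1
            if pairs[i][1]:
                events += 1
            i += 1
        if events:
            survival *= (at_risk - events) / at_risk
            curve.append((t, survival))
        at_risk -= n_at_t
    return curve

# Toy example: 6 users, days from install to first skimming;
# two users never skimmed before the cutoff (censored).
days    = [1, 2, 2, 3, 5, 7]
skimmed = [True, True, False, True, False, True]
print(kaplan_meier(days, skimmed))
```

Each funnel transition (installation to skimming, skimming to first purchase, and so on) would get its own curve, fitted per user segment.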

Tools Used

JMP Pro will be used to perform exploratory analysis, funnel plot analysis and survival analysis. SAS JMP Pro is analytics software that handles large volumes of data efficiently, which is imperative since Sugar's data is too large for tools such as Microsoft Excel. Its built-in survival analysis tools and funnel plot add-in will be extremely useful in our analysis. We are also familiar with JMP Pro, having used it in several of our analytics modules, such as Analytical Foundation.

For time-series data mining, we will use SAS Enterprise Miner, which allows us to perform descriptive, predictive and time-series analysis on large volumes of data.

QGIS will be used mainly for geospatial analysis.

Analysis

Merchants

  • Identifying star merchants (performing better than expected) and laggard merchants (performing worse than expected)
    • Based on Revenue and Redemption Rate
  • Identifying time-series patterns (e.g. day of week and hour) and grouping merchants with similar redemption behavior together
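The star/laggard classification follows the funnel-plot idea: a merchant is flagged only when its redemption rate falls outside control limits around the overall rate, so small merchants are not flagged on noise alone. The funnel plot add-in in JMP Pro does this for us; the sketch below only illustrates the calculation, using 95% limits from the normal approximation to the binomial and an illustrative (name, redemptions, orders) shape rather than Sugar's real schema.

```python
# Illustrative funnel-plot classification of merchants by redemption
# rate. A merchant is a "star" if its rate sits above the upper 95%
# control limit around the overall rate, a "laggard" if below the
# lower limit. Input tuples (name, redemptions, orders) are made up.
from math import sqrt

def classify_merchants(merchants, z=1.96):
    """merchants: list of (name, redemptions, orders) tuples."""
    total_red = sum(r for _, r, _ in merchants)
    total_ord = sum(n for _, _, n in merchants)
    p = total_red / total_ord              # overall redemption rate
    labels = {}
    for name, r, n in merchants:
        half_width = z * sqrt(p * (1 - p) / n)  # binomial control limit
        rate = r / n
        if rate > p + half_width:
            labels[name] = "star"          # better than expected
        elif rate < p - half_width:
            labels[name] = "laggard"       # worse than expected
        else:
            labels[name] = "as expected"
    return labels

print(classify_merchants([("A", 90, 100), ("B", 50, 100), ("C", 10, 100)]))
```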

Items

  • Identifying star items (performing better than expected) and laggard items (performing worse than expected)
    • Based on Impressions, Clicks and Price
  • Identifying time-series patterns (e.g. day of week and hour) and grouping items with similar redemption behavior together, drilling down to the product level
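The time-series grouping for items amounts to summarising each item's redemptions as a normalised day-of-week (or hour-of-day) profile and clustering similar profiles. SAS Enterprise Miner handles this at scale; the sketch below shows the idea with a simple greedy grouping on made-up item names and weekday data.

```python
# Illustrative grouping of items by normalised day-of-week redemption
# profile. Items whose profiles are close (Euclidean distance) are
# grouped together. Item names and weekday lists are made up.
from math import dist  # Euclidean distance, Python 3.8+

def weekday_profile(weekdays):
    """weekdays: list of redemption weekdays (0=Mon .. 6=Sun)."""
    counts = [0] * 7
    for d in weekdays:
        counts[d] += 1
    total = sum(counts)
    return [c / total for c in counts]

def group_items(item_weekdays, threshold=0.2):
    """Greedy grouping: an item joins the first group whose
    representative profile is within `threshold` of its own."""
    groups = []  # list of (representative_profile, [item names])
    for name, days in item_weekdays.items():
        profile = weekday_profile(days)
        for rep, members in groups:
            if dist(rep, profile) < threshold:
                members.append(name)
                break
        else:
            groups.append((profile, [name]))
    return [members for _, members in groups]

# Weekend-heavy items group together, separate from weekday items.
print(group_items({"brunch": [5, 6, 5, 6],
                   "spa":    [6, 5, 6, 5],
                   "lunch":  [0, 1, 2, 3, 4]}))
```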

Users

  • Identifying star users (performing better than expected) and normal users using LRFM
    1. Length
    2. Recency
    3. Frequency
    4. Monetary
  • Cluster users based on these LRFM scores
  • Apply survival analysis from installation to purchase (e.g. which type of users are more likely to purchase only x times before falling off the app?)
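The four LRFM features fall directly out of a user's purchase history: Length is the span between first and last purchase, Recency the time since the last purchase, Frequency the purchase count, and Monetary the total spend. A minimal sketch, assuming day-indexed purchase dates and a snapshot date (both illustrative, not Sugar's schema):

```python
# Illustrative LRFM feature computation for one user. Dates are day
# indices relative to an arbitrary epoch; `today` is the analysis
# snapshot day. These features would then feed the clustering step.

def lrfm(purchase_days, amounts, today):
    """purchase_days: day index of each purchase;
    amounts: spend per purchase; today: snapshot day index."""
    first, last = min(purchase_days), max(purchase_days)
    return {
        "length": last - first,           # span of activity in days
        "recency": today - last,          # days since last purchase
        "frequency": len(purchase_days),  # number of purchases
        "monetary": sum(amounts),         # total spend
    }

# Three purchases on days 10, 40 and 100, snapshot on day 130.
print(lrfm([10, 40, 100], [20, 35, 15], today=130))
```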

Geospatial

  • Determining the relationship between user location and merchant location (i.e. do users really go to places near them or are they more willing to travel to visit certain merchants?)
  • Identifying popular and unpopular areas (i.e. the number of merchants in an area should be proportional to the number of orders it receives)
  • Recommending potential locations for new merchants
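The user-to-merchant travel question above reduces to computing distances between coordinate pairs, which QGIS provides natively. As a hedged sketch of the underlying calculation, the haversine formula gives the great-circle distance between a user's and a merchant's coordinates:

```python
# Illustrative haversine distance between two (lat, lon) points in
# degrees, e.g. a user's home area and a merchant's outlet.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km
```

Aggregating these distances per redemption would show whether users mostly redeem deals near their own location or travel for particular merchants.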