Analysis of User and Merchant Dropoff for Sugar App Finals

From Analytics Practicum
Revision as of 20:34, 10 April 2016 by Elizabetht.2012 (talk | contribs)
Jump to navigation Jump to search

Home

 

Project Overview

 

Findings

 

Project Documentation

 

Project Management

Mid-Term Finals
Funnel Plot Analysis Time Series Analysis Geospatial Analysis

Abstract

Our client is a city guide discovery application that brings users and merchants together through geo-located offers. The objective of this research is to examine merchant performance via redemption rates. The results are displayed using funnel plots, a useful tool for displaying unbiased information on performance outcomes when comparing entities within a group. The funnel plot shows a high amount of overdispersion where there is a large number of outlying merchants. By further analyzing under-performing and over-performing merchants separately, the analysis shows that there is also a large variation in outlying redemption rates within each group. To investigate the underlying reasons, we conducted exploratory data analysis. Merchant and product category are shown to be significant contributors to a merchant’s redemption rate. These findings will help our client set benchmarks for individual merchants and develop interventions to help merchants increase their performance.

Introduction

As of 2016, Singapore has the highest smartphone penetration rate in the world at 88%, ahead of other countries such as Hong Kong, Australia and the U.S. [1]. Most smartphones today are able to use GPS track the user’s location, opening up opportunities for location-based marketing.

Our client seeks to take advantage of this by offering local deals tailored to the user’s location using a mobile app. It helps local small businesses such as cafes, restaurants, and small retail shops get discovered and market to users close in proximity. Whenever the user opens the app, they can see a list of discounted items from merchants nearby.

By definition, our client exists as a platform in a two-sided market, enabling two groups of end-users to come “on board” by charging one or both sides [2]. Examples include Alibaba, eBay, Groupon, Uber and many more e-commerce companies. Like many other two-sided markets, it connects users and merchants and earns a premium for connecting these two groups by charging the transactions made between the two groups.

Merchants

Redemption rate will be used as the performance indicator since the only channel for generating revenue is via upsell. This is the main factor of merchant attrition as merchants with little to zero uptake will perceive the application as a waste of time and choose to drop off. Since our client is in a two-sided market, a high attrition rate of merchants may result in users also ceasing usage.

For our client, there is a strong need to retain and attract merchants who are performing well and to improve or perhaps eliminate merchants who are not. Hence, this practical research aims to explore and demonstrate the effectiveness of using funnel plots to analyze merchant performance.

Review of Existing Work

In the field of medical research, ‘league tables’ such as caterpillar plots have been widely utilised to visualize the ranking according to a performance indicator. Figure 1 shows a league table of hospitals based on mortality following a fractured hip [3].

The critical flaw with caterpillar plots as seen in Figure 1 is the inability to take sample size into consideration when ranking results. Inadequate sample sizes with wide ranges of results often make it difficult and unwise to rank results; results that can lead to inaccurate interpretations and false decisions. [4] Thus, there is a strong need to seek out alternative charts that can give a more comprehensive and accurate analysis.

KEY fp1.png

Funnel plots are a form of scatterplot in which observed area rates are plotted against area populations. Control limits, which are computed similarly to confidence limits are then overlaid on the scatter plot. The control limits represent the expected variation in rates assuming that the only source of variation is stochastic. The funnel shape is generated due to the smaller expected variability in larger populations. When many points fall outside the funnel, the plot can be described as “over-dispersed” and it can be said that the process is not in control or the model does not fit the data well.

KEY fp2.png

As seen in Figure 2, it is the same data that is presented in Figure 1 but however, this plots the observed mortality rate against the sample size. While a researcher may be quick to suggest hospitals who lie below the average mortality rate in Figure 1 as ‘under-performing’, the funnel plot in Figure 2 shows that no hospital lies outside the 99.8% limits. Thus, there is no basis for indicating any hospitals who are under-performing or over-performing since they mostly lie within the 95% limits.

Funnel plots has been prevalent in medical research [6,7] but few and between in the area of retail analytics. Its usage in analyzing and interpreting merchant’s redemption behaviour could bring about important business insights that retail businesses could utilise. For example, merchants who have been identified as under-performing can potentially be cut loose from the application. Subsequently, more effort and resources can be invested into retaining outperforming merchants and even recruit merchants of a similar nature in the future.

Methodology

KEY Data.png

Figure 3 shows the data transformation process which we have undertaken for our research study. For this research study, two years’ worth of data from 2014 and 2015 will be utilised.

Firstly, SequelPro was used to extract the data from our client’s database. The database consists of our client’s orders, branch, brand and item data which we will be using in our analysis. Using SAS JMP Pro, these tables were joined together to form a more comprehensive dataset which we will be using for our analyses. Following that, the raw data provided by our client had several errors that needed a high level of data cleansing. This includes cleaning out invalid merchants and orders, correcting geographical coordinates and merchant details. In addition, there were several confounding factors that could affect our analysis such free items that were given out during campaigns.

Next, we filtered for the relevant data which we needed for our analysis (e.g. localized orders and merchants). This is important because the scope of the project has been defined by our client, specific to the Singapore context. In order to add more dimensions to our data for better analysis, we recoded selected columns in our dataset. This will be explained further below in-depth. Lastly, we loaded our data into the JMP server to conduct our funnel plot analysis.

Data preparation albeit being a time-consuming and tedious task, is extremely important. It allows more accurate, easier and better interpretation of the data. High-quality data is essential for having an accurate and reliable analysis of the results. Data cleansing and transformation are two key areas which we have invested a lot of our time and effort into during this research study.

After data cleansing as what was mentioned above, we conducted exploratory analysis. We discovered two main groups of merchants that are prevalent on our client’s app - Lifestyle/Retail merchants and F&B merchants. Both groups exhibit very different user behaviour which will be analysed further.

In addition to reconfiguring our client’s current categorization system which have a large area of overlap between item categories, there is also a need to delve deeper by splitting the item categories into more narrow and specific ones. This way, our client is able to gain a deeper insight of what items are more popular during specific timings of the day.

KEY SCategories.png

The time-stamped data provided will enable us to discover greater insights and knowledge about users’ behaviour. Hence, there is a need to examine the time-frames where there is high traffic in number of orders. Therefore, we have classified the paid orders into the following broad categories: Breakfast, Lunch, Tea Break, Dinner, Chill and Miscellaneous.

Initial Data Exploration and Analysis

We conducted initial data analysis using Exploratory Data Analysis (EDA) to gain general insights about merchants and their order behaviour. Our client’s merchant growth experienced several notable spikes over the past 2 years. [REFER TO WIKI] For example, during the beta phase in February 2014, our client is trying to bring merchants on board before its official launch in July. There is also a noticeable spike in July 2015 which is probably due to our client’s Great Singapore Sale campaign which main focus was user acquisition.

Examining Figure 3, we can see that the cumulative growth in number of merchants have slowly been tapering off. Currently, our client has 835 merchants compared to a total of 6,860 establishments in the food & beverage (F&B) services industry [8]. Our client’s merchant pool only constitutes 12.2% of the industry and this means great room for potential in acquiring more merchants.

KEY CDF.png

Redemption rate is a key factor when it comes to merchant attrition. Merchants generally only earn revenue via upsell when customers come in to redeem their orders. As the nature of the orders are fairly small (e.g. a cup of coffee), the business model stands to guarantee a certain amount of upsell for merchants at a relatively low cost. Hence, many merchants with low redemption rates are likely to drop off the application after awhile due to i) lack of revenue and ii) perceived opportunity cost.

KEY RR.png

There is a wide distribution of redemption rates that is prominently right-skewed. However, in order to analyze the merchant’s performance accurately, a funnel plot will be utilised to capture each merchant’s volume of orders in addition to volume of redemptions. JMP Pro will be used to perform exploratory analysis and funnel plot analysis. SAS JMP Pro is an analytical software that is able to handle large volumes of data efficiently, which is imperative since our client's data is too large to be handled by other software such as Microsoft Excel. Even though the funnel plot is not a built-in feature, Matthew Wolfe [9] has created a plugin add-in that is accessible via JMP’s user community.

Funnel Plot

A dataset that contains occurrences and sample sizes will be used to execute the analysis. After launching the funnel plot plug-in, one will need to assign Sample Size and Occurrences to their respective roles.

KEY FP3.png

The funnel plot control limits are set at 95% and 99.8%. These correspond conceptually to the 2-sigma limits often used in health services research and to the 3-sigma limits commonly used in process control. [10] Outlying merchants outside of the 99.8% limits will be segregated into two groups - star merchants (performing better than expected) and laggard merchants (performing worse than expected). Merchants who are within the control limits follow a stochastic trend and will not be analysed.

Funnel plots calculate the boundaries of expected variation based on the mean and standard error, but the mean as a measure of center and standard errors as measures of confidence around the mean, are statistics that assume a normal (bell-shaped) distribution. [4] The Z-score histogram dictates whether the data points follow a normal distribution. Thus, a normal distribution is needed for a funnel plot analysis.

Case Study on Our Client's Merchant Performance

KEY fp4.png

The graph above shows the proportion of redemption rate out of paid orders in Singapore from 2014 to 2015, mapped by merchants. From the funnel plot analysis, we identified a total of 331 merchants who fell outside the limits of 99.8% confidence.

In order to establish which merchants are star merchants (performing better than expected) and which are laggard merchants (performing worse than expected), we have used SAS JMP’s brush tool to select data points that fall outside the 99.8% limits. Merchants that fall above the upper 99.8% limit are identified as star merchants and those that fall above the lower 99.8% limit are denoted as laggard merchants.

There are 87 merchants (11.3%) who we classified as star merchants and 244 merchants (28.6%) who are laggard merchants. Restating the fact that approximately 40% of merchants lie outside the control limits, there is substantial overdispersion.

Overdispersion is a common phenomenon that occurs with large sample sizes and often occur when control limits are very close to each other. This results in the identification of an abnormally large amount of data points, signaling special cause variation [4]. Thus, we can conclude that the model does not fit well. There may be additional factors that may result in a wide variation between merchants which researchers can seek to identify using exploratory analysis.