Analysis of User and Merchant Dropoff for Sugar App Finals

From Analytics Practicum
Revision as of 20:29, 10 April 2016 by Elizabetht.2012 (talk | contribs)
Jump to navigation Jump to search

Home

 

Project Overview

 

Findings

 

Project Documentation

 

Project Management

Mid-Term Finals
Funnel Plot Analysis Time Series Analysis Geospatial Analysis

Abstract

Our client is a city guide discovery application that brings users and merchants together through geo-located offers. The objective of this research is to examine merchant performance via redemption rates. The results are displayed using funnel plots, a useful tool for displaying unbiased information on performance outcomes when comparing entities within a group. The funnel plot shows a high amount of overdispersion where there is a large number of outlying merchants. By further analyzing under-performing and over-performing merchants separately, the analysis shows that there is also a large variation in outlying redemption rates within each group. To investigate the underlying reasons, we conducted exploratory data analysis. Merchant and product category are shown to be significant contributors to a merchant’s redemption rate. These findings will help our client set benchmarks for individual merchants and develop interventions to help merchants increase their performance.

Introduction

As of 2016, Singapore has the highest smartphone penetration rate in the world at 88%, ahead of other countries such as Hong Kong, Australia and the U.S. [1]. Most smartphones today are able to use GPS track the user’s location, opening up opportunities for location-based marketing.

Our client seeks to take advantage of this by offering local deals tailored to the user’s location using a mobile app. It helps local small businesses such as cafes, restaurants, and small retail shops get discovered and market to users close in proximity. Whenever the user opens the app, they can see a list of discounted items from merchants nearby.

By definition, our client exists as a platform in a two-sided market, enabling two groups of end-users to come “on board” by charging one or both sides [2]. Examples include Alibaba, eBay, Groupon, Uber and many more e-commerce companies. Like many other two-sided markets, it connects users and merchants and earns a premium for connecting these two groups by charging the transactions made between the two groups.

Merchants

Redemption rate will be used as the performance indicator since the only channel for generating revenue is via upsell. This is the main factor of merchant attrition as merchants with little to zero uptake will perceive the application as a waste of time and choose to drop off. Since our client is in a two-sided market, a high attrition rate of merchants may result in users also ceasing usage.

For our client, there is a strong need to retain and attract merchants who are performing well and to improve or perhaps eliminate merchants who are not. Hence, this practical research aims to explore and demonstrate the effectiveness of using funnel plots to analyze merchant performance.

Review of Existing Work

In the field of medical research, ‘league tables’ such as caterpillar plots have been widely utilised to visualize the ranking according to a performance indicator. Figure 1 shows a league table of hospitals based on mortality following a fractured hip [3].

The critical flaw with caterpillar plots as seen in Figure 1 is the inability to take sample size into consideration when ranking results. Inadequate sample sizes with wide ranges of results often make it difficult and unwise to rank results; results that can lead to inaccurate interpretations and false decisions. [4] Thus, there is a strong need to seek out alternative charts that can give a more comprehensive and accurate analysis.

KEY fp1.png

Funnel plots are a form of scatterplot in which observed area rates are plotted against area populations. Control limits, which are computed similarly to confidence limits are then overlaid on the scatter plot. The control limits represent the expected variation in rates assuming that the only source of variation is stochastic. The funnel shape is generated due to the smaller expected variability in larger populations. When many points fall outside the funnel, the plot can be described as “over-dispersed” and it can be said that the process is not in control or the model does not fit the data well.

KEY fp2.png

As seen in Figure 2, it is the same data that is presented in Figure 1 but however, this plots the observed mortality rate against the sample size. While a researcher may be quick to suggest hospitals who lie below the average mortality rate in Figure 1 as ‘under-performing’, the funnel plot in Figure 2 shows that no hospital lies outside the 99.8% limits. Thus, there is no basis for indicating any hospitals who are under-performing or over-performing since they mostly lie within the 95% limits.

Funnel plots has been prevalent in medical research [6,7] but few and between in the area of retail analytics. Its usage in analyzing and interpreting merchant’s redemption behaviour could bring about important business insights that retail businesses could utilise. For example, merchants who have been identified as under-performing can potentially be cut loose from the application. Subsequently, more effort and resources can be invested into retaining outperforming merchants and even recruit merchants of a similar nature in the future.

Methodology

KEY Data.png

Figure 3 shows the data transformation process which we have undertaken for our research study. For this research study, two years’ worth of data from 2014 and 2015 will be utilised.

Firstly, SequelPro was used to extract the data from our client’s database. The database consists of our client’s orders, branch, brand and item data which we will be using in our analysis. Using SAS JMP Pro, these tables were joined together to form a more comprehensive dataset which we will be using for our analyses. Following that, the raw data provided by our client had several errors that needed a high level of data cleansing. This includes cleaning out invalid merchants and orders, correcting geographical coordinates and merchant details. In addition, there were several confounding factors that could affect our analysis such free items that were given out during campaigns.

Next, we filtered for the relevant data which we needed for our analysis (e.g. localized orders and merchants). This is important because the scope of the project has been defined by our client, specific to the Singapore context. In order to add more dimensions to our data for better analysis, we recoded selected columns in our dataset. This will be explained further below in-depth. Lastly, we loaded our data into the JMP server to conduct our funnel plot analysis.

Data preparation albeit being a time-consuming and tedious task, is extremely important. It allows more accurate, easier and better interpretation of the data. High-quality data is essential for having an accurate and reliable analysis of the results. Data cleansing and transformation are two key areas which we have invested a lot of our time and effort into during this research study.

After data cleansing as what was mentioned above, we conducted exploratory analysis. We discovered two main groups of merchants that are prevalent on our client’s app - Lifestyle/Retail merchants and F&B merchants. Both groups exhibit very different user behaviour which will be analysed further.

In addition to reconfiguring our client’s current categorization system which have a large area of overlap between item categories, there is also a need to delve deeper by splitting the item categories into more narrow and specific ones. This way, our client is able to gain a deeper insight of what items are more popular during specific timings of the day.

KEY SCategories.png

The time-stamped data provided will enable us to discover greater insights and knowledge about users’ behaviour. Hence, there is a need to examine the time-frames where there is high traffic in number of orders. Therefore, we have classified the paid orders into the following broad categories: Breakfast, Lunch, Tea Break, Dinner, Chill and Miscellaneous.

Initial Data Exploration and Analysis

We conducted initial data analysis using Exploratory Data Analysis (EDA) to gain general insights about merchants and their order behaviour. Our client’s merchant growth experienced several notable spikes over the past 2 years. [REFER TO WIKI] For example, during the beta phase in February 2014, our client is trying to bring merchants on board before its official launch in July. There is also a noticeable spike in July 2015 which is probably due to our client’s Great Singapore Sale campaign which main focus was user acquisition.

Examining Figure 3, we can see that the cumulative growth in number of merchants have slowly been tapering off. Currently, our client has 835 merchants compared to a total of 6,860 establishments in the food & beverage (F&B) services industry [8]. Our client’s merchant pool only constitutes 12.2% of the industry and this means great room for potential in acquiring more merchants.

KEY CDF.png

Redemption rate is a key factor when it comes to merchant attrition. Merchants generally only earn revenue via upsell when customers come in to redeem their orders. As the nature of the orders are fairly small (e.g. a cup of coffee), the business model stands to guarantee a certain amount of upsell for merchants at a relatively low cost. Hence, many merchants with low redemption rates are likely to drop off the application after awhile due to i) lack of revenue and ii) perceived opportunity cost.

KEY RR.png

There is a wide distribution of redemption rates that is prominently right-skewed. However, in order to analyze the merchant’s performance accurately, a funnel plot will be utilised to capture each merchant’s volume of orders in addition to volume of redemptions.