Analysis of User and Merchant Dropoff for Sugar App Methodology

From Analytics Practicum
Revision as of 23:23, 28 February 2016 by Yisheng.lim.2012 (talk | contribs)

Tools Used

JMP Pro will be used to perform exploratory analysis, funnel plot analysis and survival analysis. SAS JMP Pro is analytical software that handles large volumes of data efficiently, which is imperative since Sugar's data is too large for software such as Microsoft Excel. Its built-in survival analysis tools and funnel plot add-in will be extremely useful in our analysis. We are also very familiar with JMP Pro, having used it in many of our analytics modules, such as Analytical Foundation.

For time-series data mining, we will use SAS Enterprise Miner, as it allows us to perform descriptive, predictive and time-series analysis on large volumes of data.

For geospatial analysis, we have decided to use QGIS, as it is open source with extensive documentation and plugins available. It is also the software of choice for the Geospatial class in our university, which gives us access to more resources, namely the teaching materials and the experience of our fellow university peers.

Methodology

SugarMethodology.jpg

Review of Existing Work

Funnel Plot Analysis

Funnel plots are a form of scatterplot in which observed area rates are plotted against area populations. Control limits, which are computed similarly to confidence limits, are then overlaid on the scatterplot. The control limits represent the expected variation in rates, assuming that the only source of variation is stochastic. The funnel shape arises because larger populations have smaller expected variability. When many points fall outside the funnel, the plot can be described as “over-dispersed”: the process is not in control, or the model does not fit the data well.

The funnel plot is often regarded as a form of control chart (Woodall, 2006). Control charts monitor whether a manufacturing or business process is under control. If a funnel plot analysis indicates a stable process with only stochastic variation, the data are often used for prediction and forecasting. This terminology has been adapted to health system performance in various jurisdictions, where it is assumed that managers within a health system can exercise control over a health event-related process (Benneyan, 2003).

Our study will apply funnel plot analysis to identify outliers that fall outside the control limits, whether outperforming or underperforming.
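As a minimal sketch of how such control limits can be computed, the normal approximation for a proportion gives limits that narrow as sample size grows, producing the funnel shape. The merchant counts below are invented for illustration and are not Sugar's actual data:

```python
import numpy as np

def funnel_limits(overall_rate, n, z=1.96):
    """Approximate 95% control limits for a proportion at sample size n.

    Uses the normal approximation p +/- z * sqrt(p(1-p)/n); the limits
    narrow as n grows, which is what generates the funnel shape.
    """
    se = np.sqrt(overall_rate * (1 - overall_rate) / n)
    lower = np.clip(overall_rate - z * se, 0, 1)
    upper = np.clip(overall_rate + z * se, 0, 1)
    return lower, upper

# Hypothetical merchants: redemptions out of total impressions.
impressions = np.array([200, 1000, 5000, 12000])
redemptions = np.array([30, 110, 440, 1500])

rates = redemptions / impressions
overall = redemptions.sum() / impressions.sum()
lo, hi = funnel_limits(overall, impressions)

# Points outside the funnel are the over- or under-performers.
outliers = (rates < lo) | (rates > hi)
```

Note that small merchants need a large deviation from the overall rate to register as outliers, while large merchants are flagged for much smaller deviations.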

Time-Series Analysis

The advancement of technology has allowed organisations to amass and process large amounts of data. Much of the data generated by business processes is time-stamped, and much of it exhibits seasonality and historical trends. Time-series analysis enables such data to be analysed further using several techniques. Time-series data mining allows users to identify commonalities between sets of time-ordered data, and is supported by a variety of algorithms, notably dynamic time warping (DTW) (Schubert & Lee, 2011).

One area of time-series data mining is pattern detection applied directly to the series. Fraud detection is one example: if unusual time-based behaviour is detected for a particular customer, detecting the same behaviour in other customers may uncover further fraud instances.

We hope to cluster merchants that have similar redemption patterns together so as to optimise Sugar’s push notifications. For example, if a group of merchants (e.g. primarily cafes) have high redemptions during late-afternoon hours, Sugar could hold a campaign specific to only these merchants in this time-frame.
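To illustrate why DTW suits this clustering task, the sketch below implements the standard DTW recurrence on invented hourly redemption counts; SAS Enterprise Miner's implementation will differ, but the principle of tolerating shifted peaks is the same:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible alignments.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]

# Hypothetical hourly redemption counts for three merchants.
cafe_a = [1, 2, 8, 9, 3, 1]
cafe_b = [1, 1, 2, 8, 9, 3]   # same late-afternoon peak, shifted one hour
bar_c  = [6, 5, 2, 1, 1, 7]   # a different pattern entirely
```

Because DTW aligns the shifted peaks, `cafe_a` ends up much closer to `cafe_b` than to `bar_c`, so the two cafes would be clustered together even though their peaks do not coincide hour for hour.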

Survival Analysis

Survival analysis is a statistical technique that models the duration until an event occurs; the event can be death, divorce, churn or purchase. It overcomes a limitation of logistic regression in that the event probabilities may change over time. Furthermore, it can handle censoring, which refers to incomplete information about participants who remain event-free beyond the range of the study or who drop out of the study.

In business and marketing, survival analysis has been used in many industries, including telecommunications, insurance and financial services. In telecommunications, Zhang and Chen (2007) applied clustering and then survival analysis within each cluster as a way to segment customers into groups. Portela and Menezes (2009) applied survival analysis to the Portuguese telecommunications industry and found that the factors affecting churn are related to the amount of purchase with the company; usage, subscription conditions and satisfaction did not seem to affect churn. Lu (2003) calculated customer lifetime values for a telecommunications company using survival curves.

In the financial services industry, Van den Poel and Lariviere used survival analysis to build proportional hazards models for predicting customer attrition, and found that customer characteristics alone may not be able to predict attrition; a better model considers the combination of customer and product (Van den Poel & Lariviere, 2004).

However, few studies have applied survival analysis to e-commerce and mobile applications. Moreover, most studies consider only a single affected party (a one-sided market), whereas Sugar operates in a two-sided market with two stakeholders, users and merchants, who have a mutual effect on each other. Our study will attempt to apply these well-established techniques and adapt them to suit our needs.
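To make the censoring idea concrete, the sketch below implements a Kaplan-Meier survival estimator (which JMP Pro provides built in) on invented churn durations, where a censored user is one still active when the study window closed:

```python
import numpy as np

def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve with right-censoring.

    durations: weeks until churn, or until the study window ended
    observed:  1 if churn was observed, 0 if the user was censored
    Returns a list of (time, survival probability) pairs.
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=int)
    event_times = np.sort(np.unique(durations[observed == 1]))
    surv = 1.0
    curve = []
    for t in event_times:
        at_risk = np.sum(durations >= t)             # still being followed
        churned = np.sum((durations == t) & (observed == 1))
        surv *= 1.0 - churned / at_risk              # product-limit step
        curve.append((t, surv))
    return curve

# Six hypothetical users; zeros mark users censored at study end.
curve = kaplan_meier([2, 3, 3, 5, 8, 8], [1, 1, 0, 1, 0, 0])
```

The key point is that censored users still contribute to the at-risk count until they leave the study, so incomplete observations are used rather than discarded, which a plain logistic regression on "churned or not" cannot do.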

Geospatial Analysis

Geospatial analysis is the gathering, display and manipulation of imagery, GPS, satellite photography and historical data, described explicitly in terms of geographic coordinates or implicitly in terms of a street address, postal code or forest stand identifier, as applied to geographic models. Applications of geospatial analysis include customer segmentation, asset management, risk analysis and strategic location determination. One area of interest in retail management is trade area analysis, a subset of geospatial analysis. Retail location is considered one of the most important elements of retail marketing strategy because of its immense advantages, at the cost of a long-term capital commitment. A good location can create a strong competitive advantage, simply because location is one factor that competitors cannot easily imitate, raising barriers to entry (Öner, 2014).

As such, analysing trade areas and their impact on a company's revenue is deemed extremely important. However, such analyses were usually carried out by gut feeling before the introduction of spatial modelling in the 1980s and 1990s, when it was used to analyse the effectiveness of store locations for major grocery retailers in the UK (Wood, 2007). One prevalent method for analysing trade areas and revenue is dominant store analysis (Clarke, 2006), which assumes customers travel to the nearest store within a specific trade area; a store's location within its trade area can thus be used to evaluate its effectiveness.

However, most studies examine trade area analysis in terms of competition; few have examined it in terms of cannibalisation. Sugar has multiple merchants with stores in close proximity to one another, which may create situations where cannibalisation occurs, reducing the effectiveness and revenue stream of certain branches. Our study will attempt to isolate and analyse the cannibalisation effect based on the spatial distribution of stores.
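A minimal sketch of the dominant store assumption, using hypothetical branch coordinates (the names and locations below are invented, not actual Sugar merchants): each customer is assigned to the nearest branch by great-circle distance, and branches whose catchments overlap heavily are candidates for cannibalisation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical branch coordinates (lat, lon).
stores = {
    "Branch A": (1.3000, 103.8000),
    "Branch B": (1.3050, 103.8050),   # under 1 km from Branch A
    "Branch C": (1.3500, 103.9500),
}

def dominant_store(lat, lon):
    """Assign a customer location to its nearest branch."""
    return min(stores, key=lambda s: haversine_km(lat, lon, *stores[s]))
```

Here Branches A and B sit so close together that they split what would otherwise be a single trade area, which is precisely the kind of configuration our cannibalisation analysis would flag for closer inspection.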