Analysis of User and Merchant Dropoff for Sugar App - Findings

From Analytics Practicum
Jump to navigation Jump to search

Home

 

Project Overview

 

Findings

 

Project Documentation

 

Project Management

Mid-Term Finals

Executive Summary

The objective of project is to improve Sugar’s bottom line via Merchant, User and Geospatial Analysis.

User: Top 25% of the users contributes 75.4% of revenue

  • Sugar may want to invest more effort in maintaining relationships with these users and understanding who they are, thereby attracting more of such users.

User: User growth explains 72% of revenue growth after 2 months; branch growth has little effect

  • Between increasing user or branches, Sugar should focus more on users. Time lag should be taken into account when calculating marketing effectiveness.

User: Redemption peaks during tea-break, followed by lunch and dinner

  • Sugar can use this data to improve its product offerings or push notification timings to reach out to more customers at these specific timings

Merchant: Segmentation of merchants into star and laggards.

  • Sugar can make use of this segmentation to focus their efforts on improving merchant traffic flow (and thus building better relationships), by routing orders to laggards, and rewarding star merchant performance

Geospatial: Significantly less number of branches and orders in the West region.

  • These are potential areas of growth for Sugar to expand.

Geospatial: Inverse relationship between distance of branch and redemption rate

  • Sugar can focus on methods to shorten order distance to improve redemption rate, and thus get more repeat customers.

Geospatial: Derived a cannibalisation analysis model using shortest distance analysis

  • Sugar can identify and help stores that are cannibalised by other stores of the same brand; improving overall potential earnings.

Geospatial: Explanatory Model linking user total value to order origin location

  • Sugar can target areas with the highest spenders, namely: Kampong Ubi, Boon Lay Place, Kampong Glam, Marine Parade, Kembangan, Katong, Telok Blangah Drive, Bangkit, Kaki Bukit, Anak Bukit, to maximise and revenue.

Exploratory Analysis

Merchants

Merchant Growth

SugarM1.png

As we look towards Sugar’s merchant acquisition process over the past 2 years, there are several notable spikes.

For example, during the beta phase in February 2014, Sugar is trying to bring merchants on board before its official launch in July. There is also a noticeable spike in July 2015 which is probably due to Sugar’s campaign.

Looking at the CDF plot, we can see that the cumulative growth in number of merchants have slowly been tapering off. Currently, Sugar has 835 merchants compared to a total of 6,860 establishments in the food & beverage (F&B) services industry (Singstat, 2016). Sugar’s merchant pool only constitutes only 12.2% of the industry and this means great room for potential in acquiring more merchants.

Number of Orders per Merchant

SugarM2.png

Looking at the number of orders, the distribution is mainly right-skewed. Around 80% of merchants generate only 200 or less orders over the course of 2 years whereas the top 20% are able to generate up to 2,200 orders in total. However, we should take into account how many total revenue is also generated by these top 20% merchants. Thus, we will be looking more into the funnel plot analysis to identify these star merchants.

User Growth

Number of User Sign ups per Month

U2.png

As shown by the graph, Sugar experienced its highest user growth in Apr 2015. It has not been consistent. Number of new users per month has been decreasing.

User sign-ups grouped by Day of Month

SugarU1.png

Interestingly, the 5th and 6th day of each month has more orders than the rest of the month.

Number of User Sign Ups grouped by Day of Week

SugarU3.png

The Day of Week does not seem to affect the number of user sign ups. There seems to be no patterns within a week even though there is a slight bump on a Saturday.

User Sign Ups grouped by Hour of Day

SugarU4.png

Similar to orders, more users sign up at around lunch time than any other time.

User Profiles

We also separated Users into 3 groups based on their total order value:

  1. Top 25% - We recoded it into 3
  2. Middle 50% - We recoded it into 2
  3. Lower 25% - We recoded into 1

SugarU5.png

As shown the top 25% of the users contributes 75.4% of revenue. The middle 50% contributes only 22.1% of the total revenue and the bottom 25% only contributes to 2.5% of the total revenue.

To explain this phenomenon, we can look at the number of orders for each user.

U6.png

As shown, there are significant difference between the number of orders of each group. The top 25% group has a mean of 15.96 orders each whereas the middle 50% only has a mean of 3.35 orders each. The graph below shows the Maximum order price a user has spent on vs the three groups. As shown above, the mean of the Max order price for the top 25% users is significantly different from compared to the other groups.

SugarU7.png SugarU8.png

Lastly, the graph below shows the differences in the mean order price for each user.

SugarU9.png

As shown, there is a significant difference between the mean order price of the top25% and the middle 50%. The top25% has a mean average order of $5.25 per user whereas the middle 50% has roughly about half that amount.

Therefore, there is a multiplier effect. The top 25% spend more per order and have higher number of orders. This results in the extremely right skewed graph for revenue.

Implication: Sugar should focus on the high value customers if it wants to increase revenue.

Orders

For orders, only orders status = 1,2,4 are analysed. This is because these orders represent Sugar’s revenue. To recap, Order status: 1 - Active orders 2 - Redeemed orders 4 - Unredeemed orders

Orders grouped by Day of Month

SugarO2.png

Every month, there seems to be a small spike on the 7th and a dip on the 31st. Other than there, there seems to be no significant difference.

Orders grouped by Month

SugarO3.png

As shown, orders seem to peak at July and December. However this graph is subject to changes as most of the gifting orders are in Dec, awaiting verification.

Orders grouped by Hour of Day

O4.png

As mentioned above, the 4691 orders at 4am needs to be verified with Sugar. Nonetheless, a clear pattern is shown here: orders peak at 12noon, which is right before lunch time. This may be because users may be deciding where to head for lunch.

Orders grouped by DayofWeek

O5.png

There are no major differences between the days of the week and the number of orders.

Distribution of Prices of Orders

O6.png

It is a very right skewed graph with extreme values. The maximum price of an order is $244 while the median is only $2. As such, we have to zoom in.

After temporarily removing the extreme values, this is the zoomed in graph:

O7.png

This graph shows that most people order not more than $2, as orders sharply decrease from $3. This may be because of the free $2 that was given to the users when they install the app.

Redemptions


Proportion of Order Status

SugarO1.png

As shown, 65.5% of the orders are redeemed, and only 34.2% of the orders are unredeemed.

R3.png

For these redeemed vouchers, there is a higher ratio of normal items to voucher items as compared to non-redeemed vouchers. This insinuates that voucher items are more unlikely to be redeemed as compared to normal items. A possible explanation could be that these vouchers are often contingent on paying for expensive full-priced items.

By decomposition of redemption date into Hour, Day of Week and Month, we have gained a better understanding and identified key insights.

Redemptions grouped by Day of Month

R4.png

In terms of redemptions, there is a relatively consistent trend throughout the month, with a drop on the 31st. This can be attributed to the nature of the Gregorian calendar where several months have only 30 days.

Redemptions grouped by Day of Week

R5.png

In terms of day of week, there is low traffic on Mondays and Tuesdays and an increasing trend throughout the week, peaking on Saturday.

Redemptions grouped by Hour

R6.png

As we can see, there is an influx of redemptions mainly during lunch-time (i.e. 12PM to 2PM) and dinner-time (i.e. 6PM to 8PM).

Redemptions grouped by Month

R7.png

July and August seem to be the most popular months, which could be due to campaigns and events held during this period (e.g. National Day campaign). On the flipside, January and February have significantly much lower traffic in comparison to the other months. This could be explained by the fact that there is much little to no intake in 2014’s first quarter as the app was still in beta-testing.

Time Frame between Creation of Order and Redemption

We were also interested to find out whether users usually paid for an order in advance to redeem it on another day, or to purchase it on the same day itself.

R8.png

It seems that only 38% of redemptions usually take place on the current day itself, with 62% opting to purchase for orders in advance. We decided to investigate further to understand the redemption behaviour for items that are bought in advance.

R9.png

Taking a closer look at normal items, users usually redeem the day after they bought their items or on the last day of their usual seven-day redemption period. For the other items, many of them are registered as redeemed at a later timing due to (i) a need for an appointment or (ii) a voucher with a longer redemption period. As what was observed earlier, approximately 34.2% of the orders are expired.

Funnel Plot Analysis

As mentioned earlier in our objectives, we want to identify star merchants (performing better than expected) and laggard merchants (performing worse than expected). We have decided to use a funnel plot as opposed to tree maps or scatterplot graphs because they are difficult to interpret. For a tree map, the dataset is too big to have a comprehensible result. For a scatterplot graph, it is insufficient because it does not tell us which points are outliers unlike the funnel plot which provides control limits. These control limits informs us that whichever points that fall within the funnel is categorised as stochastic.

We categorise star merchants as having a high redemption rate and also generating high total revenue. Thus, we will be generating 3 funnel plots and inner-joining the merchants within these 3 datasets to identify these star merchants and laggard merchants.

400x

From the funnel plot analysis, we have identified a total of 248 merchants (31.5%) who are star merchants. We also identified a total of 244 merchants (31.0%) who are laggard merchants.

Time-Series Analysis

To do the time series analysis, number of users, number of orders, total revenue and total orders per month is calculated from Feb 2014 to Nov 2015, a total of 22 months of data.

Only orders with order status (1,2,4) are used in the analysis, meaning that these are completed orders representing Sugar’s revenue.

Dec 2015 data was excluded because of the gifting challenge that happened in Dec.

Orders & Revenue

OS1.png

Sugar’s orders and revenue per month has been on a steady increase but it dipped down at November 2015.

As mentioned above, Sugar is in a two-sided market. As such, the increase of users will theoretically make it more attractive for brands to join. The increase in brands should attract more users to download the app. To test this hypothesis, the time series of users and branches are plotted together.

OS2.png

As shown, there does not seem to have a significant pattern between Users and Branch, with a R-square value of 0.10.

Lagging either Users or Branches does not significantly improve the result. As such, it is safe to say that the number of Users and the number of Branches has little relationship with each other.

This could be because most merchants on the Sugar are not famous, and there is significant information asymmetry. Users do not know how many merchants there are on the app, and merchants do not know how many users are near them.

However, the number of User and Revenue per month of Sugar shows an interesting result.

OS3.png

It shows that the number of new Users per month is a significant predictor of revenue, with a R-square value of 0.469.

To see whether User precedes revenue, the user graph is lagged by 2 time periods.

OS4.png OS5.png

As shown by the graph, the R-square value improved from 0.469 to 0.722. The change in R-square values indicates that the latter is a better fit. This implies that revenue generally follows 2 months after an user is acquired.

This shows that the increase of users will likely to increase the revenue of Sugar 2 months later. This implies that, at the moment, Sugar should focus on User Acquisition.

There are some implications of this. It could affect KPI.

Geospatial Analysis

Subzone Planning Area: Orders

By sorting our orders by descending order, we are able to see a very interesting trend.

G1.png

Majority of our orders come from the Central Water Catchment area. The intuitive response is our reservoirs is a huge untapped market and resource, and that we will most likely want to build as many stores there as possible to make use of the orders and demands. However, majority of the orders are most likely due to the big residential areas around the Central Water Catchment area such as Novena, Bishan, Thompson etc; who are just between the boundaries of the planning area and fall within the Central Water Catchment area.

Next, we can also observe that the East area (Bedok, Tampines, Geylang) are responsible for a lot of the orders. This is due to the fact that the East has many town centers and residential areas which may result in proportionately more orders coming from these areas as proportionately more people stay there. However, this does not explain the fact why the West area is significantly underrepresented as people do stay there too. Sugar would probably like to conduct more targeted marketing in these areas so as to tap into potential users there.

Subzone Planning Area: Branches

By sorting our branches by descending order, we are able to see the following.

G2.png

Most of the branches are located in the Central region of Singapore. It is interesting to note that majority of the orders are created in the East, but most of the branches are in the Central region; and very little branches are in the Western or North region.

G3.png

Sugar may want to focus some of their recruitment attempts for merchants in the West and the North in order to maximise coverage.

Planning Area: % of Orders/ $ of Branches

Upon sorting our % of orders/ $ of branches metric by descending order, we realised the limitations of the metrics.

G5.png

Firstly, due to the spike in numbers for the Central Water Catchment area, this has inflated the value of that planning area. Orders straddling boundaries may be counted into other areas, and a black and white boundary system is not sufficient to capture user location. Secondly, this metrics assume that each planning area will cater to all the demands of the consumer, namely that the same goods and services can be procured in the same planning area, as compared to reality where a consumer most likely needs to travel to different areas for different goods and services. As such, we decided to refine our metrics to a shortest distance issue, as compared to a naive comparison of subzones.

Shortest Distance Analysis: Distance vs Redemption

For orders with status 2, these are orders that are redeemed. For orders with status 4, these orders are unredeemed. It is intuitive to suggest that the lack of redemption could be due to distance, plausibly that an order that is further away will tend to not be redeemed.

As such, we will be plotting the status by the distance of the current branch.

G6.png G7.png

By comparing means using a oneway anova, we are able to tell that orders with a status of 4 has a significantly larger distance than orders with status of 2.

From this, we can identify a plausible link between distance and redemption. It is possible that by decreasing the distance, we are able to encourage higher redemption. This could be done by sourcing similar merchants in areas that have little merchant/ brand coverage, or by allocating users to nearer branches.

Shortest Distance Analysis: Cannibalisation: Brand Allocative Efficiency

For merchant/ brand allocative efficiency, we will plot a table of merchants/ brands by same branch? (the column testing if an order goes to the closest branch). If merchants/ brands were wholly efficient, we will expect to see 100% of a branch’s order be 1 as compared to 0.

G8.png

However, as can be seen, all merchants/ brands are not efficient, but some merchants/ brands are more inefficient than others, namely merchants/ brands 930 and 1039. We will thus focus on these two to examine the cause of such an allocative inefficiency.

For the case of 930, we will then plot the current branch against same branch in order to see if there is any cannibalisation.

We can see that branch id 12500 is a major culprit, with all 335 of its sales cannibalised off other branches. In order to see which branch it has cannibalised, we can plot the current branch id (cbranch_id) against the nearest branch id (nbranch_id).

From here, we can see that 12500 cannibalised 239 out of 335 sales off branch 12952. According to the sales data, 12952 has absolutely no sales at all, despite it being the nearest branch to 319 orders. This is largely due to branch 12500 for cannibalising ~75% of its sales.

For the case of 1039, we will execute the same steps and plot the current branch against same branch? in order to see if there is any cannibalisation.

G9.png

We can see a very interesting trend here where every branch except for 12639 has a very high cannibalisation rate. In order to drill down deeper, we can then plot the current branch id (cbranch_id) against the nearest branch id (nbranch_id).

G10.png

This analysis is particularly revealing. Every branch is cannibalising the sales of branch 12639. What it means is that branch 12639 is at an extremely convenient location, where it is the closest to ~73% of the orders (168 out of 229). However, It is serving only ~23% of the orders (39 out of 168), with the rest being taken by the other branches. This could be due to 12639 being overloaded, or not enough people are aware of this branch. It is also worrying that there is so much overlap and costs can be reduced by removing branch 12639 as the other branches seem capable enough to handle the bulk of 12639’s demand.

Using evaluating allocative efficiency, Sugar is able to firstly, gain insights into the possible flaws of their location algorithm, and secondly, provide these insights to merchants, so as to help their brand partners improve their sales and traffic.