Analysis of User and Merchant Dropoff for Sugar App - Findings
Mid-Term | Finals |
---|
Contents
Executive Summary
The objective of project is to improve Sugar’s bottom line via Merchant, User and Geospatial Analysis.
User: Top 25% of the users contributes 75.4% of revenue
- Sugar may want to invest more effort in maintaining relationships with these users and understanding who they are, thereby attracting more of such users.
User: User growth explains 72% of revenue growth after 2 months; branch growth has little effect
- Between increasing user or branches, Sugar should focus more on users. Time lag should be taken into account when calculating marketing effectiveness.
User: Redemption peaks during tea-break, followed by lunch and dinner
- Sugar can use this data to improve its product offerings or push notification timings to reach out to more customers at these specific timings
Merchant: Segmentation of merchants into star and laggards.
- Sugar can make use of this segmentation to focus their efforts on improving merchant traffic flow (and thus building better relationships), by routing orders to laggards, and rewarding star merchant performance
Geospatial: Significantly less number of branches and orders in the West region.
- These are potential areas of growth for Sugar to expand.
Geospatial: Inverse relationship between distance of branch and redemption rate
- Sugar can focus on methods to shorten order distance to improve redemption rate, and thus get more repeat customers.
Geospatial: Derived a cannibalisation analysis model using shortest distance analysis
- Sugar can identify and help stores that are cannibalised by other stores of the same brand; improving overall potential earnings.
Geospatial: Explanatory Model linking user total value to order origin location
- Sugar can target areas with the highest spenders, namely: Kampong Ubi, Boon Lay Place, Kampong Glam, Marine Parade, Kembangan, Katong, Telok Blangah Drive, Bangkit, Kaki Bukit, Anak Bukit, to maximise and revenue.
Exploratory Analysis
Merchants |
Merchant Growth
As we look towards Sugar’s merchant acquisition process over the past 2 years, there are several notable spikes.
For example, during the beta phase in February 2014, Sugar is trying to bring merchants on board before its official launch in July. There is also a noticeable spike in July 2015 which is probably due to Sugar’s campaign.
Looking at the CDF plot, we can see that the cumulative growth in number of merchants have slowly been tapering off. Currently, Sugar has 835 merchants compared to a total of 6,860 establishments in the food & beverage (F&B) services industry (Singstat, 2016). Sugar’s merchant pool only constitutes only 12.2% of the industry and this means great room for potential in acquiring more merchants.
Number of Orders per Merchant
Looking at the number of orders, the distribution is mainly right-skewed. Around 80% of merchants generate only 200 or less orders over the course of 2 years whereas the top 20% are able to generate up to 2,200 orders in total. However, we should take into account how many total revenue is also generated by these top 20% merchants. Thus, we will be looking more into the funnel plot analysis to identify these star merchants.
User Growth |
Number of User Sign ups per Month
As shown by the graph, Sugar experienced its highest user growth in Apr 2015. It has not been consistent. Number of new users per month has been decreasing.
User sign-ups grouped by Day of Month
Interestingly, the 5th and 6th day of each month has more orders than the rest of the month.
Number of User Sign Ups grouped by Day of Week
The Day of Week does not seem to affect the number of user sign ups. There seems to be no patterns within a week even though there is a slight bump on a Saturday.
User Sign Ups grouped by Hour of Day
Similar to orders, more users sign up at around lunch time than any other time.
User Profiles
We also separated Users into 3 groups based on their total order value:
- Top 25% - We recoded it into 3
- Middle 50% - We recoded it into 2
- Lower 25% - We recoded into 1
As shown the top 25% of the users contributes 75.4% of revenue. The middle 50% contributes only 22.1% of the total revenue and the bottom 25% only contributes to 2.5% of the total revenue.
To explain this phenomenon, we can look at the number of orders for each user.
As shown, there are significant difference between the number of orders of each group. The top 25% group has a mean of 15.96 orders each whereas the middle 50% only has a mean of 3.35 orders each. The graph below shows the Maximum order price a user has spent on vs the three groups. As shown above, the mean of the Max order price for the top 25% users is significantly different from compared to the other groups.
Lastly, the graph below shows the differences in the mean order price for each user.
As shown, there is a significant difference between the mean order price of the top25% and the middle 50%. The top25% has a mean average order of $5.25 per user whereas the middle 50% has roughly about half that amount.
Therefore, there is a multiplier effect. The top 25% spend more per order and have higher number of orders. This results in the extremely right skewed graph for revenue.
Implication: Sugar should focus on the high value customers if it wants to increase revenue.
Orders |
For orders, only orders status = 1,2,4 are analysed. This is because these orders represent Sugar’s revenue. To recap, Order status: 1 - Active orders 2 - Redeemed orders 4 - Unredeemed orders
Orders grouped by Day of Month
Every month, there seems to be a small spike on the 7th and a dip on the 31st. Other than there, there seems to be no significant difference.
Orders grouped by Month
As shown, orders seem to peak at July and December. However this graph is subject to changes as most of the gifting orders are in Dec, awaiting verification.
Orders grouped by Hour of Day
As mentioned above, the 4691 orders at 4am needs to be verified with Sugar. Nonetheless, a clear pattern is shown here: orders peak at 12noon, which is right before lunch time. This may be because users may be deciding where to head for lunch.
Orders grouped by DayofWeek
There are no major differences between the days of the week and the number of orders.
Distribution of Prices of Orders
It is a very right skewed graph with extreme values. The maximum price of an order is $244 while the median is only $2. As such, we have to zoom in.
After temporarily removing the extreme values, this is the zoomed in graph:
This graph shows that most people order not more than $2, as orders sharply decrease from $3. This may be because of the free $2 that was given to the users when they install the app.
Redemptions |
Proportion of Order Status
As shown, 65.5% of the orders are redeemed, and only 34.2% of the orders are unredeemed.
For these redeemed vouchers, there is a higher ratio of normal items to voucher items as compared to non-redeemed vouchers. This insinuates that voucher items are more unlikely to be redeemed as compared to normal items. A possible explanation could be that these vouchers are often contingent on paying for expensive full-priced items.
By decomposition of redemption date into Hour, Day of Week and Month, we have gained a better understanding and identified key insights.
Redemptions grouped by Day of Month
In terms of redemptions, there is a relatively consistent trend throughout the month, with a drop on the 31st. This can be attributed to the nature of the Gregorian calendar where several months have only 30 days.
Redemptions grouped by Day of Week
In terms of day of week, there is low traffic on Mondays and Tuesdays and an increasing trend throughout the week, peaking on Saturday.
Redemptions grouped by Hour
As we can see, there is an influx of redemptions mainly during lunch-time (i.e. 12PM to 2PM) and dinner-time (i.e. 6PM to 8PM).
Redemptions grouped by Month
July and August seem to be the most popular months, which could be due to campaigns and events held during this period (e.g. National Day campaign). On the flipside, January and February have significantly much lower traffic in comparison to the other months. This could be explained by the fact that there is much little to no intake in 2014’s first quarter as the app was still in beta-testing.
Time Frame between Creation of Order and Redemption
We were also interested to find out whether users usually paid for an order in advance to redeem it on another day, or to purchase it on the same day itself.
It seems that only 38% of redemptions usually take place on the current day itself, with 62% opting to purchase for orders in advance. We decided to investigate further to understand the redemption behaviour for items that are bought in advance.
Taking a closer look at normal items, users usually redeem the day after they bought their items or on the last day of their usual seven-day redemption period. For the other items, many of them are registered as redeemed at a later timing due to (i) a need for an appointment or (ii) a voucher with a longer redemption period. As what was observed earlier, approximately 34.2% of the orders are expired.
Funnel Plot Analysis
As mentioned earlier in our objectives, we want to identify star merchants (performing better than expected) and laggard merchants (performing worse than expected). We have decided to use a funnel plot as opposed to tree maps or scatterplot graphs because they are difficult to interpret. For a tree map, the dataset is too big to have a comprehensible result. For a scatterplot graph, it is insufficient because it does not tell us which points are outliers unlike the funnel plot which provides control limits. These control limits informs us that whichever points that fall within the funnel is categorised as stochastic.
We categorise star merchants as having a high redemption rate and also generating high total revenue. Thus, we will be generating 3 funnel plots and inner-joining the merchants within these 3 datasets to identify these star merchants and laggard merchants.
From the funnel plot analysis, we have identified a total of 248 merchants (31.5%) who are star merchants. We also identified a total of 244 merchants (31.0%) who are laggard merchants.
Time-Series Analysis
To do the time series analysis, number of users, number of orders, total revenue and total orders per month is calculated from Feb 2014 to Nov 2015, a total of 22 months of data.
Only orders with order status (1,2,4) are used in the analysis, meaning that these are completed orders representing Sugar’s revenue.
Dec 2015 data was excluded because of the gifting challenge that happened in Dec.
Orders & Revenue
Sugar’s orders and revenue per month has been on a steady increase but it dipped down at November 2015.
As mentioned above, Sugar is in a two-sided market. As such, the increase of users will theoretically make it more attractive for brands to join. The increase in brands should attract more users to download the app. To test this hypothesis, the time series of users and branches are plotted together.
As shown, there does not seem to have a significant pattern between Users and Branch, with a R-square value of 0.10.
Lagging either Users or Branches does not significantly improve the result. As such, it is safe to say that the number of Users and the number of Branches has little relationship with each other.
This could be because most merchants on the Sugar are not famous, and there is significant information asymmetry. Users do not know how many merchants there are on the app, and merchants do not know how many users are near them.
However, the number of User and Revenue per month of Sugar shows an interesting result.
It shows that the number of new Users per month is a significant predictor of revenue, with a R-square value of 0.469.
To see whether User precedes revenue, the user graph is lagged by 2 time periods.
As shown by the graph, the R-square value improved from 0.469 to 0.722. The change in R-square values indicates that the latter is a better fit. This implies that revenue generally follows 2 months after an user is acquired.
This shows that the increase of users will likely to increase the revenue of Sugar 2 months later. This implies that, at the moment, Sugar should focus on User Acquisition.
There are some implications of this. It could affect KPI.
Geospatial Analysis
Subzone Planning Area: Orders |
By sorting our orders by descending order, we are able to see a very interesting trend.
Majority of our orders come from the Central Water Catchment area. The intuitive response is our reservoirs is a huge untapped market and resource, and that we will most likely want to build as many stores there as possible to make use of the orders and demands. However, majority of the orders are most likely due to the big residential areas around the Central Water Catchment area such as Novena, Bishan, Thompson etc; who are just between the boundaries of the planning area and fall within the Central Water Catchment area.
Next, we can also observe that the East area (Bedok, Tampines, Geylang) are responsible for a lot of the orders. This is due to the fact that the East has many town centers and residential areas which may result in proportionately more orders coming from these areas as proportionately more people stay there. However, this does not explain the fact why the West area is significantly underrepresented as people do stay there too. Sugar would probably like to conduct more targeted marketing in these areas so as to tap into potential users there.
Subzone Planning Area: Branches |
By sorting our branches by descending order, we are able to see the following.
Most of the branches are located in the Central region of Singapore. It is interesting to note that majority of the orders are created in the East, but most of the branches are in the Central region; and very little branches are in the Western or North region.
Sugar may want to focus some of their recruitment attempts for merchants in the West and the North in order to maximise coverage.
Planning Area: % of Orders/ $ of Branches |
Upon sorting our % of orders/ $ of branches metric by descending order, we realised the limitations of the metrics.
Firstly, due to the spike in numbers for the Central Water Catchment area, this has inflated the value of that planning area. Orders straddling boundaries may be counted into other areas, and a black and white boundary system is not sufficient to capture user location. Secondly, this metrics assume that each planning area will cater to all the demands of the consumer, namely that the same goods and services can be procured in the same planning area, as compared to reality where a consumer most likely needs to travel to different areas for different goods and services. As such, we decided to refine our metrics to a shortest distance issue, as compared to a naive comparison of subzones.
Shortest Distance Analysis: Distance vs Redemption |
For orders with status 2, these are orders that are redeemed. For orders with status 4, these orders are unredeemed. It is intuitive to suggest that the lack of redemption could be due to distance, plausibly that an order that is further away will tend to not be redeemed.
As such, we will be plotting the status by the distance of the current branch.
By comparing means using a oneway anova, we are able to tell that orders with a status of 4 has a significantly larger distance than orders with status of 2.
From this, we can identify a plausible link between distance and redemption. It is possible that by decreasing the distance, we are able to encourage higher redemption. This could be done by sourcing similar merchants in areas that have little merchant/ brand coverage, or by allocating users to nearer branches.
Shortest Distance Analysis: Cannibalisation: Brand Allocative Efficiency |
For merchant/ brand allocative efficiency, we will plot a table of merchants/ brands by same branch? (the column testing if an order goes to the closest branch). If merchants/ brands were wholly efficient, we will expect to see 100% of a branch’s order be 1 as compared to 0.
However, as can be seen, all merchants/ brands are not efficient, but some merchants/ brands are more inefficient than others, namely merchants/ brands 930 and 1039. We will thus focus on these two to examine the cause of such an allocative inefficiency.
For the case of 930, we will then plot the current branch against same branch in order to see if there is any cannibalisation.
We can see that branch id 12500 is a major culprit, with all 335 of its sales cannibalised off other branches. In order to see which branch it has cannibalised, we can plot the current branch id (cbranch_id) against the nearest branch id (nbranch_id).
From here, we can see that 12500 cannibalised 239 out of 335 sales off branch 12952. According to the sales data, 12952 has absolutely no sales at all, despite it being the nearest branch to 319 orders. This is largely due to branch 12500 for cannibalising ~75% of its sales.
For the case of 1039, we will execute the same steps and plot the current branch against same branch? in order to see if there is any cannibalisation.
We can see a very interesting trend here where every branch except for 12639 has a very high cannibalisation rate. In order to drill down deeper, we can then plot the current branch id (cbranch_id) against the nearest branch id (nbranch_id).
This analysis is particularly revealing. Every branch is cannibalising the sales of branch 12639. What it means is that branch 12639 is at an extremely convenient location, where it is the closest to ~73% of the orders (168 out of 229). However, It is serving only ~23% of the orders (39 out of 168), with the rest being taken by the other branches. This could be due to 12639 being overloaded, or not enough people are aware of this branch. It is also worrying that there is so much overlap and costs can be reduced by removing branch 12639 as the other branches seem capable enough to handle the bulk of 12639’s demand.
Using evaluating allocative efficiency, Sugar is able to firstly, gain insights into the possible flaws of their location algorithm, and secondly, provide these insights to merchants, so as to help their brand partners improve their sales and traffic.