ANLY482 AY2016-17 T2 Group3: HOME/Interim
HOME | ABOUT US | PROJECT OVERVIEW | PROJECT FINDINGS | PROJECT MANAGEMENT | DOCUMENTATION | ALL PROJECTS |
Contents
Overview
Project Background & Motivation
Vanitee was officially launched in May 2015, in an attempt to bridge the gap between customers and independent beauty professionals. Typically, beauty professionals that are listed on the platform are emerging and independent beauty artists. To put it simply, they are professionals who want to grow their brand and customer base. By providing such a platform, Vanitee is able to help them showcase what they do best.
However, this does not mean that there are no competitors. Competitors include brick and mortar shops in local neighbourhoods and even bigger beauty brands with chain stores such as Jean Yip Group. Even though these are physical stores, they still pose as a threat as customers can still choose to go to these stores instead of using Vanitee to engage a beauty professional. Hence, Vanitee does not want to stop at just providing a platform for these beauty professionals and for customers to engage them. Furthermore, with an increasing number of professionals and customers coming on board, evaluating their performance so far becomes much more imperative.
Firstly, to further the success of their application, Vanitee has to place emphasis on attracting new customers as well as retaining their existing customers. Many customers might have become dormant after just one booking. Hence, analysis can be done to find out why they have turned dormant and identify possible solutions to attract them to make the next booking.
Secondly, in response to these dormant customers, Vanitee currently has an extensive loyalty program (as shown in Figure 1) in place that offers customers credits, gems as well as campaign codes with every booking made. However, one issue they face is the lack of understanding of how consumers utilize these in-app resources. Also, they wish to understand the effectiveness of such a loyalty program in encouraging customers to make repeated bookings in the future.
Project Objectives
Hence, by utilizing the data from their current application’s database, we would wish to discover meaningful and informative insights which will allow Vanitee to better retain their customers and beauty professionals and understand the effectiveness of their current loyalty program. To achieve the above mentioned, we have set the following objectives:
Customers
- To determine the customer segmentation (different groups of customers) from the current booking patterns. Which customers are stagnant? Which customers are actively using the app?
- To understand customers’ behaviour. When was the last time a customer used the app? How frequent does a customer use the app? How much does a customer spend on average?
- To evaluate the effectiveness of using campaign codes to ensure customers repeat their bookings
- To understand how customers are using credits and gems (refer to Figure 2), whether they are accumulating before use or using them in their next booking
- To determine the Customer Lifetime Value (CLV) by campaign (which promotional campaign drives the highest value customer?) To which campaign do customers react to more? Do customers respond more to campaigns giving discounts in dollar amounts (e.g. $20 off) or to percentage amounts (e.g. 20% off)? Which customers react and respond more to campaigns, credits and gems?
- Which services generate the most profits?
Beauty Professionals
- To determine if there is any correlation on what makes beauty professionals more attractive to customers (based on the following hypothesis).
- Are beauty professionals more attractive if they have a higher chat response rate?
- Are beauty professionals more attractive if they have a greater variety of services?
- Are beauty professionals more attractive if they offer less expensive services compared to other professionals?
- Are beauty professionals more attractive if they offer services on non-working days or hours?
Data Integration and Filtering
Data Collection
To facilitate our data analysis, Vanitee has provided us with access to their current MongoDB database on the cloud. The database contains numerous tables such as customers, beauty professionals, bookings, campaigns etc. Our team has decided to use two full years worth of data which ranges from Jan 2015 to Dec 2016.
Extracted Tables
Different types of data are currently represented by different tables in the database. In this case, there were a total of 59 tables for us to utilize. After exploring each table and its suitability for analysis, we eventually narrowed down to the following 7 tables:
Bookings
A row in this table represents a specific booking of a Customer with a Beauty Professional. The detailed description of the main columns in this table is as follows:
Campaigns
A row in this table represents a specific campaign (marketing initiative). The detailed description of the main columns in this table is as follows:
Categories
A row in this table represents a specific category that can be used to classify services. The detailed description of the main columns in this table is as follows:
Users
A row in this table represents a specific user that has an account on the application (either as a customer or professional). The detailed description of the main columns in this table is as follows:
Customers
A row in this table represents a specific customer in relation to a specific beauty professional. The detailed description of the main columns in this table is as follows:
Professionals
A row in this table represents a specific beauty professional. The detailed description of the main columns in this table is as follows:
Services
A row in this table represents a specific service offered by a beauty professional. The detailed description of the main columns in this table is as follows:
Challenges
As a relatively new startup that is still growing, Vanitee has been doing quite a bit of testing to minimize the number of bugs in their application. However, it has been observed that their testing is not done in a consistent manner which makes it difficult for us to filter out test data accurately. Moreover, some of the tables do not have indicative columns to indicate whether a record is a test data or not. Hence, to overcome this, we have implemented the following measures:
- Narrowed the initial data range from Aug 2014 - Dec 2016 to Jan 2015 - Dec 2016 as Vanitee had mentioned that larger amounts of testing happened in 2014 when they first officially launched
- Remove records that have columns that indicate that the record is a test data (e.g. columns such as is_test, test_at, deleted_at)
- Manually searched for the keyword “test” that appeared in several columns that involve name and description
Another challenge we faced was due to the way Vanitee had structured a few of their tables in the database. For example, the Bookings & Categories table had way more records than expected. The main reason for this was that most of the records were actually related to each other where certain records had a master_id column that referenced another record within the same table. Vanitee’s rationale behind the master_id column was that they wanted to keep track of the changes made to each record but as new records instead of updating the current record. One specific example is that a single booking can have a parent booking (also known as the master booking). The multiple child bookings represent the different states that the booking has undergone. Hence, to overcome this, we have implemented the following measures:
- For those tables with the column master_id, we created a new column called is_master to indicate whether that record is a master record or not
- Non-master records are then excluded from analysis and the formulation of graphs
Data Cleaning and Exploration
Issues
Before exploring the data, we faced several issues such as duplication of data, missing values as well as changes in Vanitee’s business model. As these issues may potentially affect the accuracy of our analysis, we have implemented the following measures to overcome these issues prior to performing our analysis.
Duplicate Values
Out of the 7 tables identified above, only the Campaigns table had records with duplicate values. As seen in Figure 3 below, we observed that there were several records that had similar campaign names and almost identical creation date times. However, we realized that only 1 campaign record had campaign codes, while the others did not have. Hence, we made the assumption that only campaigns that had campaign codes were true campaigns that were carried out. Also, to help reduce such duplicate values, we made use of the column is_published to sieve out those campaigns that were actually published over the past 2 years.
Missing Values
After examining the tables, we realized that different tables had a varying amount of columns with missing values. Firstly, when we looked at the Bookings table, columns that contained monetary values (e.g. final_price, total_price, discount_amount etc.) mostly had missing values as well. Upon further inspection, we deduced that these missing values actually represented the monetary value of 0. Hence we replaced those missing values with the value of 0. Another column that had a huge amount of missing values was price_vanitee_transaction_fee which is the fee that Vanitee profits when an online booking has been successfully checked out. Subsequently, we found out that these missing values were attributed to changes in Vanitee’s business model which we will elaborate further in the next issue.
Another major table that had columns with missing values was the Campaigns table. Columns like start_at and end_at had several missing values that confused us initially. After analyzing Vanitee’s online dashboard, we learnt that campaigns need not have start and end dates to be created. Campaigns that had missing start dates would use the creation date as replacement as clarified with Vanitee. On the other hand, campaigns that had missing end dates would mean that the campaign codes could be redeemed as long as the start date had passed.
Changes in Business Model
After meetings with Vanitee and email correspondence with their developers, we learnt that Vanitee’s current business model was only recently implemented (around Nov to Dec 2016). As seen in Figure 4, there are several changes to their business model. To make matters worse, the Bookings table did not have a column that indicates whether a booking was based on the current or previous business model. In addition, this also partially explains why the column price_vanitee_transaction_fee had missing values (as mentioned above).
Since some of our analysis we were about to do involved calculating the profit or Vanitee fee made per booking, we came up with the formula (as shown in Figure 5) below as an alternative way of calculating that value. To put it simply, the profit per booking is the final price that the customer pays (after any discount) minus away the payout that the professional receives, minus away the transaction fee incurred from payment by credit card and finally minus away the cashback that the customer receives as credits. We felt that the benefit of using this formula was that it ignores the Vanitee fee in its calculation and uses the other columns that do not have much missing values instead. Also, it works for any booking regardless of past or current business model.
Findings
Users, Customers & Professionals
Figure 6 - Users breakdown by type |
Description |
Figure 7 - Customers breakdown by age |
Description |
Figure 8 - Customers breakdown by gender |
Description |
Figure 9 - Professionals breakdown by age |
Description |
Figure 10 - Professionals breakdown by gender |
Description |
Bookings
Figure 11 - Bookings breakdown by type |
Description |
Figure 12 - Bookings breakdown by status |
Description |
Figure 13 - Bookings breakdown by frequency |
Description |
Figure 14 - Bookings breakdown by year |
Description |
Figure 15 - Bookings breakdown by month |
Description |
Figure 16 - Bookings breakdown by month & year |
Description |
Figure 17 - Bookings breakdown by day |
Description |
Figure 18 - Bookings breakdown by recency (initial) |
Description |
Figure 19 - Bookings breakdown by recency (final) |
Description |
Figure 20 - Bookings breakdown by monetary value (final) |
Description |
Figure 21 - Bookings breakdown by duration from signup to 1st booking (initial) |
Description |
Figure 22 - Bookings breakdown by duration from signup to 1st booking (final) |
Description |
Figure 23 - Bookings breakdown by service count |
Description |
Figure 24 - Bookings breakdown by category |
Description |
Figure 25 - Bookings breakdown by campaign usage |
Description |
Figure 26 - Bookings breakdown by credit usage |
Description |
Figure 27 - Profit per booking formula comparison |
Description |
Services
Figure 28 - Services breakdown by price |
Description |
Figure 29 - Services breakdown by professional |
Description |
Figure 30 - Services breakdown by category |
Description |
Campaigns
Figure 31 - Campaigns breakdown by duration |
Description |
Figure 32 - Campaigns breakdown by type |
Description |
Figure 33 - Campaigns breakdown by discount type |
Description |
Figure 34 - Campaigns breakdown by discount amount |
Description |
Figure 35 - Campaigns breakdown by usage |
Description |
Figure 36 - Campaigns breakdown by duration from start to 1st booking (initial) |
Description |
Figure 37 - Campaigns breakdown by duration from start to 1st booking (final) |
Description |
Revised Methodology
Cluster Analysis
Survival Analysis
Revised Scope of Work
Project Timeline
Revised Work Plan