ANLY482 AY2016-17 T2 Group3: HOME/Interim

From Analytics Practicum
Revision as of 21:19, 22 February 2017 by Sarah.chow.2013 (talk | contribs)
Jump to navigation Jump to search
V Logo.png


HOME   ABOUT US   PROJECT OVERVIEW   PROJECT FINDINGS   PROJECT MANAGEMENT   DOCUMENTATION   ALL PROJECTS



INTERIM PROGRESS

Overview

Project Background & Motivation

Vanitee was officially launched in May 2015, in an attempt to bridge the gap between customers and independent beauty professionals. Typically, beauty professionals that are listed on the platform are emerging and independent beauty artists. To put it simply, they are professionals who want to grow their brand and customer base. By providing such a platform, Vanitee is able to help them showcase what they do best.

However, this does not mean that there are no competitors. Competitors include brick and mortar shops in local neighbourhoods and even bigger beauty brands with chain stores such as Jean Yip Group. Even though these are physical stores, they still pose as a threat as customers can still choose to go to these stores instead of using Vanitee to engage a beauty professional. Hence, Vanitee does not want to stop at just providing a platform for these beauty professionals and for customers to engage them. Furthermore, with an increasing number of professionals and customers coming on board, evaluating their performance so far becomes much more imperative.

Firstly, to further the success of their application, Vanitee has to place emphasis on attracting new customers as well as retaining their existing customers. Many customers might have become dormant after just one booking. Hence, analysis can be done to find out why they have turned dormant and identify possible solutions to attract them to make the next booking.

Secondly, in response to these dormant customers, Vanitee currently has an extensive loyalty program (as shown in Figure 1) in place that offers customers credits, gems as well as campaign codes with every booking made. However, one issue they face is the lack of understanding of how consumers utilize these in-app resources. Also, they wish to understand the effectiveness of such a loyalty program in encouraging customers to make repeated bookings in the future.


V Loyalty Program.png
Figure 1 - Vanitee’s current loyalty program


Project Objectives

Hence, by utilizing the data from their current application’s database, we would wish to discover meaningful and informative insights which will allow Vanitee to better retain their customers and beauty professionals and understand the effectiveness of their current loyalty program. To achieve the above mentioned, we have set the following objectives:
Customers

  • To determine the customer segmentation (different groups of customers) from the current booking patterns. Which customers are stagnant? Which customers are actively using the app?
  • To understand customers’ behaviour. When was the last time a customer used the app? How frequent does a customer use the app? How much does a customer spend on average?
  • To evaluate the effectiveness of using campaign codes to ensure customers repeat their bookings
  • To understand how customers are using credits and gems (refer to Figure 2), whether they are accumulating before use or using them in their next booking
V Process Flow.png
Figure 2 - Vanitee customers’ interaction with the application


  • To determine the Customer Lifetime Value (CLV) by campaign (which promotional campaign drives the highest value customer?) To which campaign do customers react to more? Do customers respond more to campaigns giving discounts in dollar amounts (e.g. $20 off) or to percentage amounts (e.g. 20% off)? Which customers react and respond more to campaigns, credits and gems?
  • Which services generate the most profits?


Beauty Professionals

  • To determine if there is any correlation on what makes beauty professionals more attractive to customers (based on the following hypothesis).
    • Are beauty professionals more attractive if they have a higher chat response rate?
    • Are beauty professionals more attractive if they have a greater variety of services?
    • Are beauty professionals more attractive if they offer less expensive services compared to other professionals?
    • Are beauty professionals more attractive if they offer services on non-working days or hours?


Data Integration and Filtering

Data Collection

To facilitate our data analysis, Vanitee has provided us with access to their current MongoDB database on the cloud. The database contains numerous tables such as customers, beauty professionals, bookings, campaigns etc. Our team has decided to use two full years worth of data which ranges from Jan 2015 to Dec 2016.

Extracted Tables

Different types of data are currently represented by different tables in the database. In this case, there were a total of 59 tables for us to utilize. After exploring each table and its suitability for analysis, we eventually narrowed down to the following 7 tables:

Bookings
A row in this table represents a specific booking of a Customer with a Beauty Professional. The detailed description of the main columns in this table is as follows:
V Bookings Table.png V Bookings Table New Cols.png

Campaigns
A row in this table represents a specific campaign (marketing initiative). The detailed description of the main columns in this table is as follows:
V Campaigns Table.png V Campaigns Table New Cols.png

Categories
A row in this table represents a specific category that can be used to classify services. The detailed description of the main columns in this table is as follows:
V Categories Table.png V Categories Table New Cols.png

Users
A row in this table represents a specific user that has an account on the application (either as a customer or professional). The detailed description of the main columns in this table is as follows:
V Users Table.png V Users Table New Cols.png

Customers
A row in this table represents a specific customer in relation to a specific beauty professional. The detailed description of the main columns in this table is as follows:
V Customers Table.png

Professionals
A row in this table represents a specific beauty professional. The detailed description of the main columns in this table is as follows:
V Professionals Table.png

Services
A row in this table represents a specific service offered by a beauty professional. The detailed description of the main columns in this table is as follows:
V Services Table.png V Services Table New Cols.png

Challenges

As a relatively new startup that is still growing, Vanitee has been doing quite a bit of testing to minimize the number of bugs in their application. However, it has been observed that their testing is not done in a consistent manner which makes it difficult for us to filter out test data accurately. Moreover, some of the tables do not have indicative columns to indicate whether a record is a test data or not. Hence, to overcome this, we have implemented the following measures:

  • Narrowed the initial data range from Aug 2014 - Dec 2016 to Jan 2015 - Dec 2016 as Vanitee had mentioned that larger amounts of testing happened in 2014 when they first officially launched
  • Remove records that have columns that indicate that the record is a test data (e.g. columns such as is_test, test_at, deleted_at)
  • Manually searched for the keyword “test” that appeared in several columns that involve name and description


Another challenge we faced was due to the way Vanitee had structured a few of their tables in the database. For example, the Bookings & Categories table had way more records than expected. The main reason for this was that most of the records were actually related to each other where certain records had a master_id column that referenced another record within the same table. Vanitee’s rationale behind the master_id column was that they wanted to keep track of the changes made to each record but as new records instead of updating the current record. One specific example is that a single booking can have a parent booking (also known as the master booking). The multiple child bookings represent the different states that the booking has undergone. Hence, to overcome this, we have implemented the following measures:

  • For those tables with the column master_id, we created a new column called is_master to indicate whether that record is a master record or not
  • Non-master records are then excluded from analysis and the formulation of graphs


Data Cleaning and Exploration

Issues

Before exploring the data, we faced several issues such as duplication of data, missing values as well as changes in Vanitee’s business model. As these issues may potentially affect the accuracy of our analysis, we have implemented the following measures to overcome these issues prior to performing our analysis.

Duplicate Values

Out of the 7 tables identified above, only the Campaigns table had records with duplicate values. As seen in Figure 3 below, we observed that there were several records that had similar campaign names and almost identical creation date times. However, we realized that only 1 campaign record had campaign codes, while the others did not have. Hence, we made the assumption that only campaigns that had campaign codes were true campaigns that were carried out. Also, to help reduce such duplicate values, we made use of the column is_published to sieve out those campaigns that were actually published over the past 2 years.

V Duplicate Values.png
Figure 3 - Campaign records that have duplicate values

Missing Values

After examining the tables, we realized that different tables had a varying amount of columns with missing values. Firstly, when we looked at the Bookings table, columns that contained monetary values (e.g. final_price, total_price, discount_amount etc.) mostly had missing values as well. Upon further inspection, we deduced that these missing values actually represented the monetary value of 0. Hence we replaced those missing values with the value of 0. Another column that had a huge amount of missing values was price_vanitee_transaction_fee which is the fee that Vanitee profits when an online booking has been successfully checked out. Subsequently, we found out that these missing values were attributed to changes in Vanitee’s business model which we will elaborate further in the next issue.

Another major table that had columns with missing values was the Campaigns table. Columns like start_at and end_at had several missing values that confused us initially. After analyzing Vanitee’s online dashboard, we learnt that campaigns need not have start and end dates to be created. Campaigns that had missing start dates would use the creation date as replacement as clarified with Vanitee. On the other hand, campaigns that had missing end dates would mean that the campaign codes could be redeemed as long as the start date had passed.

Changes in Business Model

After meetings with Vanitee and email correspondence with their developers, we learnt that Vanitee’s current business model was only recently implemented (around Nov to Dec 2016). As seen in Figure 4, there are several changes to their business model. To make matters worse, the Bookings table did not have a column that indicates whether a booking was based on the current or previous business model. In addition, this also partially explains why the column price_vanitee_transaction_fee had missing values (as mentioned above).

V Business Model.png
Figure 4 - Comparison of past and current business models
(*percentages are based on the price of the specific booking)

Since some of our analysis we were about to do involved calculating the profit or Vanitee fee made per booking, we came up with the formula (as shown in Figure 5) below as an alternative way of calculating that value. To put it simply, the profit per booking is the final price that the customer pays (after any discount) minus away the payout that the professional receives, minus away the transaction fee incurred from payment by credit card and finally minus away the cashback that the customer receives as credits. We felt that the benefit of using this formula was that it ignores the Vanitee fee in its calculation and uses the other columns that do not have much missing values instead. Also, it works for any booking regardless of past or current business model.

V Profit Formula.png
Figure 5 - Alternate formula to calculate profit per booking

Findings

Users, Customers & Professionals

1 V Findings Users Type.png
Figure 6 - Users breakdown by type

Description
2 V Findings Customers Age.png
Figure 7 - Customers breakdown by age

Description
3 V Findings Customers Gender.png
Figure 8 - Customers breakdown by gender

Description
4 V Findings Professionals Age.png
Figure 9 - Professionals breakdown by age

Description
5 V Findings Professionals Gender.png
Figure 10 - Professionals breakdown by gender

Description

Bookings

6 V Findings Bookings Type.png
Figure 11 - Bookings breakdown by type

Description
7 V Findings Bookings Status.png
Figure 12 - Bookings breakdown by status

Description
8 V Findings Bookings Frequency.png
Figure 13 - Bookings breakdown by frequency

Description
9 V Findings Bookings by Year.png
Figure 14 - Bookings breakdown by year

Description
10 V Findings Bookings by Month.png
Figure 15 - Bookings breakdown by month

Description
11 V Findings Bookings by Month Year.png
Figure 16 - Bookings breakdown by month & year

Description
12 V Findings Bookings by Day.png
Figure 17 - Bookings breakdown by day

Description
13 V Findings Bookings by Recency 1.png
Figure 18 - Bookings breakdown by recency (initial)

Description
14 V Findings Bookings by Recency 2.png
Figure 19 - Bookings breakdown by recency (final)

Description
15 V Findings Bookings Monetary Value.png
Figure 20 - Bookings breakdown by monetary value (final)

Description
16 V Findings Bookings Duration First Booking.png
Figure 21 - Bookings breakdown by duration from signup to 1st booking (initial)

Description
17 V Findings Bookings Duration First Booking 2.png
Figure 22 - Bookings breakdown by duration from signup to 1st booking (final)

Description
18 V Findings Bookings Service Count.png
Figure 23 - Bookings breakdown by service count

Description
19 V Findings Bookings Category.png
Figure 24 - Bookings breakdown by category

Description
20 V Findings Bookings Campaign Usage.png
Figure 25 - Bookings breakdown by campaign usage

Description
21 V Findings Bookings Credits Usage.png
Figure 26 - Bookings breakdown by credit usage

Description
21 V Findings Bookings Credits Usage.png
Figure 26 - Bookings breakdown by credit usage

Description

Services
Campaigns

Revised Methodology

Cluster Analysis

Survival Analysis

Revised Scope of Work

Project Timeline

Revised Work Plan