AY1718 T2 Group21 Midterm Findings

From Analytics Practicum

Revision as of 23:00, 25 February 2018




Executive Summary


Problem Summary: Brainsmith, an e-commerce company that sells educational products for children, has been operating for over two years, but its website conversion rates have been lower than the industry average. Using customer behaviour patterns and purchase data, we hope to identify website traffic patterns that point to possible methods for increasing the company's conversion rates.


Definitions:

  • User: Every person who has ever accessed the site
  • Customer: A website user who has made at least one purchase
  • User: A website user who has not yet made any purchases


Conversion Rates:
We segmented conversion into two approaches:

1. Customer Retention: Using customer behaviour data, such as the website pages clicked before purchasing and the number of user sessions before a purchase, we hope to identify factors that correlate with
2. Customer Acquisition: Using information on both Customers and Users, we hope to find correlations between the two sets of data.


Data Processing


Data Cleaning:
The data cleaning process was twofold:

1. Rechecking for human error: Matching all web behaviour (pages visited and actions taken on the website) to the corresponding customers, since the variables and data set were defined through human web-crawling and manual entry.
2. Adapting and creating sub-data files: This was done to make the data easier to load into R (and briefly into Tableau), and to de-aggregate our data while keeping it succinct, useful and effective. We also recoded columns in our data using R, as our statistical analysis required.


Using preliminary visualisations, we identified outlier observations and removed them from our analysis to avoid bias and skew.


Exploration and Visualisation

Data Exploration Methodology:
Most of our exploratory research and insight derivation has been done on a trial basis by loading the relevant data into Tableau. We looked at scatter plots, box plots, histograms and bar charts of varying complexity, depending on the number of variables involved and whether they made sense from a business perspective. Keeping in mind our business objectives and the emphasis our client laid on different factors, we focused our attention on the key variables discussed below.

AY1718 T2 Group21 1b DataCleaning.png


Initially, when analysing basic data variables, such as the Average Session Duration of users on the website and the Total No. of Page Views per customer, we found anomalous outliers like the ones in the figure above. These could be the company's founders and the managers in charge of the website, or teams like ours working on projects in tandem with the company.


The first part of our analysis deals mostly with factors that will be useful in designing policies to retain existing customers.
Product Sales and Revenue
The company has a total of 112 products. To obtain product-related data, we had to zoom in on the Customer data set to derive figures such as the Total Quantity Sold of each product and the Average Price of each product (since product prices differed at different times).
Product information was key to analysing the other data categories, as it allowed us to understand the available products better.
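As a rough illustration of this roll-up, the sketch below aggregates hypothetical purchase records to product level. The record layout (product id, quantity, unit price at the time of purchase) and the product ids are assumptions, not the actual Brainsmith data set. The average price is taken here as revenue divided by quantity, so each price period is weighted by the units sold at that price; a simple mean of observed prices would be an alternative choice.

```python
from collections import defaultdict

# Hypothetical purchase records: (product id, quantity, unit price at purchase time).
# This layout is illustrative only, not the real customer data schema.
purchases = [
    ("P001", 2, 30.0),
    ("P001", 1, 25.0),  # same product sold later at a different price
    ("P002", 1, 80.0),
]

def summarise_products(records):
    """Return {product: (total_quantity, total_revenue, average_price)}.

    Average price = total revenue / total quantity, i.e. weighted by units sold.
    """
    qty = defaultdict(int)
    revenue = defaultdict(float)
    for product, quantity, unit_price in records:
        if quantity <= 0:  # skip empty or returned lines to avoid dividing by zero
            continue
        qty[product] += quantity
        revenue[product] += quantity * unit_price
    return {p: (qty[p], revenue[p], revenue[p] / qty[p]) for p in qty}

summary = summarise_products(purchases)
```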

a. Top Products:

AY1718 T2 Group21 ProductRevenue&QuantitySold1 Tableau.PNG
Figure 3: Each product’s total revenue and quantity sold. The yellow line shows the percentage of revenue. The bar chart shows percentage revenue. The colour gradient shows the average price of each product, where the darkest shade is the highest average price.


One of our first exploratory steps was to observe product performance through each product's Revenue and Sales Quantity figures, as shown in Figure 3. In Figure 3, the size of each box represents the quantity of that product sold, whereas the colour represents its total revenue, with the darkest colour marking the highest-revenue product. After combining the two measures in one graph, we made two observations. Firstly, the distribution of quantity sold per product was very wide: the average quantity sold per product was 12.34, with a standard deviation of 17.31. Secondly, the two products with the highest total revenue earned far more than the rest of the products: the average product revenue was 16,226, with a standard deviation of 29,115.
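One way to make this spread concrete is the coefficient of variation (standard deviation divided by mean). Both reported pairs of figures give a value above 1, meaning the standard deviation exceeds the mean, which for non-negative figures such as quantities and revenues is consistent with a few very large products pulling the distribution to the right. The sketch assumes the reported standard deviations are sample standard deviations; `statistics.pstdev` would be the population equivalent.

```python
import statistics

def spread_summary(values):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample SD, divides by n - 1
    return mean, std, std / mean

# Coefficient of variation implied by the figures reported above.
cv_quantity = 17.31 / 12.34    # quantity sold per product, about 1.40
cv_revenue = 29_115 / 16_226   # revenue per product, about 1.79
```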

 b. Categorising the products into price bins:


Given the large number of products available and the large variation in both revenue and quantity sold per product, we decided to bin the products and explored several binning methods.
Bin by product category
There were three main categories for all the products, namely:

  • Combo Packs: Products that consisted of bundles of the other products
  • Quantum Cards: Educational card sets
  • Wooden Toys: Toys and objects such as play tables and sorting blocks that were made of wood
AY1718 T2 Group21 AveragePriceDispersionbyCategoryBins.png
Fig. 4: The average price dispersion by category bins


From Figure 4, we observed the distribution of average prices for each bin and noted that “Combo Packs” had two price outliers, belonging to two particularly pricey products. The range of prices within each product category also varied widely. Despite this large variation in average prices within each category, we felt that categorising the products by product type was still necessary for future analysis, since the categorisation identifies each product by function.
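The outliers a box plot flags can be reproduced, approximately, with the common 1.5 × IQR whisker rule. The sketch below assumes that rule and uses Python's inclusive quartile method, which may differ slightly from Tableau's own quartile calculation; the prices are made-up values for illustration, not the real Combo Pack averages.

```python
import statistics

def iqr_outliers(prices, k=1.5):
    """Return the prices lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in prices if p < low or p > high]

# Hypothetical average prices for one category (illustrative only).
combo_prices = [60, 65, 70, 72, 75, 78, 80, 220, 250]
outliers = iqr_outliers(combo_prices)
```

With these made-up values the quartiles are 70 and 80, so anything above 95 is flagged; raising `k` makes the rule more tolerant of pricey items.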


AY1718 T2 Group21 SaleQuantityandRevenuebyProductCategory.png
Fig. 5: Percentage sale quantity and revenue for each product category


From Figure 5, we observed that Quantum Cards provided the largest total sale quantity and revenue. However, to generate the same amount of revenue, Quantum Cards need about 6 times the sale quantity of Combo Packs.
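Because both percentages in Figure 5 are taken against the same grand totals, this comparison can be made directly from the shares: dividing a category's quantity share by its revenue share gives units sold per unit of revenue, up to a common constant that cancels when two categories are compared. A minimal sketch, with made-up shares rather than the actual Figure 5 values:

```python
def units_per_revenue_ratio(qty_share_a, rev_share_a, qty_share_b, rev_share_b):
    """How many times more units category A sells than B per unit of revenue.

    Shares are fractions (or percentages) of the overall totals; the common
    totals cancel, so this equals (Qa / Ra) / (Qb / Rb) on absolute figures.
    """
    return (qty_share_a / rev_share_a) / (qty_share_b / rev_share_b)

# Hypothetical shares in percent, for illustration only.
ratio = units_per_revenue_ratio(60, 50, 10, 25)  # about 3.0
```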

Bin by average prices
We created two sets of bins with different numbers of categories (three and five) to assess which would better represent the subsequent data. By finding the price at each percentile cut-off, we assigned every product to its respective bin.
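Percentile binning of this kind can be sketched as follows: compute the `n_bins - 1` cut-off prices, then place each product by where its average price falls among them. The product ids and prices are hypothetical, and the inclusive percentile method is an assumption; the team's R or Tableau cut-offs may have been computed slightly differently.

```python
import bisect
import statistics

def price_bins(avg_prices, n_bins):
    """Assign each product to a percentile-based price bin (0 = cheapest).

    avg_prices: product id -> average price. A price exactly on a cut-off
    goes into the higher bin (bisect_right).
    """
    cuts = statistics.quantiles(avg_prices.values(), n=n_bins, method="inclusive")
    return {pid: bisect.bisect_right(cuts, price) for pid, price in avg_prices.items()}

# Ten hypothetical products priced 10, 20, ..., 100.
prices = {f"P{i:02d}": 10.0 * i for i in range(1, 11)}
five_bins = price_bins(prices, 5)
three_bins = price_bins(prices, 3)
```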

AY1718 T2 Group21 PriceBinAvgPrice.png
Fig. 6: Average Price Dispersion by Price Bins


AY1718 T2 Group21 PriceBinRevenue&Quantity.png
Fig. 7: Percent Revenue and Quantity Sold by Price Bin


After exploring the two binning methods, we decided to base future analysis on the 5-category binning, as it gives more insight into purchase patterns and the relative importance of products. Small e-commerce businesses like this one tend to earn most of their revenue from a small minority of items, and this holds here: a significant share of around 43% of revenue comes from only 20% of the product line, which accounts for only 12% of purchases. This categorisation will therefore let us better observe the purchase patterns of the more fringe product lines, and give us a deeper understanding than the 3-category binning.
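Per-bin percentages like those in Figure 7 can be tallied with a sketch such as the following. It assumes each product already has a price-bin index and uses quantity sold as the measure of purchases; the product ids and figures are hypothetical. Reading off one bin's revenue share against its quantity share gives the kind of comparison quoted above.

```python
from collections import defaultdict

def bin_shares(products, bins):
    """Percentage of total revenue and of total quantity falling in each bin.

    products: product id -> (quantity_sold, revenue)
    bins:     product id -> bin index
    Returns {bin: (revenue_pct, quantity_pct)}.
    """
    rev = defaultdict(float)
    qty = defaultdict(float)
    for pid, (quantity, revenue) in products.items():
        rev[bins[pid]] += revenue
        qty[bins[pid]] += quantity
    total_rev = sum(rev.values())
    total_qty = sum(qty.values())
    return {b: (100 * rev[b] / total_rev, 100 * qty[b] / total_qty) for b in sorted(rev)}

# Hypothetical example: one pricey product in bin 1, two cheaper ones in bin 0.
shares = bin_shares({"A": (1, 50.0), "B": (3, 30.0), "C": (6, 20.0)},
                    {"A": 1, "B": 0, "C": 0})
```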

Session Medium

AY1718 T2 Group21 NumberofItemsSoldandTotalRevenuebySSessionMedium.png


AY1718 T2 Group21 TotalRevenue&AvgRevenuebyMedium.png
AY1718 T2 Group21 PurchasePriceBinbyMedium.png

-Continue with seasonality-