Difference between revisions of "ANLY482 AY2016-17 T2 Group3: PROJECT OVERVIEW/ Methodology"

From Analytics Practicum
 
(5 intermediate revisions by 2 users not shown)
Line 21: Line 21:
 
| style="border-bottom:4px solid #000000; border-top:5px solid #000000;" width="1%" |  
 
| style="border-bottom:4px solid #000000; border-top:5px solid #000000;" width="1%" |  
 
| style="padding:0.3em; font-size:150%; background-color:#FFFFFF;  border-bottom:4px solid #000000; border-top:5px solid #000000; text-align:center;" width="12%" |  [[ANLY482_AY2016-17_T2_Group3: DOCUMENTATION | <font color="#000000" size=2 face="Open Sans"><b>DOCUMENTATION</b></font>]]
 
| style="padding:0.3em; font-size:150%; background-color:#FFFFFF;  border-bottom:4px solid #000000; border-top:5px solid #000000; text-align:center;" width="12%" |  [[ANLY482_AY2016-17_T2_Group3: DOCUMENTATION | <font color="#000000" size=2 face="Open Sans"><b>DOCUMENTATION</b></font>]]
 +
 +
| style="border-bottom:4px solid #000000; border-top:5px solid #000000;" width="1%" | &nbsp;
 +
| style="padding:0.3em; font-size:150%; background-color:#FFFFFF;  border-bottom:4px solid #000000; border-top:5px solid #000000; text-align:center;" width="12%" |  [[ANLY482 AY2016-17 Term 2 | <font color="#000000" size=2 face="Open Sans"><b>ALL PROJECTS</b></font>]]
 
|}  
 
|}  
 
<!-------------------Header------------------------>
 
<!-------------------Header------------------------>
Line 36: Line 39:
  
 
<!--------------------Content---------------------->
 
<!--------------------Content---------------------->
<div style="background: #EAEAEA; line-height: 0.3em; border-left: #000000 solid 8px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;"><font face ="Open Sans" color= "black" size="2"><b>REVISED METHODOLOGY</b></font></div></div>
+
<div style="background: #EAEAEA; line-height: 0.3em; border-left: #000000 solid 8px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;"><font face ="Open Sans" color= "black" size="2"><b>DATA COLLECTION</b></font></div></div>
 +
<div style="height: 1em"></div>
 +
<div><font face="Open Sans">
 +
We will use the data provided by Vanitee, which we access through their cloud-hosted MongoDB database. In particular, we will target the data tables that pertain to customers, beauty professionals, bookings and loyalty programmes.
 +
</font></div>
 +
 
 +
<div style="height: 2em"></div>
 +
 
 +
<div style="background: #EAEAEA; line-height: 0.3em; border-left: #000000 solid 8px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;"><font face ="Open Sans" color= "black" size="2"><b>DATA PREPARATION</b></font></div></div>
 +
<div style="height: 1em"></div>
 +
<div><font face="Open Sans">
 +
As mentioned above, data rows within each data table may differ slightly in the number of columns (attributes) they contain. As such, we will attempt to consolidate the data into suitable and consistent formats to be used for analysis.
 +
 
 +
Additionally, data tables that have relationships with other data tables can be combined into one dataset. Hence, we will attempt to prepare different datasets according to the project objectives.
 +
</font></div>
 +
 
 +
<div style="height: 2em"></div>
 +
 
 +
<div style="background: #EAEAEA; line-height: 0.3em; border-left: #000000 solid 8px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;"><font face ="Open Sans" color= "black" size="2"><b>EXPLORATORY DATA ANALYSIS</b></font></div></div>
 +
<div style="height: 1em"></div>
 +
<div><font face="Open Sans">
 +
We will examine the bookings customers make and their use of credits and campaign codes when booking. From this, we will be able to understand customers’ buying behaviour and analyze the trends in their bookings. We will also identify any trends in their usage of gems. For beauty professionals, we will examine the frequency of their bookings, the services they list on the platform and their chat responsiveness.
 +
</font></div>
 +
 
 +
<div style="height: 2em"></div>
 +
 
 +
<div style="background: #EAEAEA; line-height: 0.3em; border-left: #000000 solid 8px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;"><font face ="Open Sans" color= "black" size="2"><b>DATA CLEANING</b></font></div></div>
 
<div style="height: 1em"></div>
 
<div style="height: 1em"></div>
 
<div><font face="Open Sans">
 
<div><font face="Open Sans">
Add text here!
+
Missing values and outliers observed during data exploration may introduce inaccuracy and skew into our analysis. To handle missing values, we will look at the number of missing values identified and determine whether each value should be estimated or the entire data row removed. For outliers, we will analyze why they exist and decide whether they are relevant enough to be included in our analysis.
 
</font></div>
 
</font></div>
  
 
<div style="height: 2em"></div>
 
<div style="height: 2em"></div>
  
<div style="background: #EAEAEA; line-height: 0.3em; border-left: #000000 solid 8px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;"><font face ="Open Sans" color= "black" size="2"><b>--- ANALYSIS</b></font></div></div>
+
<div style="background: #EAEAEA; line-height: 0.3em; border-left: #000000 solid 8px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;"><font face ="Open Sans" color= "black" size="2"><b>DATA NORMALISATION & TRANSFORMATION</b></font></div></div>
 
<div style="height: 1em"></div>
 
<div style="height: 1em"></div>
 
<div><font face="Open Sans">
 
<div><font face="Open Sans">
Add text here!
+
As the distribution of values differs among attributes, we will normalize the attributes before commencing our analysis so that attributes with larger value ranges do not dominate the others. Data transformation techniques such as discretization and binarization will also be performed to convert the necessary data to categorical and binary form respectively.
 
</font></div>
 
</font></div>
  
 
<div style="height: 2em"></div>
 
<div style="height: 2em"></div>
  
<div style="background: #EAEAEA; line-height: 0.3em; border-left: #000000 solid 8px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;"><font face ="Open Sans" color= "black" size="2"><b>--- ANALYSIS</b></font></div></div>
+
<div style="background: #EAEAEA; line-height: 0.3em; border-left: #000000 solid 8px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;"><font face ="Open Sans" color= "black" size="2"><b>CLUSTER ANALYSIS</b></font></div></div>
 
<div style="height: 1em"></div>
 
<div style="height: 1em"></div>
 
<div><font face="Open Sans">
 
<div><font face="Open Sans">
Add text here!
+
Next, cluster analysis will be carried out to determine whether clusters exist among Vanitee’s customers and beauty professionals. We will attempt to identify the profile of each cluster according to its members’ booking history and examine the factors affecting the performance of each cluster. Thereafter, we hope to translate the identified clusters into a form of customer segmentation to help Vanitee better understand its customer base.
 
</font></div>
 
</font></div>
  

Latest revision as of 03:55, 22 April 2017




DATA COLLECTION

We will use the data provided by Vanitee, which we access through their cloud-hosted MongoDB database. In particular, we will target the data tables that pertain to customers, beauty professionals, bookings and loyalty programmes.

DATA PREPARATION

As mentioned above, data rows within each data table may differ slightly in the number of columns (attributes) they contain. As such, we will attempt to consolidate the data into suitable and consistent formats to be used for analysis.

Additionally, data tables that have relationships with other data tables can be combined into one dataset. Hence, we will attempt to prepare different datasets according to the project objectives.
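
The two preparation steps described above can be illustrated with a minimal Python sketch. It is only an assumption of how the consolidation and combination might look: the field names (customer_id, gems, campaign_code and so on) and the helper functions consolidate and join are hypothetical and do not reflect Vanitee's actual schema.

```python
# Hypothetical records; field names are illustrative, not Vanitee's real schema.
customers = [
    {"customer_id": 1, "name": "A", "gems": 120},
    {"customer_id": 2, "name": "B"},  # this row has no "gems" attribute
]
bookings = [
    {"booking_id": 10, "customer_id": 1, "amount": 45.0, "campaign_code": "NEW10"},
    {"booking_id": 11, "customer_id": 2, "amount": 30.0},  # no campaign code
]


def consolidate(rows):
    """Give every row the same set of attributes, filling gaps with None."""
    columns = sorted({key for row in rows for key in row})
    return [{col: row.get(col) for col in columns} for row in rows]


def join(left, right, key):
    """Inner-join two lists of rows on a shared key.

    Assumes each key value appears at most once in `right`.
    """
    index = {row[key]: row for row in right}
    return [{**index[row[key]], **row} for row in left if row[key] in index]


customers_flat = consolidate(customers)
bookings_flat = consolidate(bookings)

# One combined dataset: each booking enriched with its customer's attributes.
dataset = join(bookings_flat, customers_flat, "customer_id")
for row in dataset:
    print(row)
```

After consolidation every row in a table carries the same attributes, so the combined dataset can be analyzed without checking each row for missing keys.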

EXPLORATORY DATA ANALYSIS

We will examine the bookings customers make and their use of credits and campaign codes when booking. From this, we will be able to understand customers’ buying behaviour and analyze the trends in their bookings. We will also identify any trends in their usage of gems. For beauty professionals, we will examine the frequency of their bookings, the services they list on the platform and their chat responsiveness.
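
As a rough illustration of this kind of exploration, the sketch below counts bookings per customer and the share of bookings made with a campaign code or with credits. The records and field names are made up for the example and are assumptions, not Vanitee data.

```python
from collections import Counter

# Hypothetical booking records; field names are assumptions for illustration.
bookings = [
    {"customer_id": 1, "campaign_code": "NEW10", "credits_used": 0},
    {"customer_id": 1, "campaign_code": None, "credits_used": 5},
    {"customer_id": 2, "campaign_code": None, "credits_used": 0},
    {"customer_id": 1, "campaign_code": "NEW10", "credits_used": 0},
]

# Booking frequency per customer.
bookings_per_customer = Counter(b["customer_id"] for b in bookings)

# How often each campaign code appears, ignoring bookings without a code.
code_usage = Counter(b["campaign_code"] for b in bookings if b["campaign_code"])

# Share of bookings that used a campaign code or credits.
share_with_code = sum(code_usage.values()) / len(bookings)
share_with_credits = sum(1 for b in bookings if b["credits_used"] > 0) / len(bookings)

print(bookings_per_customer.most_common())
print(code_usage.most_common())
print(share_with_code, share_with_credits)
```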

DATA CLEANING

Missing values and outliers observed during data exploration may introduce inaccuracy and skew into our analysis. To handle missing values, we will look at the number of missing values identified and determine whether each value should be estimated or the entire data row removed. For outliers, we will analyze why they exist and decide whether they are relevant enough to be included in our analysis.
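
One possible way to apply these rules is sketched below. The methodology does not name specific techniques, so the choices here are assumptions: missing values are estimated with the median when their share is below a hypothetical threshold (MAX_MISSING_SHARE), and outliers are flagged for review with the common 1.5 x IQR rule rather than removed automatically.

```python
import statistics

# Hypothetical booking amounts; None marks a missing value.
amounts = [30.0, 45.0, None, 50.0, 38.0, 42.0, 400.0, None, 35.0]

# Assumed threshold: estimate values only if few enough are missing.
MAX_MISSING_SHARE = 0.3

present = [a for a in amounts if a is not None]
missing_share = 1 - len(present) / len(amounts)

if missing_share <= MAX_MISSING_SHARE:
    # Estimate each missing value with the median of the observed values.
    fill = statistics.median(present)
    cleaned = [fill if a is None else a for a in amounts]
else:
    # Too many gaps to estimate reliably: drop the incomplete rows instead.
    cleaned = present

# Flag outliers with the 1.5 x IQR rule so they can be inspected, not deleted.
q1, _, q3 = statistics.quantiles(cleaned, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in cleaned if a < low or a > high]

print(cleaned)
print(outliers)
```

Flagging rather than deleting keeps the decision about each outlier's relevance with the analyst, as the paragraph above intends.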

DATA NORMALISATION & TRANSFORMATION

As the distribution of values differs among attributes, we will normalize the attributes before commencing our analysis so that attributes with larger value ranges do not dominate the others. Data transformation techniques such as discretization and binarization will also be performed to convert the necessary data to categorical and binary form respectively.
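
The three operations can be sketched as follows. Min-max scaling is used here as one common normalization choice; the methodology does not specify which method will be applied, and the attribute names, bin edges and labels are hypothetical.

```python
def min_max(values):
    """Rescale values to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value, edges, labels):
    """Return the label of the first bin whose upper edge the value does not exceed.

    Values above the last edge fall into the final label.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


def binarize(value):
    """Convert presence or absence of a value into 1 or 0."""
    return 1 if value else 0


# Hypothetical per-customer attributes on very different scales.
spend = [20.0, 50.0, 80.0, 320.0]
bookings_count = [1, 2, 4, 9]
campaign_codes = ["NEW10", None, None, "VIP"]

spend_scaled = min_max(spend)
count_scaled = min_max(bookings_count)
frequency_band = [
    discretize(c, edges=[1, 4], labels=["low", "medium", "high"])
    for c in bookings_count
]
used_code = [binarize(c) for c in campaign_codes]

print(spend_scaled, count_scaled)
print(frequency_band, used_code)
```

After scaling, spend and booking count both lie between 0 and 1, so neither outweighs the other in distance-based methods.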

CLUSTER ANALYSIS

Next, cluster analysis will be carried out to determine whether clusters exist among Vanitee’s customers and beauty professionals. We will attempt to identify the profile of each cluster according to its members’ booking history and examine the factors affecting the performance of each cluster. Thereafter, we hope to translate the identified clusters into a form of customer segmentation to help Vanitee better understand its customer base.
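
The methodology does not state which clustering algorithm will be used. As one possibility, the sketch below runs a plain k-means on hypothetical, already normalized customer features (booking frequency and average spend). The data, the choice of two clusters and the starting centroids are all assumptions made for illustration.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical customers: (normalized booking frequency, normalized average spend).
customers = [
    (0.10, 0.20), (0.15, 0.10), (0.20, 0.25),
    (0.80, 0.90), (0.90, 0.85), (0.85, 0.95),
]

assignments, centroids = k_means(customers, [customers[0], customers[3]])
print(assignments)
print(centroids)
```

Each resulting centroid summarizes a cluster's typical booking profile, which is the starting point for describing customer segments.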