ANLY482 AY2016-17 T2 Group3: PROJECT OVERVIEW/ Methodology

From Analytics Practicum
Revision as of 03:55, 22 April 2017 by Andrew.lim.2013



DATA COLLECTION

We will use the data provided to us by Vanitee, which we access through their cloud-hosted MongoDB database. In particular, we will target the data tables that pertain to customers, beauty professionals, bookings and loyalty programmes.

DATA PREPARATION

Because MongoDB does not enforce a fixed schema, data rows within the same data table may differ slightly in the number of columns (attributes) they contain. We will therefore consolidate the data into consistent formats suitable for analysis.
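
A minimal sketch of this consolidation step, assuming the records have already been exported from MongoDB as Python dictionaries. The field names (customer_id, price, campaign_code, credits_used) are hypothetical and do not reflect Vanitee's actual schema. Every row is given the union of all observed columns, with missing attributes filled by a placeholder.

```python
# Hypothetical booking records: in a schema-less store, not every
# record carries the same fields.
raw_bookings = [
    {"_id": "b1", "customer_id": "c1", "price": 50.0, "campaign_code": "NEW10"},
    {"_id": "b2", "customer_id": "c2", "price": 80.0},
    {"_id": "b3", "customer_id": "c1", "price": 35.0, "credits_used": 5},
]


def consolidate(documents, fill_value=None):
    """Return (columns, rows) where every row has the same sorted columns.

    Fields missing from a document are filled with `fill_value`.
    """
    columns = sorted({key for doc in documents for key in doc})
    rows = [{col: doc.get(col, fill_value) for col in columns} for doc in documents]
    return columns, rows


columns, rows = consolidate(raw_bookings)
```

Filling with None keeps the decision about how to treat each gap for the data cleaning stage rather than making it silently here.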

Additionally, data tables that are related to other data tables can be joined into a single dataset. We will therefore prepare separate datasets tailored to each of the project objectives.
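
As an illustration, a bookings table can be enriched with customer attributes through a shared key. This is a hypothetical sketch: the tables, field names and the choice of a left join (keeping bookings that have no matching customer) are assumptions, not Vanitee's actual data model.

```python
customers = [
    {"customer_id": "c1", "gender": "F", "signup_year": 2015},
    {"customer_id": "c2", "gender": "M", "signup_year": 2016},
]
bookings = [
    {"booking_id": "b1", "customer_id": "c1", "price": 50.0},
    {"booking_id": "b2", "customer_id": "c2", "price": 80.0},
    {"booking_id": "b3", "customer_id": "c1", "price": 35.0},
    {"booking_id": "b4", "customer_id": "c9", "price": 20.0},  # no matching customer
]


def left_join(left, right, key):
    """Attach the matching right-hand record to each left-hand record.

    Left rows without a match are kept (like a SQL LEFT JOIN); they simply
    lack the right-hand fields. Assumes `key` is unique within `right`.
    """
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        merged = dict(row)
        merged.update(index.get(row[key], {}))
        joined.append(merged)
    return joined


booking_dataset = left_join(bookings, customers, "customer_id")
```

Keeping unmatched bookings rather than discarding them makes orphaned records visible, which is useful input for the data cleaning stage.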

EXPLORATORY DATA ANALYSIS

We will examine the bookings customers make, including their use of credits and campaign codes when booking. This will help us understand customers' buying behaviour and analyze trends in their bookings. We will also identify any trends in their usage of gems. For beauty professionals, we will observe the frequency of their bookings, the services they list on the platform and their responsiveness in chat.
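
A few of these summaries can be computed with simple counts. The sketch below uses hypothetical fields (month, credits_used, campaign_code) and made-up records purely to show the shape of the calculation; it does not reflect real Vanitee data.

```python
from collections import Counter

# Hypothetical, made-up booking records for illustration only.
bookings = [
    {"customer_id": "c1", "month": "2016-11", "credits_used": 0, "campaign_code": "NEW10"},
    {"customer_id": "c1", "month": "2016-12", "credits_used": 5, "campaign_code": None},
    {"customer_id": "c2", "month": "2016-12", "credits_used": 0, "campaign_code": None},
    {"customer_id": "c3", "month": "2017-01", "credits_used": 10, "campaign_code": "XMAS"},
]

# Booking volume over time and per customer.
bookings_per_month = Counter(b["month"] for b in bookings)
bookings_per_customer = Counter(b["customer_id"] for b in bookings)

# Share of bookings that used credits or a campaign code.
credit_share = sum(b["credits_used"] > 0 for b in bookings) / len(bookings)
campaign_share = sum(b["campaign_code"] is not None for b in bookings) / len(bookings)
```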

DATA CLEANING

Missing values and outliers observed during data exploration may introduce inaccuracy and skew into our analysis. To handle missing values, we will assess how many values are missing and decide whether they should be estimated or whether the entire data row should be removed. For outliers, we will investigate why they exist and decide whether they are relevant enough to be included in our analysis.
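
One possible way to implement these two decisions is sketched below, under stated assumptions: the 5% threshold for dropping rather than imputing, median imputation, and the 1.5 × IQR rule for flagging outliers are common conventions chosen here for illustration, not rules fixed by this project. Outliers are only flagged, leaving the decision on relevance to the analyst.

```python
import statistics


def handle_missing(rows, column, max_missing_ratio=0.05):
    """Drop or impute missing values in one numeric column.

    If the share of missing values is at most `max_missing_ratio` (an assumed
    threshold), the affected rows are dropped; otherwise the gaps are filled
    with the median of the observed values so the rows are retained.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    missing_ratio = 1 - len(observed) / len(rows)
    if missing_ratio <= max_missing_ratio:
        return [r for r in rows if r[column] is not None]
    median = statistics.median(observed)
    return [{**r, column: median} if r[column] is None else r for r in rows]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] for manual review."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Usage with made-up values: 1 of 3 prices missing (> 5%), so impute.
rows = [{"price": 10.0}, {"price": None}, {"price": 30.0}]
cleaned = handle_missing(rows, "price")
flagged = iqr_outliers([30, 35, 40, 45, 50, 55, 60, 500])
```

statistics.quantiles requires Python 3.8 or later.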

DATA NORMALISATION & TRANSFORMATION

As the distribution of values differs across attributes, we will normalize these attributes before commencing our analysis so that attributes with larger ranges do not dominate the others. In addition, data transformation techniques such as discretization and binarization will be applied to convert the relevant data into categorical and binary form respectively.
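
The three transformations can be sketched as follows. Min-max scaling is assumed as the normalization method (z-score standardization would be an equally valid choice), and the bin edges, labels and threshold are hypothetical values picked for illustration.

```python
from bisect import bisect_right


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value, edges, labels):
    """Map a number to a category using sorted bin edges.

    len(labels) must be len(edges) + 1; a value equal to an edge
    falls into the higher bin.
    """
    return labels[bisect_right(edges, value)]


def binarize(value, threshold):
    """Return 1 if value exceeds threshold, else 0."""
    return 1 if value > threshold else 0


# Made-up attribute values for illustration.
total_spend = [20.0, 50.0, 120.0, 220.0]
booking_counts = [1, 4, 12, 2]

spend_scaled = min_max(total_spend)  # [0.0, 0.15, 0.5, 1.0]
frequency_band = [discretize(n, [3, 10], ["low", "medium", "high"]) for n in booking_counts]
repeat_customer = [binarize(n, 1) for n in booking_counts]  # more than one booking
```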

CLUSTER ANALYSIS

Next, we will carry out cluster analysis to determine whether clusters exist amongst Vanitee’s customers and beauty professionals. We will attempt to profile each cluster according to its booking history and examine the factors affecting each cluster's performance. Thereafter, we hope to translate the identified clusters into a customer segmentation that helps Vanitee better understand its customer base.
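
The page does not name a clustering algorithm; k-means is used below only as a common example. This is a minimal sketch of Lloyd's algorithm on hypothetical, already-normalized features (for example booking frequency and average spend per customer). For reproducibility it initializes centroids with the first k points; a real analysis would use several random starts or k-means++ and would choose k with a criterion such as the elbow method or silhouette score.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on tuples of floats.

    Returns (labels, centroids). Centroids start at the first k points.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical normalized features: (booking frequency, average spend).
customers = [(0.1, 0.1), (0.9, 0.9), (0.15, 0.2), (0.2, 0.1), (0.85, 0.95), (0.95, 0.8)]
labels, centroids = kmeans(customers, k=2)
```

Each resulting centroid can then be read as a cluster profile, here a low-frequency, low-spend group and a high-frequency, high-spend group. math.dist requires Python 3.8 or later.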