Difference between revisions of "ANLY482 AY2017-18T2 Group 25 : Project Overview / Methodology"
Revision as of 15:58, 10 April 2018
Data
Data Used (only dataset titles are listed, to maintain confidentiality):
Reservation Information
User Information
EDM Campaigns
User Email Activity
Tools Used
Methodology
Discovery
Data Preparation and Cleaning
Four raw datasets (>10 GB), consisting of customer reservations and Electronic Direct Mailer (EDM) interaction data for 2017, were used for analysis.
Data cleaning was done in Jupyter Notebook because of the size of the raw data files received, while Exploratory Data Analysis (EDA) was done using both Jupyter Notebook and JMP.
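As an illustration of why a notebook suits files of this size, the sketch below cleans a CSV one row at a time, so the full file never has to fit in memory. It is not the team's actual code: the column names `user_id` and `reserved_at`, the cleaning rules, and the in-memory sample are all assumptions made for the example, and only the Python standard library is used.

```python
import csv
import io

def clean_rows(lines, required=("user_id", "reserved_at")):
    """Yield cleaned rows one at a time from an iterable of CSV lines."""
    for row in csv.DictReader(lines):
        # Skip malformed rows that have more fields than the header.
        if None in row:
            continue
        # Trim stray whitespace; treat missing values as empty strings.
        row = {k.strip(): (v or "").strip() for k, v in row.items()}
        # Drop rows where any required field is empty.
        if any(not row.get(col) for col in required):
            continue
        yield row

# Hypothetical sample standing in for one of the raw files.
sample = io.StringIO(
    "user_id,reserved_at\n"
    " u1 ,2017-03-01\n"
    "u2,\n"
    "u3,2017-04-15\n"
)
cleaned = list(clean_rows(sample))
```

With a real file, `open(path, newline="")` would take the place of the `io.StringIO` sample, and the rows could be written back out as they are yielded rather than collected into a list.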
Exploratory Data Analysis (EDA)
Initial analysis was carried out separately on the customer reservations data and the EDM data to understand each on its own, before joining the two datasets to track the conversion rate from EDM to reservations.
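The join described above can be sketched as follows. The page does not define conversion, so this example assumes a recipient counts as converted if they made any reservation on or after the date the campaign was sent; that rule, along with the field names `campaign_id`, `user_id`, `sent_at` and `reserved_at`, is hypothetical.

```python
from collections import defaultdict
from datetime import date

def conversion_rates(email_events, reservations):
    """Return {campaign_id: share of recipients who reserved on or after the send date}."""
    # Index reservation dates by user so each email event can be joined to them.
    reserved_on = defaultdict(list)
    for r in reservations:
        reserved_on[r["user_id"]].append(r["reserved_at"])

    sent = defaultdict(set)       # campaign -> users who received it
    converted = defaultdict(set)  # campaign -> recipients who later reserved
    for e in email_events:
        sent[e["campaign_id"]].add(e["user_id"])
        if any(d >= e["sent_at"] for d in reserved_on.get(e["user_id"], [])):
            converted[e["campaign_id"]].add(e["user_id"])
    return {c: len(converted[c]) / len(users) for c, users in sent.items()}

# Hypothetical sample data.
emails = [
    {"campaign_id": "c1", "user_id": "u1", "sent_at": date(2017, 3, 1)},
    {"campaign_id": "c1", "user_id": "u2", "sent_at": date(2017, 3, 1)},
    {"campaign_id": "c2", "user_id": "u1", "sent_at": date(2017, 6, 1)},
]
bookings = [{"user_id": "u1", "reserved_at": date(2017, 3, 5)}]
rates = conversion_rates(emails, bookings)
```

A stricter definition, such as requiring the reservation to fall within a fixed number of days after the send date, would only change the condition inside `any(...)`.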
Text Mining and Logistic Regression Analysis
Text cleaning and analysis were done using JMP's built-in Text Explorer.
A stepwise logistic regression model was used to explain the relationship between the words used and the conversion rate of each campaign.
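One way to picture the word-level predictors for such a model is a binary document-term matrix, with one row per campaign and one column per word. The sketch below builds one in plain Python; it is an assumption about the shape of the inputs, not a reproduction of the JMP workflow, and the sample campaign texts and the `min_count` threshold are hypothetical. The stepwise model fitting itself is not shown.

```python
import re

def document_term_matrix(texts, min_count=2):
    """Return (vocab, rows), where rows[i][j] is 1 if text i contains vocab[j]."""
    # Lowercase each text and keep its set of distinct words.
    tokenized = [set(re.findall(r"[a-z']+", t.lower())) for t in texts]
    # Count how many texts each word appears in.
    counts = {}
    for words in tokenized:
        for w in words:
            counts[w] = counts.get(w, 0) + 1
    # Keep only words that appear in at least min_count texts.
    vocab = sorted(w for w, n in counts.items() if n >= min_count)
    rows = [[1 if w in words else 0 for w in vocab] for words in tokenized]
    return vocab, rows

# Hypothetical campaign texts.
campaign_texts = [
    "Free dessert this weekend",
    "Weekend brunch deals",
    "Free delivery on brunch",
]
vocab, rows = document_term_matrix(campaign_texts)
```

Each column of `rows` could then serve as a candidate predictor, with a stepwise procedure deciding which words enter the model.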