Difference between revisions of "AY1516 T2 Group 18 Project Overview"

From Analytics Practicum
 
(7 intermediate revisions by the same user not shown)
Line 14: Line 14:
 
[[AY1516 T2 Group 18 Data |
 
[[AY1516 T2 Group 18 Data |
 
<font color="#000000" size=2><b>DATA</b></font>]]
 
<font color="#000000" size=2><b>DATA</b></font>]]
 +
 +
| style="background:none;" width="1%" | &nbsp;
 +
| style="padding:0.3em; font-size:100%; background-color:#F5F5F5; text-align:center; color:#F5F5F5" width="10%" |
 +
[[AY1516 T2 Group 18 Project Findings |
 +
<font color="#000000" size=2><b>PROJECT FINDINGS</b></font>]]
  
 
| style="background:none;" width="1%" | &nbsp;
 
| style="background:none;" width="1%" | &nbsp;
 
| style="padding:0.3em; font-size:100%; background-color:#F5F5F5; text-align:center; color:#F5F5F5" width="10%" |  
 
| style="padding:0.3em; font-size:100%; background-color:#F5F5F5; text-align:center; color:#F5F5F5" width="10%" |  
 
[[AY1516 T2 Group 18 Project Management |
 
[[AY1516 T2 Group 18 Project Management |
<font color="#000000" size=2><b>PROJECT MANAGEMENT</b></font>]]
+
<font color="#000000" size=2><b>SCHEDULE</b></font>]]
  
 
| style="background:none;" width="1%" | &nbsp;
 
| style="background:none;" width="1%" | &nbsp;
Line 36: Line 41:
  
 
<div align="left">
 
<div align="left">
==<div style="background:#ff4fa7; padding: 10px; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #D3D3D3 solid 25px;"><font color="white">Introduction and Project Background</font></div>==
+
==<div style="background:#ff4fa7; padding: 10px; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #D3D3D3 solid 25px;"><font color="white">Introduction</font></div>==
  
Understanding your target audience remains at the heart of successful marketing. Connected Life is TNS's global syndicated study, designed to better understand connected consumers. It is the largest and most comprehensive study of the digital behavior of consumers worldwide. <br><br>
+
Syndicated market research studies that target clients from multiple industries are a good source of revenue for market research companies (insert source). However, such studies typically rely on long questionnaires, since they include questions catering to multiple industries, and a respondent typically takes about 30 minutes to complete one. This has two consequences. First, responses tend to be suboptimal, because long questionnaires strain and tire respondents, which lowers both response rates and response quality. Second, the large number of survey questions (and hence of resulting variables) calls for a larger monetary incentive to get respondents to complete the entire survey. With a shorter survey, that added incentive could instead be spent on recruiting more respondents to improve the results. Hence, market research companies need ways to shorten their surveys so as to uphold the accuracy of their results.<br><br>
The need for the study includes the following: <br>
 
a) There was a gap in the market as no one was offering such comprehensive information about digital consumers <br>
 
b) It was cost prohibitive for one client to undertake such a global venture and hence, clients were only doing these studies selectively and where budgets allowed <br>
 
c) Other studies which also claim to have such a global footprint were either by publishers themselves or by media agencies, thus clients are apprehensive that the analysis offered by them is biased and hence an independent study like Connected Life has great appeal.
 
  
<div align="left">
+
In this report, we aim to build an effective explanatory model that helps reduce the number of variables needed for a market research study. By identifying pertinent variables and omitting variables that do not add value to the study results, we can reduce the number of survey questions in a study and the strain on survey respondents, provided that the consumer behavior and demographics captured for the industry remain the same in future studies.<br><br>
==<div style="background:#ff4fa7; padding: 10px; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #D3D3D3 solid 25px;"><font color="white">Project Motivation</font></div>==
 
  
With the advent of the internet and digital devices over the past decade, it has become increasingly complex to understand and influence the choices of consumers. The media landscape has been shifting, and traditional marketing approaches no longer work as well today. Many companies now rely on digital marketing to reach consumers, with digital media growth estimated at 4.5 trillion online advertisements served annually and digital media spend at 48% in 2010. This creates the need for new marketing approaches that connect today’s consumers with companies. TNS hopes to generate actionable insights from the Connected Life dataset that will help marketers come up with successful marketing strategies to aid business decisions.<br><br>
+
As our dataset consists of questions catering to numerous industries, we focus our efforts on the Personal Care industry. Personal Care products include facial care products, cosmetics, perfume or cologne, skin care products, and hair care products. Our objective is to identify the significant factors that allow us to quantify the relationship between consumers’ behavior and their purchase of Personal Care products. The candidate factors comprise social demographic and economic profile, devices, digital media platforms, and online behavior (time spent, frequency, and part of day for device and activity engagement of Personal Care consumers).<br><br>
  
As the study covers 50 countries and over 58 product categories, we will focus on the Singapore and Malaysia markets and the Fast-Moving Consumer Goods (FMCG) sector for the purpose of our analysis.
+
==<div style="background:#ff4fa7; padding: 10px; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #D3D3D3 solid 25px;"><font color="white">Study Context</font></div>==
  
<div align="left">
+
This report uses the dataset from Connected Life, a 2015 syndicated research study conducted by Taylor Nelson Sofres (TNS) Singapore, a market research company under the WPP group. The study aims to identify the target consumer profiles, devices, and digital media platforms that today’s connected consumers engage with, so that businesses from different industries can formulate more targeted marketing strategies and maximize the return on investment of their business decisions. The survey questionnaire is therefore crafted to cover multiple industries, including Personal Care, Airline, Mobile, etc. As a result, most questions are general and take an industry-level view. However, specific parts of the study results can be extracted for further analysis for interested companies. <br>
  
==<div style="background:#ff4fa7; padding: 10px; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #D3D3D3 solid 25px;"><font color="white">Project Objectives</font></div>==
+
==<div style="background:#ff4fa7; padding: 10px; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #D3D3D3 solid 25px;"><font color="white">Project Methodology</font></div>==
  
The aim of this project is to help marketers in the FMCG industry identify target consumer profiles, digital media platforms, and devices to allow for more targeted marketing strategies, thus maximizing the return on investment (ROI) of their business decisions.<br><br>
+
See <b><u>[[AY1516 T2 Group 18 Data|here]]</u></b> for more information about our data<br><br>
  
As a market research company, TNS needs to present its findings and insights from the study to senior management and marketers from various companies. Our aim is therefore to provide an interactive and dynamic dashboard that intuitively displays crucial actionable insights to marketers, without requiring them to have technical knowledge or to spend time delving into the data themselves.<br><br>
+
<b>&nbsp;&nbsp;&nbsp;&nbsp;Modelling Process:</b><br>
 +
[[Image: Modelling process.png|180px|link=]]<br><br>
  
The final deliverables will therefore aim to:<br>
+
The figure above illustrates the explanatory modelling process used for our analysis. The full list of data preparation procedures is given in the following section. After data preparation, we proceed with exploratory data analysis (EDA) to understand the data better. During this process, we often find ourselves returning to the data preparation stage after observing the distributions of some of the variables. <br><br>
*Identify FMCG target consumer profiles, digital media platforms they should use, and the type of device(s) used to best engage and connect with target consumers
 
*Allow end users to visualize data findings and generate actionable insights through the means of an interactive and dynamic dashboard
 
  
<br>
+
Similarly, during the model fitting stage, we find ourselves iterating between model fitting and evaluation as we calibrate the model for optimal results. We assess model performance using several outputs: the Whole Model Test, the assessment of individual parameters, the Receiver Operating Characteristic (ROC) curve, fit statistics, the misclassification rate and the confusion matrix.<br><br>
 
 
In order to achieve these objectives, more granular questions must be answered through the course of our analysis:<br>
 
*Who are our target consumers?
 
*What are the digital media platforms and devices that allow marketers to reach their target consumers and connect with them?
 
*How do marketers improve their touchpoint planning?
 
*What are the digital media platforms and content that need to be prioritized in order to drive engagement and advocacy amongst the target consumers?
 
*After engagement is done, how do marketers influence the mindsets and decisions of the connected consumer?
 
*How are marketers going to improve their company’s performance in order to enhance the connected customers’ satisfaction?
 
  
 +
Furthermore, after fitting and evaluating the model, we discover ways to improve our analysis. This brings us back to the data preparation stage, where we reorganize the data, followed by another round of EDA, model fitting and evaluation. Finally, we assess the models created and recommend a list of actionable improvements to marketers and market research firms. <br>
 
<br>
 
<br>
  
 
<div align="left">
 
<div align="left">
 +
==<div style="background:#ff4fa7; padding: 10px; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #D3D3D3 solid 25px;"><font color="white">Analytical Tools</font></div>==
  
==<div style="background:#ff4fa7; padding: 10px; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #D3D3D3 solid 25px;"><font color="white">Project Scope</font></div>==
+
<table border="1">
 
+
<tr>
The project scope can be split into 4 main areas: the Overview, Digital Media Platforms, Devices and Consumers interactive visualizations. The Overview visualization is a compilation of key indicators to support operational decision-making by senior management. The other three are visualizations that let marketers work with the selections and graphics to gain a deeper understanding of, and insights into, the data.<br><br>
+
<td><center><b>JMP</b></center></td>
 
+
<td><center><b>SAS EM</b></center></td>
To visualize the data meaningfully, it must first be cleaned. Fields that are irrelevant or redundant, such as empty fields, are removed or replaced with human-readable detail for ease of visualization.<br><br>
+
</tr>
 
+
<tr>
The 4 visualizations cannot be built without first understanding what data is available for each of them. After consolidating the data, prototyping the visualizations is important, as it gives us an idea of what we are trying to achieve and whether our visualizations actually help achieve the intended goals. <br><br>
+
<td>Supports scripts written in the JMP Scripting Language to customize analyses and generate reports</td>
 
+
<td>Supports scripts written in the SAS language to customize analyses and generate reports</td>
Implementation of the actual visualization shall begin after the prototype has been decided.
+
</tr>
 
+
<tr>
 
+
<td>Holds data in RAM, so it cannot handle data sets as large as SAS can; with smaller data sets, however, it runs faster because processing happens in memory</td>
<div align="left">
+
<td>Can process data on secondary storage instead of RAM, and so can handle very large data sets, including more data than the RAM can hold</td>
 
+
</tr>
==<div style="background:#ff4fa7; padding: 10px; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #D3D3D3 solid 25px;"><font color="white">Project Methodology</font></div>==
+
<tr>
 
+
<td>Cheaper (about $5k for the first year license)</td>
See [[AY1516 T2 Group 18 Data|here]] for more information about our data<br><br>
+
<td>Expensive (over $100k for the first year license)</td>
 
+
</tr>
<b>The technologies to be used in the project are:</b><br>
+
<tr>
1. D3.js, a JavaScript library that provides numerous functions for manipulating data and drawing graphics.<br>
+
<td>Built-in reporting tool that provides general-use reporting capabilities</td>
2. JMP Pro, a statistical data discovery tool. We use this tool in data cleaning and preparation.<br><br>
+
<td>Powerful reporting through its Business Intelligence and Analytics software, which allows very detailed customization of reports</td>
 
+
</tr>
<b>Visualizations</b><br>
+
<tr>
1. Overview<br>
+
<td>JMP does not provide a workflow or history of analysis to keep track of progress.</td>
&nbsp;&nbsp;&nbsp;&nbsp;a) This visualization aims to provide senior management with a summary of key indicators for operational decision support. <br>
+
<td>Organizes analysis into projects and process flow diagrams, making the analysis procedure easy to track</td>
&nbsp;&nbsp;&nbsp;&nbsp;b) Key indicators include<br>
+
</tr>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;i. Digital Media Platforms<br>
+
<tr>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1. Percentage of consumers engaged in the various digital media platforms<br>
+
<td>JMP provides a very interactive GUI that allows users to do exploratory data analysis and try out various analytical methods  easily and quickly</td>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2. The periods of the day consumers are most actively engaged in digital media such as In bed after waking up or early/late morning<br>
+
<td>Provides a server version for ease of collaboration on data cleansing, integration, security and access</td>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3. The platforms in which certain categories of products are most engaged by consumers such as mobile application for emails and purchasing of movie tickets<br>
+
</tr>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ii. Devices<br>
+
</table>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1. Percentage of consumers engaged through the various devices<br>
+
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2. The periods of the day consumers are most actively engaged in various devices<br>
+
The following are our considerations in choosing JMP as our tool:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3. The various services consumers engage in on a given device, such as using a laptop/tablet to watch movies and purchase apparel<br>
+
1. JMP is easier for us to pick up, as we already have some experience with it, and its GUI makes it easier to explore and manipulate the data. This reduces the time and effort we would otherwise spend learning a new tool, while allowing us to deepen our knowledge of JMP<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;iii. Consumers Profiles<br>
+
2. Both tools offer the statistical methods we expect to need for the project, although SAS provides more options than JMP. JMP offers decision trees, bootstrap forest, boosted tree and K nearest neighbours, which we expect to be sufficient for our project<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1. The common characteristics of consumers that are actively participating in eCommerce such as having both internet connection at home with laptop/tablet and smartphone<br>
+
3. Since our data set does not exceed what our RAM can hold, we do not need to process data from secondary storage. Instead, we benefit from a relatively small data set that can be processed in the RAM of our laptops, which gives faster processing<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2. The categories of services that the various groups of consumers engage in; for example, consumers aged 16 to 24 are more active in the area of movies and TV<br><br>
+
4. Although both JMP and SAS Enterprise Miner are accessible to us and both provide the capabilities our project needs, we decided to use JMP for the reasons above.
[[Image: Visualization1.png|400px|link=]]<br><br>
 
2. Digital Media Platforms<br>
 
&nbsp;&nbsp;&nbsp;&nbsp;a) This visualization aims to let marketers gain insights by manipulating the data and the various graphical representations of activity on the digital media platforms<br><br>
 
[[Image: Visualization2.png|600px|link=]]<br><br>
 
[[Image: Visualization3.png|350px|link=]]<br><br>
 
3. Devices<br>
 
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a. This visualization aims to let marketers gain insights by manipulating the data and the various graphical representations of the devices consumers engage with<br><br>
 
[[Image: Visualization4.png|450px|link=]]<br><br>
 
[[Image: Visualization5.png|450px|link=]]<br><br>
 
4. Consumer Profiles<br>
 
&nbsp;&nbsp;&nbsp;&nbsp;a. This visualization aims to let marketers gain insights by manipulating the data and the various graphical representations of consumer profiles<br><br>
 
[[Image: Visualization6.png|400px|link=]]
 
<br><br>
 
<div align="left">
 
 
 
==<div style="background:#ff4fa7; padding: 10px; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #D3D3D3 solid 25px;"><font color="white">Project Limitations & Assumptions</font></div>==
 
 
 
*Datasets given by the sponsor cover only 2 markets, Singapore and Malaysia, and thus the analysis cannot be taken as representative of other markets.
 
 
 
 
<div align="left"> <!-- END CHUNK-->
 
<div align="left"> <!-- END CHUNK-->

Latest revision as of 19:39, 11 April 2016



Taylor Nelson Sofres (TNS) is one of the largest research agencies worldwide. It provides actionable insights to help companies make impactful decisions that drive growth. TNS is part of Kantar, one of the world's largest insight, information and consultancy groups.

Introduction

Syndicated market research studies that target clients from multiple industries are a good source of revenue for market research companies (insert source). However, such studies typically rely on long questionnaires, since they include questions catering to multiple industries, and a respondent typically takes about 30 minutes to complete one. This has two consequences. First, responses tend to be suboptimal, because long questionnaires strain and tire respondents, which lowers both response rates and response quality. Second, the large number of survey questions (and hence of resulting variables) calls for a larger monetary incentive to get respondents to complete the entire survey. With a shorter survey, that added incentive could instead be spent on recruiting more respondents to improve the results. Hence, market research companies need ways to shorten their surveys so as to uphold the accuracy of their results.

In this report, we aim to build an effective explanatory model that helps reduce the number of variables needed for a market research study. By identifying pertinent variables and omitting variables that do not add value to the study results, we can reduce the number of survey questions in a study and the strain on survey respondents, provided that the consumer behavior and demographics captured for the industry remain the same in future studies.

As our dataset consists of questions catering to numerous industries, we focus our efforts on the Personal Care industry. Personal Care products include facial care products, cosmetics, perfume or cologne, skin care products, and hair care products. Our objective is to identify the significant factors that allow us to quantify the relationship between consumers’ behavior and their purchase of Personal Care products. The candidate factors comprise social demographic and economic profile, devices, digital media platforms, and online behavior (time spent, frequency, and part of day for device and activity engagement of Personal Care consumers).

Study Context

This report uses the dataset from Connected Life, a 2015 syndicated research study conducted by Taylor Nelson Sofres (TNS) Singapore, a market research company under the WPP group. The study aims to identify the target consumer profiles, devices, and digital media platforms that today’s connected consumers engage with, so that businesses from different industries can formulate more targeted marketing strategies and maximize the return on investment of their business decisions. The survey questionnaire is therefore crafted to cover multiple industries, including Personal Care, Airline, Mobile, etc. As a result, most questions are general and take an industry-level view. However, specific parts of the study results can be extracted for further analysis for interested companies.

Project Methodology

See here for more information about our data

    Modelling Process:
Modelling process.png

The figure above illustrates the explanatory modelling process used for our analysis. The full list of data preparation procedures is given in the following section. After data preparation, we proceed with exploratory data analysis (EDA) to understand the data better. During this process, we often find ourselves returning to the data preparation stage after observing the distributions of some of the variables.

Similarly, during the model fitting stage, we find ourselves iterating between model fitting and evaluation as we calibrate the model for optimal results. We assess model performance using several outputs: the Whole Model Test, the assessment of individual parameters, the Receiver Operating Characteristic (ROC) curve, fit statistics, the misclassification rate and the confusion matrix.
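
As an illustration of three of the evaluation outputs named above, the following is a minimal sketch in Python (not JMP, which the project itself uses) of how a confusion matrix, a misclassification rate and the area under the ROC curve can be computed for a binary purchase outcome. The labels, scores and the 0.5 cut-off are hypothetical values made up for this example; they are not taken from the Connected Life data.

```python
from typing import Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of observations whose predicted class differs from the actual class."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (fp + fn) / (tp + fp + fn + tn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = bought a Personal Care product, 0 = did not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]  # made-up model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]       # assumed 0.5 cut-off

print(confusion_matrix(y_true, y_pred))
print(misclassification_rate(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The misclassification rate depends on the chosen cut-off, whereas the ROC area summarizes performance across all cut-offs, which is one reason to look at both when comparing models.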

Furthermore, after fitting and evaluating the model, we discover ways to improve our analysis. This brings us back to the data preparation stage, where we reorganize the data, followed by another round of EDA, model fitting and evaluation. Finally, we assess the models created and recommend a list of actionable improvements to marketers and market research firms.

Analytical Tools

JMP | SAS EM
Supports scripts written in the JMP Scripting Language to customize analyses and generate reports | Supports scripts written in the SAS language to customize analyses and generate reports
Holds data in RAM, so it cannot handle data sets as large as SAS can; with smaller data sets, however, it runs faster because processing happens in memory | Can process data on secondary storage instead of RAM, and so can handle very large data sets, including more data than the RAM can hold
Cheaper (about $5k for the first year license) | Expensive (over $100k for the first year license)
Built-in reporting tool that provides general-use reporting capabilities | Powerful reporting through its Business Intelligence and Analytics software, which allows very detailed customization of reports
JMP does not provide a workflow or history of analysis to keep track of progress. | Organizes analysis into projects and process flow diagrams, making the analysis procedure easy to track
JMP provides a very interactive GUI that allows users to do exploratory data analysis and try out various analytical methods easily and quickly | Provides a server version for ease of collaboration on data cleansing, integration, security and access


The following are our considerations in choosing JMP as our tool:
1. JMP is easier for us to pick up, as we already have some experience with it, and its GUI makes it easier to explore and manipulate the data. This reduces the time and effort we would otherwise spend learning a new tool, while allowing us to deepen our knowledge of JMP
2. Both tools offer the statistical methods we expect to need for the project, although SAS provides more options than JMP. JMP offers decision trees, bootstrap forest, boosted tree and K nearest neighbours, which we expect to be sufficient for our project
3. Since our data set does not exceed what our RAM can hold, we do not need to process data from secondary storage. Instead, we benefit from a relatively small data set that can be processed in the RAM of our laptops, which gives faster processing
4. Although both JMP and SAS Enterprise Miner are accessible to us and both provide the capabilities our project needs, we decided to use JMP for the reasons above.
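
The in-memory consideration in point 3 can be checked with simple arithmetic. The sketch below, written in Python for illustration, estimates the size of a fully numeric table and compares it with a share of the available RAM. The row and column counts, the 8 bytes per cell, the 8 GB of RAM and the 50% headroom are all assumptions chosen for the example, not figures from the project.

```python
def estimated_size_mb(n_rows: int, n_cols: int, bytes_per_cell: int = 8) -> float:
    """Rough size of a table in MB, assuming every cell is stored as a
    fixed-width value (8 bytes, e.g. a double-precision number, by default)."""
    return n_rows * n_cols * bytes_per_cell / (1024 ** 2)


def fits_in_ram(n_rows: int, n_cols: int, ram_gb: float,
                headroom: float = 0.5, bytes_per_cell: int = 8) -> bool:
    """True if the estimated table size is within the given share (headroom)
    of total RAM, leaving the rest for the operating system and the tool."""
    budget_mb = ram_gb * 1024 * headroom
    return estimated_size_mb(n_rows, n_cols, bytes_per_cell) <= budget_mb


# Hypothetical survey: 5,000 respondents and 2,000 variables on an 8 GB laptop.
print(round(estimated_size_mb(5_000, 2_000), 1), "MB")
print(fits_in_ram(5_000, 2_000, ram_gb=8))
```

Under these assumptions the table occupies well under 100 MB, so an in-memory tool is comfortable; the estimate ignores text columns and working copies made during analysis, which can raise the real footprint.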