Difference between revisions of "ANLY482 AY2016-17 T2 Group10 Project Overview: Methodology"

From Analytics Practicum

Revision as of 00:49, 16 January 2017


Data Collection

The data provided by GSK are mainly flat files (Excel workbooks). Each workbook contains one or more sheets with many columns, so the data are high-dimensional. Metadata is not yet available, but from the column headers and our conversation with the sponsor, we have an idea of which columns will be most relevant to us. These include sales information, the competency and results of sales staff, and data on the methods the salespeople use. These data have been promised to us. To discover potential insights through spatial clustering analysis of sales territories, we also intend to collect spatial data on GSK's vertical industries: hospitals, clinics and retail pharmacies. These can be collected from Singapore’s public data website, Data.gov.sg, in SHP or KML formats.


Data Preparation

Data preparation (or data wrangling, which has been described as data preparation taken to the next level) will involve ETL (Extract, Transform, Load) techniques to build an analytics sandbox for exploratory analysis. To support later analysis, we will run the ETL process and exploratory data analysis iteratively: if the exploratory results are unsatisfactory, we will go back and revise the ETL step. The entire data preparation process will be carried out in JMP Pro 13, which we are using in place of SAS Enterprise Guide and SAS Enterprise Miner and which provides the descriptive and predictive modelling capabilities our team requires.
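The ETL-then-explore cycle described above can be illustrated with a minimal sketch. This is not our JMP Pro workflow; it is a Python illustration using only the standard library, with an in-memory SQLite database standing in for the analytics sandbox. The column names (`rep_id`, `territory`, `calls`, `revenue`) and all values are hypothetical, since the real metadata is not yet available.

```python
import csv
import io
import sqlite3

# Hypothetical extract: in practice this would be a sheet exported from an Excel workbook.
RAW_EXPORT = """rep_id,territory,calls,revenue
R01, North ,12,110.5
R02,South,,95.0
R03,north,20,n/a
R04,East,17,150.25
"""


def extract(text):
    """Extract: parse the raw export into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalise text, cast numbers, set aside rows that fail."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append({
                "rep_id": row["rep_id"].strip(),
                "territory": row["territory"].strip().title(),
                "calls": int(row["calls"]),
                "revenue": float(row["revenue"]),
            })
        except ValueError:
            # Missing or non-numeric values: keep the row for review.
            rejected.append(row)
    return clean, rejected


def load(rows, connection):
    """Load: write the cleaned rows into the sandbox table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(rep_id TEXT PRIMARY KEY, territory TEXT, calls INTEGER, revenue REAL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO sales VALUES (:rep_id, :territory, :calls, :revenue)",
        rows,
    )
    connection.commit()


sandbox = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract(RAW_EXPORT))
load(clean_rows, sandbox)

# A first exploratory query; unsatisfactory results send us back to transform().
summary = sandbox.execute(
    "SELECT territory, COUNT(*), SUM(revenue) FROM sales "
    "GROUP BY territory ORDER BY territory"
).fetchall()
print(summary)
print(f"{len(rejected_rows)} row(s) rejected by the transform step")
```

The rejected rows are the feedback signal for the cycle: if too many rows fail, the transform rules are revised and the load is repeated.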


Methods of Analysis

Correlations

Some questions we hope to answer include what the business should invest in to achieve higher efficiency and growth, and which sales method is the most efficient. To address these, we could examine correlations between sales revenue and the sales inputs. While correlation does not imply causation, it can be highly suggestive.
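As a rough illustration of the correlation step, the sketch below computes the Pearson correlation coefficient between each sales input and revenue, then ranks the inputs by strength of association. Pearson's r is our assumption here (the choice of correlation measure has not been fixed), and the input names and figures are invented for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures, invented for illustration only.
sales_inputs = {
    "sales_calls": [12, 15, 9, 20, 17, 11],
    "training_hours": [4, 4, 6, 3, 5, 6],
}
revenue = [110, 135, 90, 180, 150, 100]

correlations = {
    name: pearson_r(values, revenue) for name, values in sales_inputs.items()
}

# Rank inputs by the absolute strength of their association with revenue.
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```

A high ranking only flags an input as worth investigating further; it does not show that changing the input would change revenue.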

Cluster Analysis + Machine Learning (Artificial Neural Networks)

Depending on the quality of the data and on future conversations with the sponsor, we also hope to build a machine learning model for predictive analytics, for example to predict how performance would vary if an input resource were changed.

We could cluster the client data and then, for each client cluster, train an artificial neural network (ANN) on the sales inputs, client characteristics and resulting revenue, and thereby predict results from the sales inputs. This would give a predictive model for each type of client.

After the clustering, we could also compare revenue with sales input to identify the more efficient teams or methods, and recommend that GSK analyze them in future to uncover the reasons behind their efficiency and spread those practices through the organization.
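The cluster-then-compare idea can be sketched as follows. The sketch uses plain k-means as the clustering method, which is an assumption on our part (no clustering algorithm has been chosen yet), and it stops at the revenue-to-input comparison; the per-cluster ANN is omitted and would be trained separately on the members of each cluster. Client names, features, visit counts and revenue are all hypothetical.

```python
from math import dist


def k_means(points, centroids, iterations=20):
    """Plain k-means on numeric tuples; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels, centroids


# (name, client feature vector, sales visits, revenue) -- all hypothetical.
clients = [
    ("clinic_a", (1.0, 2.0), 4, 20.0),
    ("clinic_b", (1.5, 1.8), 5, 30.0),
    ("clinic_c", (1.2, 2.2), 3, 16.0),
    ("hospital_a", (8.0, 9.0), 10, 200.0),
    ("hospital_b", (8.5, 9.5), 12, 180.0),
    ("hospital_c", (9.0, 8.5), 8, 160.0),
]

points = [features for _, features, _, _ in clients]
# Seed the centroids with two of the points so the run is deterministic.
labels, centroids = k_means(points, centroids=[points[0], points[3]])

# Compare revenue with sales input within each client cluster.
totals = {}
for (_, _, visits, revenue), label in zip(clients, labels):
    total_visits, total_revenue = totals.get(label, (0, 0.0))
    totals[label] = (total_visits + visits, total_revenue + revenue)

revenue_per_visit = {
    label: total_revenue / total_visits
    for label, (total_visits, total_revenue) in totals.items()
}
for label, value in sorted(revenue_per_visit.items()):
    print(f"cluster {label}: revenue per visit = {value:.2f}")
```

With real data the features would need to be put on comparable scales before clustering, since k-means relies on Euclidean distance.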

Survival Analysis

Survival analysis is a statistical technique for analyzing the expected duration of time until an event occurs, and it is one of the cornerstones of customer analytics. In our project context, an event can be customer attrition (where existing customers switch to other companies) or inventory depletion (where certain pharmaceutical products run out). Understanding when a customer is most likely to churn, or when inventory needs to be replenished, enables GSK to plan churn prevention efforts in advance and engage in proactive customer communication to improve sales.
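One common way to estimate time-to-event from such data is the Kaplan-Meier estimator, sketched below for customer attrition. The choice of Kaplan-Meier is our assumption (no specific survival method has been selected yet), and the durations and churn flags are invented. Customers who had not churned by the end of the observation period are treated as censored.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations: how long each customer was followed (e.g. in months)
    observed:  True if the customer churned at that time, False if censored
    Returns [(time, survival_probability)] at each time where churn occurred.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        leaving = sum(1 for d in durations if d == t)  # churned or censored at t
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical customer histories: months observed and whether churn occurred.
months = [3, 5, 5, 8, 12, 12]
churned = [True, True, False, True, False, False]

curve = kaplan_meier(months, churned)
for t, s in curve:
    print(f"month {t}: estimated share of customers retained = {s:.3f}")
```

The months at which the curve drops most steeply are the natural candidates for timing churn prevention efforts; the same estimator could be applied to inventory depletion by treating a stock-out as the event.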