Difference between revisions of "ANLY482 AY2016-17 T2 Group10 Project Overview: Data"

From Analytics Practicum
 
<center>
 
<center>
 
{| style="background-color:#ffffff ; margin: 3px 10px 3px 10px;" width="80%"|

{| style="background-color:#ffffff ; margin: 3px 10px 3px 10px;" width="80%"|
| style="font-family:Century Gothic, Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #edf8b1" width="190px" |

[[ANLY482_AY2016-17_T2_Group10|<font color="#3c3c3c"><strong>HOME</strong></font>]]

| style="font-family:Century Gothic, Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #c7e9b4" width="210px" |

[[ANLY482_AY2016-17_T2_Group10_About_Us|<font color="#3c3c3c"><strong>ABOUT US</strong></font>]]

| style="font-family:Century Gothic, Open Sans, Arial, sans-serif; font-size:15px; text-align: center; background-color: #7fcdbb;border-top-right-radius:7px;border-top-left-radius:7px; border-top:solid #ffffff; border-bottom:solid #7fcdbb" width="210px" |
[[ANLY482_AY2016-17_T2_Group10_Project_Overview|<font color="#fff"><strong>PROJECT OVERVIEW</strong></font>]]

| style="font-family:Century Gothic, Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #41b6c4" width="230px" |

[[ANLY482_AY2016-17_T2_Group10_Analysis_&_Findings|<font color="#3c3c3c"><strong>ANALYSIS & FINDINGS</strong></font>]]

| style="font-family:Century Gothic, Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #2c7fb8" width="230px" |

[[ANLY482_AY2016-17_T2_Group10_Project_Management|<font color="#3c3c3c"><strong>PROJECT MANAGEMENT</strong></font>]]

| style="font-family:Century Gothic, Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #253494" width="190px" |

[[ANLY482_AY2016-17_T2_Group10_Documentation|<font color="#3c3c3c"><strong>DOCUMENTATION</strong></font>]]


<center>

{| style="background-color:#ffffff ; margin: 3px 10px 3px 10px;" width="80%"|
| style="font-family:Century Gothic, Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border:solid 1px #f5f5f5; border-radius: 7px; background-color: #fff;" width="200px" |

[[ANLY482_AY2016-17_T2_Group10_Project_Overview|<font color="#3c3c3c"><strong>Overview</strong></font>]]
| style="font-family:Century Gothic, Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border:solid 2px #7fcdbb; border-radius: 7px; background-color: #fff" width="200px" |

[[ANLY482_AY2016-17_T2_Group10_Project_Overview:_Data|<font color="#3c3c3c"><strong>Data</strong></font>]]
| style="font-family:Century Gothic, Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border:solid 1px #f5f5f5; border-radius: 7px; background-color: #fff" width="200px" |

[[ANLY482_AY2016-17_T2_Group10_Project_Overview:_Methodology|<font color="#3c3c3c"><strong>Methodology</strong></font>]]

|}

</center>
<div style="font-size: 13px; font-family:Century Gothic; background-color: #fff;  display: inline-block; padding: 5px; border-radius: 7px">
[[ANLY482_AY2016-17_Term_2|<font color="#636363"><< ANLY482 AY2016-17 T2 Projects</font>]]
</div>

<!------- End of Secondary Navigation Bar---->
  
 
<!-- Body -->

==<div style="background: #ffffff; padding: 17px;padding:0.3em; letter-spacing:0.1em; line-height: 0.1em;  text-indent: 10px; font-size:17px; text-transform:uppercase; font-weight: light; font-family: 'Century Gothic';  border-left:8px solid #1b96fe; margin-bottom:5px"><font color= #000000><strong>Data Summary</strong></font></div>==
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Century Gothic, Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left; font-size: 15px">

With an understanding of our sponsor’s motivations, our team sets in motion the data wrangling process, which encompasses data cleaning, transformation and integration to obtain a consolidated JMP data table for further analysis. For this research study, we have obtained a year’s worth of data from 2016. It consists of information on invoices, call details, employees and customers, each of which is described in the summary table below.

<br />
<center>

<table style="margin:auto; border:1px solid #000; width: 80%">
  <tr style="background-color:#000">
    <td style="border-right:1px solid #000;text-align:center;font-weight:bold;width:20%"><font color="#f2f4f4">File Name</font></td>
    <td style="border-right:1px solid #000;text-align:center;font-weight:bold;width:60%"><font color="#f2f4f4">Description</font></td>
    <td style="border-right:1px solid #000;text-align:center;font-weight:bold;width:20%"><font color="#f2f4f4">No. of Rows</font></td>
  </tr>
  <tr>
    <td style="border-right:1px solid #000;text-align:center;width:20%">Call Details</td>
    <td style="border-right:1px solid #000;text-align:center;width:60%">Information on actual interactions between Sales Reps and Sales Targets for a Product Brand</td>
    <td style="border-right:1px solid #000;text-align:center;width:20%">42915</td>
  </tr>
  <tr>
    <td style="border-right:1px solid #000;text-align:center;width:20%">Invoice Details</td>
    <td style="border-right:1px solid #000;text-align:center;width:60%">Transactions of product purchases by Sales Targets</td>
    <td style="border-right:1px solid #000;text-align:center;width:20%">110372</td>
  </tr>
  <tr>
    <td style="border-right:1px solid #000;text-align:center;width:20%">Employee</td>
    <td style="border-right:1px solid #000;text-align:center;width:60%">Information on employees and their teams (“Therapy Area”)</td>
    <td style="border-right:1px solid #000;text-align:center;width:20%">237</td>
  </tr>
  <tr>
    <td style="border-right:1px solid #000;text-align:center;width:20%">HCP</td>
    <td style="border-right:1px solid #000;text-align:center;width:60%">Information on individual doctors</td>
    <td style="border-right:1px solid #000;text-align:center;width:20%">5871</td>
  </tr>
  <tr>
    <td style="border-right:1px solid #000;text-align:center;width:20%">HCO</td>
    <td style="border-right:1px solid #000;text-align:center;width:60%">Information on clinics and organizations</td>
    <td style="border-right:1px solid #000;text-align:center;width:20%">4425</td>
  </tr>
</table>

</center>

<big>Data Collection</big>

The data appears to be in the form of 12 Excel files, each containing one or more sheets with multiple columns, so the data is high in dimensionality. Metadata is not yet available, but from the column headers and our conversations with the sponsor, we have an idea of which columns will be most relevant to us. These include sales information, the results of the sales staff, and data on the methods the salespeople use. The sponsor has promised to provide these data.

We will need to determine which columns to focus our analysis on, which we will do through conversations with our sponsor as we seek to understand the data. Once we understand the metadata, we will be able to extract the sales and other relevant data and begin exploratory data analysis. We are selecting only a portion of the data for two reasons: its high dimensionality would strain computer hardware and slow the analysis, and much of it falls outside the scope of our project, which focuses on sales methods and results.

<big>Data Preparation</big>

We will need to clean the data by exploring it iteratively to identify anomalous patterns, which we can then eliminate. For example, several different versions of a record may refer to the same thing: “GSK”, “GlaxoSmithKline” and “GlaxoSmithKline plc” all refer to the same entity.

Missing values will also be handled at this stage. The exact approach will be decided once we examine the data, based on factors such as which fields have missing values and in what proportion. We may omit rows with missing data from our analysis, or we may try to interpolate the missing values.

<big>Exploratory Data Analysis</big>

We will create a descriptive analytics dashboard in JMP Pro to uncover patterns and anomalies, using scatter plots and histograms to identify trends. For example, if we find that certain teams have very few face-to-face interactions with customers, this may indicate that they need more confidence training, or that the clients assigned to them are less receptive to face-to-face meetings. Any assumptions we hold, whether from our own preconceptions or passed on to us by GSK, will also be tested in this phase.

</div>
<!-- End Body --->
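The name-variant problem raised under Data Preparation (“GSK”, “GlaxoSmithKline” and “GlaxoSmithKline plc” all referring to one entity) can be handled with a normalization pass before records are grouped. The sketch below is only an illustration under assumptions: the `ALIASES` table, the list of legal suffixes and the `normalize`/`canonical_name` helpers are hypothetical and are not part of the team's JMP workflow.

```python
import re

# Hypothetical alias table: normalized variant -> canonical entity name.
# In practice this would be curated with the sponsor while exploring the data.
ALIASES = {
    "gsk": "GlaxoSmithKline",
    "glaxosmithkline": "GlaxoSmithKline",
}

# Trailing legal suffixes to ignore when matching (an assumption, not exhaustive).
LEGAL_SUFFIXES = ("plc", "ltd", "limited", "inc", "pte")


def normalize(raw: str) -> str:
    """Lower-case, replace punctuation with spaces and drop trailing legal suffixes."""
    text = re.sub(r"[^\w\s]", " ", raw.lower())
    tokens = text.split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def canonical_name(raw: str) -> str:
    """Map a raw name to its canonical form, or return it (trimmed) if unknown."""
    return ALIASES.get(normalize(raw), raw.strip())


if __name__ == "__main__":
    for name in ["GSK", "GlaxoSmithKline", "GlaxoSmithKline plc", "Acme Clinic"]:
        print(f"{name!r} -> {canonical_name(name)!r}")
```

Unknown names pass through unchanged, so the alias table can grow iteratively as new variants turn up during exploration.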
  
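The two options named for missing values, omitting incomplete rows or interpolating, can be illustrated with a small standard-library sketch. It assumes missing entries are represented as `None` and that "interpolate" means linear interpolation between the nearest known values of an ordered series (for example, a hypothetical monthly sales total); both are assumptions, and the helper names are hypothetical.

```python
from typing import Optional, Sequence


def missing_share(rows: Sequence[dict], column: str) -> float:
    """Fraction of rows whose value in `column` is missing (None or absent)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def drop_incomplete(rows: Sequence[dict], columns: Sequence[str]) -> list:
    """Option 1: omit any row that is missing a value in one of `columns`."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


def interpolate(values: Sequence[Optional[float]]) -> list:
    """Option 2: fill interior gaps linearly between the nearest known values.

    Leading and trailing gaps stay None, since there is no neighbour on one side.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


if __name__ == "__main__":
    # Made-up monthly totals for one hypothetical customer, with one gap.
    rows = [{"month": 1, "amount": 100.0}, {"month": 2, "amount": None}, {"month": 3, "amount": 140.0}]
    print(missing_share(rows, "amount"))              # about 0.33
    print(drop_incomplete(rows, ["amount"]))
    print(interpolate([r["amount"] for r in rows]))   # [100.0, 120.0, 140.0]
```

Measuring the missing share first mirrors the text: a column with only a few gaps may justify dropping rows, while a mostly complete time series may be better served by interpolation.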
  
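The face-to-face example from the exploratory analysis can be expressed as a per-team share computed from call records, the kind of summary a JMP histogram would then display. This is a sketch only: it assumes hypothetical `team` and `channel` fields and a `"Face to Face"` channel label, and the 0.3 cut-off is purely illustrative. The team's actual analysis is done in JMP Pro.

```python
from collections import Counter
from typing import Iterable, Mapping


def face_to_face_share(calls: Iterable[Mapping[str, str]]) -> dict:
    """Share of each team's calls that were face-to-face.

    Assumes hypothetical 'team' and 'channel' fields on each call record.
    """
    total = Counter()
    f2f = Counter()
    for call in calls:
        team = call["team"]
        total[team] += 1
        if call["channel"] == "Face to Face":
            f2f[team] += 1
    return {team: f2f[team] / n for team, n in total.items()}


def flag_low(shares: Mapping[str, float], threshold: float = 0.3) -> list:
    """Teams whose face-to-face share is below `threshold` (an illustrative cut-off)."""
    return sorted(team for team, s in shares.items() if s < threshold)


if __name__ == "__main__":
    # Made-up call records for two hypothetical teams.
    sample = [
        {"team": "Team A", "channel": "Face to Face"},
        {"team": "Team A", "channel": "Phone"},
        {"team": "Team B", "channel": "Email"},
        {"team": "Team B", "channel": "Phone"},
        {"team": "Team B", "channel": "Face to Face"},
        {"team": "Team B", "channel": "Phone"},
    ]
    shares = face_to_face_share(sample)
    print(shares)            # {'Team A': 0.5, 'Team B': 0.25}
    print(flag_low(shares))  # ['Team B']
```

A flagged team is only a prompt for follow-up: as the text notes, a low share could reflect a need for training or a client base that is less receptive to meetings.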

<!-- Body -->
==<div style="background: #ffffff; padding: 17px;padding:0.3em; letter-spacing:0.1em; line-height: 0.1em;  text-indent: 10px; font-size:17px; text-transform:uppercase; font-weight: light; font-family: 'Century Gothic';  border-left:8px solid #1b96fe; margin-bottom:5px"><font color= #000000><strong>Data Dictionary</strong></font></div>==
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Century Gothic, Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left; font-size: 15px">
The data dictionary is available [https://docs.google.com/a/smu.edu.sg/document/d/17TYnpCpPC6bNzLZRglWHc83AfyzkjPMKdaQ-BkVIQMg/edit?usp=sharing here].
</div>

<!-- End Body --->

Latest revision as of 10:29, 21 April 2017


Data Summary

With an understanding of our sponsor’s motivations, our team sets in motion the data wrangling process, which encompasses data cleaning, transformation and integration to obtain a consolidated JMP data table for further analysis. For this research study, we have obtained a year’s worth of data from 2016. It consists of information on invoices, call details, employees and customers, each of which is described in the summary table below.


File Name | Description | No. of Rows
Call Details | Information on actual interactions between Sales Reps and Sales Targets for a Product Brand | 42915
Invoice Details | Transactions of product purchases by Sales Targets | 110372
Employee | Information on employees and their teams (“Therapy Area”) | 237
HCP | Information on individual doctors | 5871
HCO | Information on clinics and organizations | 4425
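The integration step described in the Data Summary, combining these files into a single consolidated table, can be sketched as a customer-level join. This is a minimal illustration under assumptions, not the team's JMP workflow: the field names `customer_id` and `amount` are hypothetical placeholders, since the real column names are defined in the data dictionary.

```python
from collections import defaultdict
from typing import Iterable, List, Mapping


def consolidate(
    invoices: Iterable[Mapping],
    calls: Iterable[Mapping],
    customers: Iterable[Mapping],
) -> List[dict]:
    """Return one row per customer with total invoiced amount and call count.

    Field names ('customer_id', 'amount') are hypothetical placeholders.
    Customers with no invoices or calls get 0; invoices or calls whose
    customer is not in `customers` are ignored (a left join from customers).
    """
    # Aggregate first so each customer contributes exactly one output row.
    sales = defaultdict(float)
    for inv in invoices:
        sales[inv["customer_id"]] += inv["amount"]

    call_counts = defaultdict(int)
    for call in calls:
        call_counts[call["customer_id"]] += 1

    table = []
    for cust in customers:
        cid = cust["customer_id"]
        row = dict(cust)  # copy the customer attributes (e.g. an HCP or HCO record)
        row["total_sales"] = sales.get(cid, 0.0)
        row["num_calls"] = call_counts.get(cid, 0)
        table.append(row)
    return table


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    invoices = [{"customer_id": 1, "amount": 50.0}, {"customer_id": 1, "amount": 25.0}]
    calls = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
    customers = [{"customer_id": 1, "name": "Clinic X"}, {"customer_id": 2, "name": "Dr Y"}]
    for row in consolidate(invoices, calls, customers):
        print(row)
```

Aggregating invoices and calls per customer before joining keeps one row per customer; joining the raw Call Details and Invoice Details rows directly on the customer key would instead pair every call with every invoice for that customer and inflate the totals.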


Data Dictionary

The data dictionary is available here.