Difference between revisions of "ANLY482 AY2016-17 T2 Group10 Project Overview: Methodology"

From Analytics Practicum
Jump to navigation Jump to search
Line 62: Line 62:
 
Next, to determine the causality relationship between interactions and sales revenue, we need to join Call Details and Invoice Details tables. This leads us to understand variables from both tables to find out 1) which two variables are the same and whether their formats are alike, 2) at which granularity does each row from both tables represents. Upon fully understanding them, we performed aggregations, standardizations of formats and inclusions of HCP and HCO tables to serve as links. Furthermore, we added the dimensionality of employees’ teams from Employee table, as it will be useful in describing the relationship as mentioned. Finally, we integrated relevant tables into a consolidated table and loaded it into the JMP server for analysis.
 
Next, to determine the causality relationship between interactions and sales revenue, we need to join Call Details and Invoice Details tables. This leads us to understand variables from both tables to find out 1) which two variables are the same and whether their formats are alike, 2) at which granularity does each row from both tables represents. Upon fully understanding them, we performed aggregations, standardizations of formats and inclusions of HCP and HCO tables to serve as links. Furthermore, we added the dimensionality of employees’ teams from Employee table, as it will be useful in describing the relationship as mentioned. Finally, we integrated relevant tables into a consolidated table and loaded it into the JMP server for analysis.
 
<br/><br/>
 
<br/><br/>
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig1.png|500px]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden; border: 1px solid black">[[File:Mattfig1.png|500px]]</div>
 
<br/>
 
<br/>
 
The above diagram illustrates how data tables are being integrated, which took a few stages to achieve.
 
The above diagram illustrates how data tables are being integrated, which took a few stages to achieve.
Line 73: Line 73:
 
<br/>
 
<br/>
 
# For missing values under “price$” column in Invoice Details, we imputed their values to “0” as these rows contain records where free samples given out to customers.
 
# For missing values under “price$” column in Invoice Details, we imputed their values to “0” as these rows contain records where free samples given out to customers.
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig2.png]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden;border: 1px solid black">[[File:Mattfig2.png]]</div>
 
#* Steps Taken:
 
#* Steps Taken:
 
#*# Press Ctrl-F to show “search data tables” interface
 
#*# Press Ctrl-F to show “search data tables” interface
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig3.png]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden;border: 1px solid black">[[File:Mattfig3.png]]</div>
 
#*# Enter the following fields:  
 
#*# Enter the following fields:  
 
#*#* Find what: *
 
#*#* Find what: *
Line 92: Line 92:
 
#*# Right click on Postal Code’s header -> Select “Insert Columns”
 
#*# Right click on Postal Code’s header -> Select “Insert Columns”
 
#*# Right  click  new  column ->  Select  “Formula” ->  Enter  the  formula  as shown below  
 
#*# Right  click  new  column ->  Select  “Formula” ->  Enter  the  formula  as shown below  
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig4.png]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden;border: 1px solid black">[[File:Mattfig4.png]]</div>
 
#*# Click on “OK” to apply change
 
#*# Click on “OK” to apply change
  
Line 117: Line 117:
 
#*# Go to “Tables” and “Join”
 
#*# Go to “Tables” and “Join”
 
#*# Enter fields as shown in screenshot below
 
#*# Enter fields as shown in screenshot below
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig5.png]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden;border: 1px solid black">[[File:Mattfig5.png]]</div>
 
#*# Ensure  that  Left  Outer  Join  is  selected  and  output  columns  are populated before clicking “OK” to apply join
 
#*# Ensure  that  Left  Outer  Join  is  selected  and  output  columns  are populated before clicking “OK” to apply join
  
Line 125: Line 125:
 
#*# Go to “Tables” and “Join”
 
#*# Go to “Tables” and “Join”
 
#*# Enter fields as shown in screenshot below
 
#*# Enter fields as shown in screenshot below
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig6.png]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden;border: 1px solid black">[[File:Mattfig6.png]]</div>
 
#*# Ensure  that  Left  Outer  Join  is  selected  and  output  columns  are populated before clicking “OK” to apply join
 
#*# Ensure  that  Left  Outer  Join  is  selected  and  output  columns  are populated before clicking “OK” to apply join
  
Line 134: Line 134:
 
#*# Go to “Tables” and “Update”
 
#*# Go to “Tables” and “Update”
 
#*# Enter fields as shown in screenshot below
 
#*# Enter fields as shown in screenshot below
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig7.png]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden;border: 1px solid black">[[File:Mattfig7.png]]</div>
 
#*# Click “OK” to apply update
 
#*# Click “OK” to apply update
 
</div>
 
</div>
Line 147: Line 147:
 
#*# Go to “Tables” and “Summary”
 
#*# Go to “Tables” and “Summary”
 
#*# Drop  fields  to  Statistics  and  Group  as  shown  in screenshot below
 
#*# Drop  fields  to  Statistics  and  Group  as  shown  in screenshot below
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig8.png]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden;border: 1px solid black">[[File:Mattfig8.png]]</div>
 
#*# Click “OK” to summarize the tables
 
#*# Click “OK” to summarize the tables
  
Line 155: Line 155:
 
#*# Go to “Tables” and “Summary”
 
#*# Go to “Tables” and “Summary”
 
#*# Drop  fields  to  Statistics  and  Group  as  shown  in screenshot below
 
#*# Drop  fields  to  Statistics  and  Group  as  shown  in screenshot below
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig9.png]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden;border: 1px solid black">[[File:Mattfig9.png]]</div>
 
#*# Click “OK” to summarize the tables
 
#*# Click “OK” to summarize the tables
 
</div>
 
</div>
Line 168: Line 168:
 
#* Steps Taken:
 
#* Steps Taken:
 
#*# We have identified the following matching variables from Call Details and Invoice Details
 
#*# We have identified the following matching variables from Call Details and Invoice Details
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig9b.png]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden;border: 1px solid black">[[File:Mattfig9b.png]]</div>
 
#*# We will utilize Join function again to perform the inner join
 
#*# We will utilize Join function again to perform the inner join
 
#*# Go to “Tables” and “Join”
 
#*# Go to “Tables” and “Join”
 
#*# Enter fields as shown in screenshot below
 
#*# Enter fields as shown in screenshot below
<div style="border-radius:5px; display: inline-block; overflow: hidden;">[[File:Mattfig10.png]]</div>
+
<div style="border-radius:5px; display: inline-block; overflow: hidden;border: 1px solid black">[[File:Mattfig10.png]]</div>
 
#*# Ensure that inner join is selected and output columns are populated before clicking “OK” to finish the join
 
#*# Ensure that inner join is selected and output columns are populated before clicking “OK” to finish the join
 
</div>
 
</div>
 
<!-- End Body --->
 
<!-- End Body --->

Revision as of 11:21, 21 April 2017

Kesmyjxlogo.png

HOME

ABOUT US

PROJECT OVERVIEW

ANALYSIS & FINDINGS

PROJECT MANAGEMENT

DOCUMENTATION

Overview

Data

Methodology

<< ANLY482 AY2016-17 T2 Projects


Tools Used

Jmppro.png

SAS JMP Pro 13 is chosen as our primary tool for data preparation, exploratory and further analysis. It is an analytical software that can perform most statistical analysis on large datasets and generate results with interactive visualizations used by data scientists to manipulate data selection on the go. Furthermore, tutorials and guides are widely available online for us to learn JMP Pro 13’s different techniques and functions.
More importantly, its easy-to-use built-in tools enable us to conduct analysis of variance to determine relationship between interactions and sales revenue.


Data Preparation

Data preparation took us through time-consuming and tedious procedures to obtain high quality data. Though seemingly unrewarding, a set of high quality data allows for more accurate, reliable and consistent analysis of results. Therefore, it is imperative to invest a lot of time and effort on it to avoid getting false conclusions for our hypothesis. Firstly, using SAS JMP Pro, these tables are scanned for anomalies such as missing values or outliers. We corrected them appropriately, using imputation for missing values and omission for extreme outliers. This step ensures that the data we are using will not give us misleading insights.

Next, to determine the causality relationship between interactions and sales revenue, we need to join Call Details and Invoice Details tables. This leads us to understand variables from both tables to find out 1) which two variables are the same and whether their formats are alike, 2) at which granularity does each row from both tables represents. Upon fully understanding them, we performed aggregations, standardizations of formats and inclusions of HCP and HCO tables to serve as links. Furthermore, we added the dimensionality of employees’ teams from Employee table, as it will be useful in describing the relationship as mentioned. Finally, we integrated relevant tables into a consolidated table and loaded it into the JMP server for analysis.

Mattfig1.png


The above diagram illustrates how data tables are being integrated, which took a few stages to achieve.

Data Cleaning

The first stage of data preparation involves cleaning Invoice Details and Call Details tables.

  1. For missing values under “price$” column in Invoice Details, we imputed their values to “0” as these rows contain records where free samples given out to customers.
Mattfig2.png
    • Steps Taken:
      1. Press Ctrl-F to show “search data tables” interface
Mattfig3.png
      1. Enter the following fields:
        • Find what: *
        • Replace with 0
        • Tick “Match entire cell value”
        • Tick “Restrict to selected column” of “Price$”
        • Tick “Search data”
      2. Click on “Replace All” to apply change
  1. For negative “sales qty” and “amount$” values in Invoice Details, we did not take any action as they serve as records to void any sales that has been cancelled. Upon aggregation by quarters, no negative values will be present.
  1. For 5-digits postal codes in Invoice Details, we created a new column to store the converted ‘postal code’, with data type changed from numerical to categorical and formula function used to add the missing ‘0’.
    • Steps Taken:
      1. Right click on Postal Code’s header -> Select “Column Info” -> Change data type to “Character”
      2. Right click on Postal Code’s header -> Select “Insert Columns”
      3. Right click new column -> Select “Formula” -> Enter the formula as shown below
Mattfig4.png
      1. Click on “OK” to apply change
  1. To facilitate understanding quarterly performance of sales revenue in Invoice Details and Call Details, we added a “year-quarter” column for use during aggregation.
    • Steps Taken:
      1. Right click on “Invoice Date” of Invoice Details / “Date” of Call Details
      2. Select “New Formula Column” -> “Date Time” -> “Year Quarter” to create the new column
  2. For “Product Names” differences in Invoice Details and Call Details, we standardized a common format using the recode function. (No screenshots due to confidentiality clause)
    • Steps Taken:
      1. Click on “Product Name” of Invoice Details / “Product” of Call Details
      2. Go to “Cols” at the top -> Select “Recode”
      3. At the “recode” interface, input new values that are the standardized common format
      4. Click on “Done” and “In Place” to replace the old values

Adding Dimensionalities

The second stage of data preparation involves adding dimensionalities to Invoice Details and Call Details tables using HCO, HCP and Employee tables.

  1. Invoice Details and HCO are left outer joined to add dimensionality of clinic’s “name” needed for further joins, while keeping records in Invoice Details intact.
    • Steps Taken:
      1. We have identified the matching variables to be “CUSTOMER_CODE” in Invoice Details and “ZP Account” in HCO, both are IDs of clinics
      2. Go to “Tables” and “Join”
      3. Enter fields as shown in screenshot below
Mattfig5.png
      1. Ensure that Left Outer Join is selected and output columns are populated before clicking “OK” to apply join
  1. Call Details and HCP are left outer joined to add dimensionality of “primary parent” (clinic name of individual doctors) needed for further joins, while keeping records in Call Details intact.
    • Steps Taken:
      1. We have identified the common variables to be “Account” in Call Detailsand “Name” in HCP, both are name of doctors
      2. Go to “Tables” and “Join”
      3. Enter fields as shown in screenshot below
Mattfig6.png
      1. Ensure that Left Outer Join is selected and output columns are populated before clicking “OK” to apply join
  1. Invoice Details and Employee are left outer joined to add dimensionality of “therapy area” (sales teams) needed for further analysis, while keeping records in Call Details intact.
    • Steps Taken:
      1. We have identified the matching variables to be “Rep Name” in both Invoice Details and HCP, which are names of sales rep. Additionally, “year quarter” from both tables are also identified because sales reps may change their “Therapy area” (sales teams) by quarter basis.
      2. We will utilize another function “Update” to perform the same left outer join
      3. Go to “Tables” and “Update”
      4. Enter fields as shown in screenshot below
Mattfig7.png
      1. Click “OK” to apply update

Data Aggregation

The third stage of data preparation involves aggregating Invoice Details and Call Details

  1. Invoice Details are aggregated by “year-quarter” to derive additional columns “sum(sales qty)” and “sum(amount$)”, addition to variables we are interested at: “channel” (clinic type), “rep name”, “product name”, “name” (clinic’s name) and “therapy area” (sales teams).
    • Steps Taken:
      1. Aggregation will be performed using “Summary” function.
      2. Go to “Tables” and “Summary”
      3. Drop fields to Statistics and Group as shown in screenshot below
Mattfig8.png
      1. Click “OK” to summarize the tables
  1. Call Details are aggregated by “year-quarter” and “primary parent” to derive additional column of “no. of rows” (interaction count), addition to variables we are interested at: “call: owner name” (sales rep’s name) and “product”.
    • Steps Taken:
      1. Aggregation will be performed using “Summary” function.
      2. Go to “Tables” and “Summary”
      3. Drop fields to Statistics and Group as shown in screenshot below
Mattfig9.png
      1. Click “OK” to summarize the tables

Data Integration

The final stage of data preparation involves joining Invoice Details and Call Details

  1. Invoice Details and Call Details are inner joined by sales rep’s name, clinic’s name, product’s name and year-quarter. Other variables present in the final table are “channel”, “therapy area”, “interaction count”, “sum(sales qty)” and “sum(amount$).
    • Steps Taken:
      1. We have identified the following matching variables from Call Details and Invoice Details
Mattfig9b.png
      1. We will utilize Join function again to perform the inner join
      2. Go to “Tables” and “Join”
      3. Enter fields as shown in screenshot below
Mattfig10.png
      1. Ensure that inner join is selected and output columns are populated before clicking “OK” to finish the join