ANLY482 AY2016-17 T2 Group10 Project Overview: Methodology

From Analytics Practicum
Revision as of 11:56, 15 April 2017 by Jxsim.2013 (talk | contribs)
Jump to navigation Jump to search

Kesmyjxlogo.png

HOME

ABOUT US

PROJECT OVERVIEW

ANALYSIS & FINDINGS

PROJECT MANAGEMENT

DOCUMENTATION

Overview

Data

Methodology

<< ANLY482 AY2016-17 T2 Projects


Tools Used

SAS JMP Pro 13 is chosen as our primary tool for data preparation, exploratory and further analysis. It is an analytical software that can perform most statistical analysis on large datasets and generate results with interactive visualizations used by data scientists to manipulate data selection on the go. Furthermore, tutorials and guides are widely available online for us to learn JMP Pro 13’s different techniques and functions. More importantly, its easy-to-use built-in tools enable us to conduct analysis of variance to determine relationship between interactions and sales revenue.


Data Preparation

Data preparation took us through time-consuming and tedious procedures to obtain high quality data. Though seemingly unrewarding, a set of high quality data allows for more accurate, reliable and consistent analysis of results. Therefore, it is imperative to invest a lot of time and effort on it to avoid getting false conclusions for our hypothesis.
image
The above diagram illustrates how data tables are being integrated, which took a few stages to achieve.


Data Cleaning

Data preparation took us through time-consuming and tedious procedures to obtain high quality data. Though seemingly unrewarding, a set of high quality data allows for more accurate, reliable and consistent analysis of results. Therefore, it is imperative to invest a lot of time and effort on it to avoid getting false conclusions for our hypothesis.
image
The above diagram illustrates how data tables are being integrated, which took a few stages to achieve.
The first stage of data preparation involves cleaning Invoice Details and Call Details tables.

  1. For missing values under “price$” column in Invoice Details, we imputed their values to “0” as these rows contain records where free samples given out to customers.
    • Steps Taken:
      1. Press Ctrl-F to show “search data tables” interface
      2. Enter the following fields:
        • Find what: *
        • Replace with 0
        • Tick “Match entire cell value”
        • Tick “Restrict to selected column” of “Price$”
        • Tick “Search data”
      3. Click on “Replace All” to apply change
  2. For negative “sales qty” and “amount$” values in Invoice Details, we did not take any action as they serve as records to void any sales that has been cancelled. Upon aggregation by quarters, no negative values will be present.
  3. For 5-digits postal codes in Invoice Details, we created a new column to store the converted ‘postal code’, with data type changed from numerical to categorical and formula function used to add the missing ‘0’.
    • Steps Taken:
      1. Right click on Postal Code’s header -> Select “Column Info” -> Change data type to “Character”
      2. Right click on Postal Code’s header -> Select “Insert Columns”
      3. Right click new column -> Select “Formula” -> Enter the formula as shown below
      4. Click on “OK” to apply change
  4. To facilitate understanding quarterly performance of sales revenue in Invoice Details and Call Details, we added a “year-quarter” column for use during aggregation.
    • Steps Taken:
      1. Right click on “Invoice Date” of Invoice Details / “Date” of Call Details
      2. Select “New Formula Column” -> “Date Time” -> “Year Quarter” to create the new column
  5. For “Product Names” differences in Invoice Details and Call Details, we standardized a common format using the recode function. (No screenshots due to confidentiality clause)
    • Steps Taken:
      1. Click on “Product Name” of Invoice Details / “Product” of Call Details
      2. Go to “Cols” at the top -> Select “Recode”
      3. At the “recode” interface, input new values that are the standardized common format
      4. Click on “Done” and “In Place” to replace the old values


MCCP