Difference between revisions of "ANLY482 AY2016-17 T2 Group10 Project Overview: Methodology"

From Analytics Practicum

Revision as of 18:06, 21 February 2017


Data Preparation

Data preparation involves cleaning, transformation, and integration. These are standard procedures for reconciling datasets that differ in format, contain data entry errors, or are recorded at different levels of granularity. We will first examine each data file and determine the best way to standardize its format, then aggregate the more granular data so that the files can be integrated.


MCCP


Invoice Details

Data Cleaning

A brief scan of the entire Invoice Details data table revealed 3 main areas that need to be cleaned.

  1. Missing values in Price$ column
  2. Negative values in Sales Qty and Amount$ columns
  3. Some Postal Code values with only 5 digits (because they start with 0 and the leading zero was dropped)
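
The scan above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the team ran in JMP: the sample rows are invented, a missing price is represented as None, and the full postal code length of 6 digits is inferred from the note that the short codes have 5 digits.

```python
# Hypothetical sample rows; column names follow the Invoice Details table.
rows = [
    {"Price$": 12.5, "Sales Qty": 10, "Amount$": 125.0, "Postal Code": "238801"},
    {"Price$": None, "Sales Qty": 0, "Amount$": 0.0, "Postal Code": "69120"},
    {"Price$": 8.0, "Sales Qty": -2, "Amount$": -16.0, "Postal Code": "18956"},
]


def scan_invoice_rows(rows, postal_len=6):
    """Count the rows affected by each of the three cleaning areas."""
    counts = {"missing_price": 0, "negative_sales": 0, "short_postal": 0}
    for row in rows:
        if row["Price$"] is None:
            counts["missing_price"] += 1
        if row["Sales Qty"] < 0 or row["Amount$"] < 0:
            counts["negative_sales"] += 1
        # postal_len=6 is an assumption about the full code length.
        if len(str(row["Postal Code"])) < postal_len:
            counts["short_postal"] += 1
    return counts


def pad_postal_code(code, postal_len=6):
    """Restore leading zeros that were dropped from a postal code."""
    return str(code).zfill(postal_len)


issue_counts = scan_invoice_rows(rows)
padded = [pad_postal_code(r["Postal Code"]) for r in rows]
print(issue_counts)
print(padded)
```

Keeping postal codes as text rather than numbers is what prevents the leading zero from being dropped again after padding.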

Handling of missing values in Price$ column

The Price$ column records the unit price of a specific dosage (SKU) of a drug. It can vary across customers and over time for different reasons (marketing, incentives for new purchases, etc.). Because the unit price of a drug is usually defined before any purchase, it is important for us to understand why some rows have missing values.
A close inspection of the missing values using the data filter showed the following:

  • 2379 rows have a missing Price$ value
  • 1677 of these rows belong to product E/F
  • Most of these records have a sales amount of $0
  • Either Bonus Qty or Sample Qty is positive
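
The same profile of the missing-price rows could be reproduced outside JMP roughly as follows. This is a minimal sketch with invented sample rows; the "Product" column name and the use of None for a missing price are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample rows; the real table has 2379 rows with a missing price.
rows = [
    {"Product": "E/F", "Price$": None, "Amount$": 0.0, "Bonus Qty": 5, "Sample Qty": 0},
    {"Product": "E/F", "Price$": None, "Amount$": 0.0, "Bonus Qty": 0, "Sample Qty": 2},
    {"Product": "A", "Price$": None, "Amount$": 30.0, "Bonus Qty": 1, "Sample Qty": 0},
    {"Product": "B", "Price$": 4.0, "Amount$": 40.0, "Bonus Qty": 0, "Sample Qty": 0},
]


def profile_missing_price(rows):
    """Summarize the rows whose Price$ is missing."""
    missing = [r for r in rows if r["Price$"] is None]
    return {
        "missing_rows": len(missing),
        "by_product": Counter(r["Product"] for r in missing),
        "zero_amount": sum(1 for r in missing if r["Amount$"] == 0),
        "bonus_or_sample": sum(
            1 for r in missing if r["Bonus Qty"] > 0 or r["Sample Qty"] > 0
        ),
    }


profile = profile_missing_price(rows)
print(profile)
```

Comparing "bonus_or_sample" with "missing_rows" is a quick check on whether every missing price can be explained by a bonus or sample transaction.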

This tells us that these rows represent transactions in which drugs were given as samples or bonuses as a gesture of goodwill.
Action taken: We will assign a fair value of 0 to the missing prices, because JMP would otherwise ignore rows with missing values if price were included in our predictive analysis.
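
A minimal sketch of this replacement step in Python is shown below, assuming a missing price is represented as None. The "Price Imputed" flag is a hypothetical addition that is not part of the original dataset; it simply records which prices were filled in so they can be told apart from genuine $0 prices later.

```python
# Hypothetical sample rows, not taken from the real Invoice Details table.
rows = [
    {"Price$": None, "Amount$": 0.0, "Sample Qty": 3},
    {"Price$": 7.5, "Amount$": 75.0, "Sample Qty": 0},
]


def fill_missing_price(rows, fill_value=0.0):
    """Return copies of the rows with missing Price$ replaced by fill_value."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows stay untouched
        new_row["Price Imputed"] = row["Price$"] is None  # hypothetical flag
        if new_row["Price Imputed"]:
            new_row["Price$"] = fill_value
        filled.append(new_row)
    return filled


filled = fill_missing_price(rows)
print(filled)
```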