ANLY482 AY2017-18T2 Group32 : Project Overview / Methodology

From Analytics Practicum
Revision as of 16:40, 16 April 2018 by Pauline.lee.2014 (talk | contribs) (Created page with "<!--Team Logo--> <center> 300px </center> <!--End of Team Logo--> <!--Header--> {|style="padding: 5px 0 0 0;" width="100%" cellspacing="0" cellpadd...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Pharma G Logo.png

Home

About Us

Project Overview

Project Findings

Project Management

Documentation

ANLY482 AY2017-18 Main Page

Description Data Methodology

6.0 Methodology

6.0 Initial Data Observation
For Invoice Data, after filtering for the TCE brands, here is a distribution of all the TCE brands and its sales performance.

Data 1.png
Data 2.png


6.1 Data Preparation & Cleaning

Before performing any data analysis, it is critical to check for errors such as incomplete and invalid values in the data, which could lead to inaccurate results. Hence, the data must be cleaned to handle those missing and invalid values so that it is suitable for any further analysis. The following summarizes the data cleaning process we have taken for each dataset:

HCP

1. Removed dummy variables with suspicious looking values in the “postal code” column such as (000001,000002, etc)

Methedology 1.png

Invoice Detail

1. Filter transactions to obtain TCE products using TCE_Brands sheet
2. For negative values in the “sales qty” and “amount$” columns, we kept the values as it is. This is because after checking with our sponsor, these records are products that have been returned back to them, so they served as sales that have been void. Upon aggregation by quarters, no negative values will be present.
3. For the missing values in “price$” columns, we imputed their values to “0” as these rows contain records where free samples are given out to the customers.

Methedology 3.png

4. Using Ctrl-F function to show “search data tables” interface, then enter the following fields:

Methedology 4.png

5. For 5-digits values in “postal code” column, a new column is created to store the converted ‘postal code’ in 6-digits format, with data type changed from numerical to categorical and formula function used to add the missing ‘0’.

Methedology 5.png

6.2 Data Transformation
With the use of JMP Pro, we explored and joined data sets using common joints to analyze the relationships between different variables across the various data sets.

For instance, we joined HCO data with the Invoice Details using “ZP Account” in HCO and “Customer_code” in Invoice, so that we can analyze the sales for each clinic and observe what products do they often purchase

6.3 Exploratory Data Analysis (EDA)
EDA is able to help us understand the data provided better and it also has the potential to enable us to derive meaningful insights beyond the hypothesis testing stage. EDA may also help us to formulate hypotheses. For instance, we may be able to hypothesize if the means of products is significant or now. Tools used in this stage include Microsoft Excel, JMP Pro and R. The key findings from this exploratory phase will help us to prepare and build our dashboard visualization.

6.4 Softwares Used
Exploratory: Excel & Tableau

In-depth Analysis: JMP Pro & R

Visualization: Qlik Sense