ANLY482 AY2017-18 T1 Group03/ProjectSteadfast Project Data

From Analytics Practicum
Revision as of 16:52, 13 October 2017 by Sohinid.2014 (talk | contribs)
Jump to navigation Jump to search

Logo3.PNG

Home About Us Project Overview Midterm Progress Final Progress Project Management Documentation

 


Data

Our client will be supplying us with both current and prior year (2015-2017) data. The type of files and invoices are summarised in the table below.

FileMidterms1.PNG


Data Cleaning and Transformation

We have done cleaning up and transformation of various files provided by the client with the help of both Excel software and JMP Pro software.

HCP:

1. Removal of duplicates Under the “Primary” column, there are 0 and 1, with 0 being those with old addresses and 1 being updated addresses. Therefore, with the help of JMP Pro software, we have removed all 0 as we will be following with the latest addresses in the invoice file. The new HCP file has been saved as HCP (Clean Copy).

HCO:

1. Standardization of data Once HCO file was imported into JMP Pro, we have decided to analyze if “ZP Account” is unique to “Name”.

By using the summary function for “Name” and “ZP Account”, we have found out that there are some ZP accounts representing more than one name. By analyzing further, some of the customer names are outdated when we cross-checked against the invoice file. Therefore, we have excluded those outdated names and renamed the file as HCO (Clean copy).

Invoice Detail Report (QlikView) – 2015 and 2016

1. Combination of raw files Client provided us with two invoice files which consist of the 2015 invoices and 2016/17 invoices. Using excel software, both invoice files are combined into one file.

2. Filtering out non-essential data After looking through the various columns, we have noticed that there are negative and insignificant sales amount. Those amounts were pertaining to voiding of sales, and free samples given out by the company respectively.

With the help of excel software, we have removed those amounts as they are not essential for our further analysis.

3. Including more details After analyzing the invoice file, we have decided to add in additional column for Therapy Area. As per our client, we should analyze sales per Therapy Area. Therefore, with the data provided from the client and from prior year project, we have added the column “Therapy Area” to further categorize the various products.

4. Changing of formats We have formatted the “Invoice date” and “Batch expiry” dates to short date with dd/mm/yyyy as standard date format. The purpose of doing so is that in the original invoice file, these 2 columns include dates and timings. By changing to the standard date format, the timings have been removed as we do not require them for further analysis.

We have also noticed that for “Postal Code”, there are some postal codes that have only five numeric digits when the file was imported into JMP Pro. The standard postal code should have six numeric digits in Singapore. Therefore, with the help of JMP Pro software, we have done the following to change those postal codes with five numeric digits to six:

i. Right click on Postal Code’s header -> Select “Column Info” -> Change data type to “Character” ii. Right click on Postal Code’s header -> Select “Insert Columns” iii. Right click new column -> Select “Formula” -> Enter the formula as shown below

PostalCode.png

iv. Click on “OK” to apply change

The new postal code column has been renamed to “Formatted Postal Code”. After performing the abovementioned steps, we have saved the invoice file as Clean Invoice (2015-2017) for both Excel and JMP Pro copies.