Difference between revisions of "ANLY482 AY2017-18 T1 Group03/ProjectSteadfast Project Data"

From Analytics Practicum
Jump to navigation Jump to search
Line 140: Line 140:
  
 
<b><u>Customer 360 SG (Phone Web)</u></b>
 
<b><u>Customer 360 SG (Phone Web)</u></b>
1. Cleaning up of data
+
 
 +
1. Cleaning up of data
 
The excel file call duration (minutes) was imported as a text string, affecting the ordering of values (2 minutes, 20 minutes, 3 minutes) as well as the count. Under Excel formulae we created a separate column with "=VALUE(..)" to convert the text string to values for further analysis in JMP.
 
The excel file call duration (minutes) was imported as a text string, affecting the ordering of values (2 minutes, 20 minutes, 3 minutes) as well as the count. Under Excel formulae we created a separate column with "=VALUE(..)" to convert the text string to values for further analysis in JMP.
  
2. Initial Analysis
+
2. Initial Analysis
 
Other than 3 entries with the employee category as MSL rather than REP, we faced no issue with the entries for the phone web tab. With JMP, we did a summary analysis of the remaining entries and found one with 105 minutes, far exceeding the next nearest of 22 minutes.
 
Other than 3 entries with the employee category as MSL rather than REP, we faced no issue with the entries for the phone web tab. With JMP, we did a summary analysis of the remaining entries and found one with 105 minutes, far exceeding the next nearest of 22 minutes.
  

Revision as of 17:18, 13 October 2017

Logo3.PNG

Home About Us Project Overview Midterm Progress Final Progress Project Management Documentation

 


Data

Our client will be supplying us with both current and prior year (2015-2017) data. The type of files and invoices are summarised in the table below.

FileMidterms1.PNG


Data Cleaning and Transformation

We have done cleaning up and transformation of various files provided by the client with the help of both Excel software and JMP Pro software.

HCP:

1. Removal of duplicates Under the “Primary” column, there are 0 and 1, with 0 being those with old addresses and 1 being updated addresses. Therefore, with the help of JMP Pro software, we have removed all 0 as we will be following with the latest addresses in the invoice file. The new HCP file has been saved as HCP (Clean Copy).

HCO:

1. Standardization of data Once HCO file was imported into JMP Pro, we have decided to analyze if “ZP Account” is unique to “Name”.

By using the summary function for “Name” and “ZP Account”, we have found out that there are some ZP accounts representing more than one name. By analyzing further, some of the customer names are outdated when we cross-checked against the invoice file. Therefore, we have excluded those outdated names and renamed the file as HCO (Clean copy).

Invoice Detail Report (QlikView) – 2015 and 2016

1. Combination of raw files Client provided us with two invoice files which consist of the 2015 invoices and 2016/17 invoices. Using excel software, both invoice files are combined into one file.

2. Filtering out non-essential data After looking through the various columns, we have noticed that there are negative and insignificant sales amount. Those amounts were pertaining to voiding of sales, and free samples given out by the company respectively.

With the help of excel software, we have removed those amounts as they are not essential for our further analysis.

3. Including more details After analyzing the invoice file, we have decided to add in additional column for Therapy Area. As per our client, we should analyze sales per Therapy Area. Therefore, with the data provided from the client and from prior year project, we have added the column “Therapy Area” to further categorize the various products.

4. Changing of formats We have formatted the “Invoice date” and “Batch expiry” dates to short date with dd/mm/yyyy as standard date format. The purpose of doing so is that in the original invoice file, these 2 columns include dates and timings. By changing to the standard date format, the timings have been removed as we do not require them for further analysis.

We have also noticed that for “Postal Code”, there are some postal codes that have only five numeric digits when the file was imported into JMP Pro. The standard postal code should have six numeric digits in Singapore. Therefore, with the help of JMP Pro software, we have done the following to change those postal codes with five numeric digits to six:

i. Right click on Postal Code’s header -> Select “Column Info” -> Change data type to “Character” ii. Right click on Postal Code’s header -> Select “Insert Columns” iii. Right click new column -> Select “Formula” -> Enter the formula as shown below

PostalCode.png

iv. Click on “OK” to apply change

The new postal code column has been renamed to “Formatted Postal Code”. After performing the abovementioned steps, we have saved the invoice file as Clean Invoice (2015-2017) for both Excel and JMP Pro copies.


Customer 360 SG (F2F)

1. Changing of formats Before importing the original file into JMP Pro, we have changed the date format for “Interaction Date” to short date (dd/mm/yyyy). After the file was imported into JMP Pro, we have also changed the data type of “Interaction Date” to Numeric (Ordinal- d/m/y).

2. Including more details Similar to what we did to the invoice file, we have included a new column “Therapy Area” for us to further analyze customer interaction per Therapy Area.

There is a product “SG_Dummy_Detail” which we have removed as the data pertaining to that product are for testing only.

There were some products that we could not find their respective Therapy Area to assign to and they are: i. Prolia ii. QIV iii. Corporate General We have removed these products as we have cross-checked against invoice file and they do not appear as products sold for the year. We have also created a new column “ZP Account” so as to link it to “Customer_Code” for our further analysis later. Using the Update function under the Table tab, we matched Account Name in the Customer 360 SG (F2F) file to Name in HCP (Clean Copy). 3. Reformat of Territory name For us to analyze how much time was spent by each sales representative, we have decided to reformat “Territory Name” into “Rep Code”. This was done to be consistent with the “Rep Code” found in the invoice file. The function used was Text to Columns under Cols tab. 4. Resolving double counting of duration issues When we did a further analysis of the file, we found out that there were double counting of duration in the same day and time. Below is an example of such issue:

FTF1.png

Taking above screenshot as an example, for the same touchpoint, the same sales representative can be selling 2 different products to his/her client. However, the sales representative recorded 310 minutes for both entries, making the total duration spent at client’s place to be 620 minutes. Therefore, we first did a summary function to count number of such double counted entries there are for each client. The groups to be summarized are shown below:

FTF2.png

With the number of products counted for each client by sales representative, we allocated the “Interaction Duration (Minutes)” based on the number of counts and create a new column “Allocated Duration”. The new allocated duration is shown below:

FTF3.png

With the new allocated duration column, we can then have more accurate analysis when we are looking at interaction duration for each product. We have saved the file as Customer 360 SG (F2F).

Customer 360 SG (Phone Web)

1. Cleaning up of data The excel file call duration (minutes) was imported as a text string, affecting the ordering of values (2 minutes, 20 minutes, 3 minutes) as well as the count. Under Excel formulae we created a separate column with "=VALUE(..)" to convert the text string to values for further analysis in JMP.

2. Initial Analysis Other than 3 entries with the employee category as MSL rather than REP, we faced no issue with the entries for the phone web tab. With JMP, we did a summary analysis of the remaining entries and found one with 105 minutes, far exceeding the next nearest of 22 minutes.