Difference between revisions of "ANLY482 AY2017-18 T1 Group03/ProjectSteadfast Project Data"

From Analytics Practicum
Jump to navigation Jump to search
 
(2 intermediate revisions by the same user not shown)
Line 14: Line 14:
 
| style="font-size:120%; text-align:center; background-color:#ff6600" width="0.5%" |
 
| style="font-size:120%; text-align:center; background-color:#ff6600" width="0.5%" |
 
| style="font-size:120%;  text-align:center;background-color:#ffffff; width=12%" | [[ANLY482_AY2017-18_T1_Group03/About Us|<font face = "Century Gothic" color="#000000">About Us</font>]]
 
| style="font-size:120%;  text-align:center;background-color:#ffffff; width=12%" | [[ANLY482_AY2017-18_T1_Group03/About Us|<font face = "Century Gothic" color="#000000">About Us</font>]]
 
| style="font-size:120%; text-align:center; background-color:#ff6600" width="1%" |
 
| style="font-size:120%;  text-align:center;background-color:#ffcc99; width=12%" | [[ANLY482_AY2017-18_T1_Group03/Project_Overview|<font face = "Century Gothic" color="#000000">Project Overview</font>]]
 
  
 
| style="font-size:120%; text-align:center; background-color:#ff6600" width="0.5%" |
 
| style="font-size:120%; text-align:center; background-color:#ff6600" width="0.5%" |
Line 23: Line 20:
 
| style="font-size:120%; text-align:center; background-color:#ff6600" width="0.5%" |
 
| style="font-size:120%; text-align:center; background-color:#ff6600" width="0.5%" |
 
| style="font-size:120%;  text-align:center;background-color:#ffffff; width=12%" | [[ANLY482_AY2017-18_T1_Group03/Final_Progress|<font face = "Century Gothic" color="#000000">Final Progress</font>]]
 
| style="font-size:120%;  text-align:center;background-color:#ffffff; width=12%" | [[ANLY482_AY2017-18_T1_Group03/Final_Progress|<font face = "Century Gothic" color="#000000">Final Progress</font>]]
 +
 +
| style="font-size:120%; text-align:center; background-color:#ff6600" width="1%" |
 +
| style="font-size:120%;  text-align:center;background-color:#ffcc99; width=12%" | [[ANLY482_AY2017-18_T1_Group03/Project_Overview|<font face = "Century Gothic" color="#000000">Project Overview</font>]]
  
 
| style="font-size:120%; text-align:center; background-color:#ff6600" width="0.5%" |
 
| style="font-size:120%; text-align:center; background-color:#ff6600" width="0.5%" |
Line 63: Line 63:
  
 
<b><u> HCP: </u></b>  
 
<b><u> HCP: </u></b>  
 
+
[[Image:HCP.PNG|800px|center]]
1. Removal of duplicates
 
Under the “Primary” column, there are 0 and 1, with 0 being those with old addresses and 1 being updated addresses. Therefore, with the help of JMP Pro software, we have removed all 0 as we will be following with the latest addresses in the invoice file. The new HCP file has been saved as HCP (Clean Copy).
 
  
 
<b><u>HCO: </u></b>
 
<b><u>HCO: </u></b>
 
+
[[Image:HCO.PNG|800px|center]]
1. Standardization of data
 
Once HCO file was imported into JMP Pro, we have decided to analyze if “ZP Account” is unique to “Name”.
 
 
 
By using the summary function for “Name” and “ZP Account”, we have found out that there are some ZP accounts representing more than one name. By analyzing further, some of the customer names are outdated when we cross-checked against the invoice file. Therefore, we have excluded those outdated names and renamed the file as HCO (Clean copy).  
 
  
 
<b><u>Invoice Detail Report (QlikView) – 2015 and 2016 </u></b>
 
<b><u>Invoice Detail Report (QlikView) – 2015 and 2016 </u></b>
 
+
[[Image:Invoice1.PNG|800px|center]]
1. Combination of raw files
 
Client provided us with two invoice files which consist of the 2015 invoices and 2016/17 invoices. Using excel software, both invoice files are combined into one file.
 
 
 
2. Filtering out non-essential data
 
After looking through the various columns, we have noticed that there are negative and insignificant sales amount. Those amounts were pertaining to voiding of sales, and free samples given out by the company respectively.
 
 
 
With the help of excel software, we have removed those amounts as they are not essential for our further analysis.
 
 
 
3. Including more details
 
After analyzing the invoice file, we have decided to add in additional column for Therapy Area. As per our client, we should analyze sales per Therapy Area. Therefore, with the data provided from the client and from prior year project, we have added the column “Therapy Area” to further categorize the various products.
 
 
 
4. Changing of formats
 
We have formatted the “Invoice date” and “Batch expiry” dates to short date with dd/mm/yyyy as standard date format. The purpose of doing so is that in the original invoice file, these 2 columns include dates and timings. By changing to the standard date format, the timings have been removed as we do not require them for further analysis.
 
 
 
We have also noticed that for “Postal Code”, there are some postal codes that have only five numeric digits when the file was imported into JMP Pro. The standard postal code should have six numeric digits in Singapore. Therefore, with the help of JMP Pro software, we have done the following to change those postal codes with five numeric digits to six:
 
 
 
i. Right click on Postal Code’s header -> Select “Column Info” -> Change data type to “Character”
 
ii. Right click on Postal Code’s header -> Select “Insert Columns”
 
iii. Right click new column -> Select “Formula” -> Enter the formula as shown below
 
 
 
[[Image:PostalCode.png|700px|center]]
 
 
 
iv. Click on “OK” to apply change
 
 
 
The new postal code column has been renamed to “Formatted Postal Code”.
 
After performing the abovementioned steps, we have saved the invoice file as Clean Invoice (2015-2017) for both Excel and JMP Pro copies.
 
 
 
  
 
<b><u> Customer 360 SG (F2F) </u></b>
 
<b><u> Customer 360 SG (F2F) </u></b>
 
+
[[Image:F2F1.PNG|800px|center]]
1. Changing of formats
 
Before importing the original file into JMP Pro, we have changed the date format for “Interaction Date” to short date (dd/mm/yyyy). After the file was imported into JMP Pro, we have also changed the data type of “Interaction Date” to Numeric (Ordinal- d/m/y).
 
 
 
2. Including more details
 
Similar to what we did to the invoice file, we have included a new column “Therapy Area” for us to further analyze customer interaction per Therapy Area.
 
 
 
There is a product “SG_Dummy_Detail” which we have removed as the data pertaining to that product are for testing only.
 
 
 
There were some products that we could not find their respective Therapy Area to assign to and they are:
 
i. Prolia
 
ii. QIV
 
iii. Corporate General
 
We have removed these products as we have cross-checked against invoice file and they do not appear as products sold for the year.
 
We have also created a new column “ZP Account” so as to link it to “Customer_Code” for our further analysis later. Using the Update function under the Table tab, we matched Account Name in the Customer 360 SG (F2F) file to Name in HCP (Clean Copy).
 
3. Reformat of Territory name
 
For us to analyze how much time was spent by each sales representative, we have decided to reformat “Territory Name” into “Rep Code”. This was done to be consistent with the “Rep Code” found in the invoice file. The function used was Text to Columns under Cols tab.
 
4. Resolving double counting of duration issues
 
When we did a further analysis of the file, we found out that there were double counting of duration in the same day and time. Below is an example of such issue:
 
 
 
[[Image:FTF1.png|700px|center]]
 
 
 
Taking above screenshot as an example, for the same touchpoint, the same sales representative can be selling 2 different products to his/her client. However, the sales representative recorded 310 minutes for both entries, making the total duration spent at client’s place to be 620 minutes.
 
Therefore, we first did a summary function to count number of such double counted entries there are for each client. The groups to be summarized are shown below:
 
 
 
[[Image:FTF2.png|250px|center]]
 
 
 
With the number of products counted for each client by sales representative, we allocated the “Interaction Duration (Minutes)” based on the number of counts and create a new column “Allocated Duration”. The new allocated duration is shown below:
 
 
 
[[Image:FTF3.png|700px|center]]
 
 
 
With the new allocated duration column, we can then have more accurate analysis when we are looking at interaction duration for each product.
 
We have saved the file as Customer 360 SG (F2F).
 
  
 
<b><u>Customer 360 SG (Phone Web)</u></b>
 
<b><u>Customer 360 SG (Phone Web)</u></b>
 +
[[Image:PhoneWeb.PNG|800px|center]]
  
1. Cleaning up of data
 
The excel file call duration (minutes) was imported as a text string, affecting the ordering of values (2 minutes, 20 minutes, 3 minutes) as well as the count. Under Excel formulae we created a separate column with "=VALUE(..)" to convert the text string to values for further analysis in JMP.
 
 
2. Initial Analysis
 
Other than 3 entries with the employee category as MSL rather than REP, we faced no issue with the entries for the phone web tab. With JMP, we did a summary analysis of the remaining entries and found one with 105 minutes, far exceeding the next nearest of 22 minutes.
 
  
  
  
  
</font></div><br>
+
</div><br>

Latest revision as of 13:57, 16 October 2017

Logo3.PNG

Home About Us Midterm Progress Final Progress Project Overview Project Management Documentation

 


Data

Our client will be supplying us with both current and prior year (2015-2017) data. The type of files and invoices are summarised in the table below.

FileMidterms1.PNG


Data Cleaning and Transformation

We have done cleaning up and transformation of various files provided by the client with the help of both Excel software and JMP Pro software.

HCP:

HCP.PNG

HCO:

HCO.PNG

Invoice Detail Report (QlikView) – 2015 and 2016

Invoice1.PNG

Customer 360 SG (F2F)

F2F1.PNG

Customer 360 SG (Phone Web)

PhoneWeb.PNG