Difference between revisions of "Hiryuu Findings"
Wtchua.2013 (talk | contribs) |
|||
Line 112: | Line 112: | ||
<h3> Geospatial Analysis </h3> | <h3> Geospatial Analysis </h3> | ||
+ | |||
+ | Sponsor data must be compatible with shp files found online. This can be done so by joining unique Country-Code identifiers, so creating our own unique identifier. Sponsor data had to be cleaned as well as they sometimes input "District, City" as Receiver. | ||
+ | If shp map and sponsor data have to many mismatches, but lat long is provided, then we can create our own polygons and shp file based on these coordinates. |
Revision as of 17:27, 21 February 2017
|
|
|
|
|
|
Data Preparation | Analysis |
---|
Data Tools Used
Microsoft Excel
Used a VBA macro to merge all individual data files into a compilation Macro’s function was to open all csv files in a folder and add them into a master sheet by the rows
Pros:
- Excel was the tool of choice by the sponsors, and it preserves the character encoding of non-western characters
- Easy to use
Cons:
Limited in the number of rows it could handle
JMP Pro
Pros:
- Useful in joining data tables with large number of data
- Useful in tabulating data of different variables into a data table
- Easy creation of visuals to look into the data
Cons:
- Faced issues with non-western characters inconsistently that made it difficult to determine the cause
- Difficulty in trying to use advanced functions that required scripting ability
- No formula to find date difference for working days only
Objectives
- Identifying the correct set of data
- Identifying complete shipments
- Identifying shipments as inbound or outbound
- Ensuring no missing data
- Identifying the relevant geospatial data for analysis
- Clarify unclear data/abnormalities with Sponsors
- E.g. Quantity > 1
General Steps Taken
1. Joining data tables for shipment details and status update
- Used left outer join with statuses as the main table so as to add on the corresponding shipment details to each status update
- Used Match Flag to check if there is missing data from either source
2. Joined the data table with the status codes list
- Status data only contained the stage code without explanation
- Needed to join with another data table with the stage codes and relevant descriptions
3. Removing Duplicate Rows
- Duplicate rows with identical data were present in the data given to us
- Needed to delete duplicates to ensure greater data accuracy for our analysis
- Duplicated data rows serves no practical purpose hence ought to be removed
- Used a JMP jsl script posted on the JMP community website to delete duplicate rows whilst preserving the first unique occurrence of it
4. Checking the data for any abnormalities based on standard parameters given by the sponsor
- Basic guidelines put forth by the Sponsor includes:
- Quantity = 1
- All should be domestic shipments
- With the exception for App 2's data
- Sender and Receiver Information should be included
Detailed Data Preparation Steps
Exploratory Analysis
1. Identifying inbound(IB) / outbound(OB) shipments
Different methods are used for different data sets according to the specifications of our sponsor.
Methods used include:
- Checking the receiver and sender name
- Identifying specific customer account no for IB and OB shipments
- Identifying IB and OB shipments using starting and ending stage codes
2. Calculating turnaround time (TAT)
TAT time is used to calculate the total time taken from the starting point to the ending point of the shipment. It is used to flag out and ensure that shipments stay within the stipulated time.
Steps conducted:
- Identify the starting and ending dates using the starting and ending stage codes and their relevant stage updated dates.
- Calculate the TAT time by the difference in the number of days between the ending and starting date.
For shipments that are missing either a starting or ending point, they are flagged out and labelled as "Incomplete" shipments.
Shipments that are complete, they will either by labelled as "Pass" or "Fail" depending on the calculated TAT time. If the TAT time <= N, it will be labelled as "Pass" else "Fail".
Geospatial Analysis
Sponsor data must be compatible with shp files found online. This can be done so by joining unique Country-Code identifiers, so creating our own unique identifier. Sponsor data had to be cleaned as well as they sometimes input "District, City" as Receiver. If shp map and sponsor data have to many mismatches, but lat long is provided, then we can create our own polygons and shp file based on these coordinates.