Difference between revisions of "Hiryuu Methodology"

From Analytics Practicum
 

Latest revision as of 21:21, 22 April 2017


Introduction

The main aim of this practicum is to give our sponsor insight into delivery patterns in the countries it manages, focusing on Australia and Japan, as these two countries have posed the most problems. To do so, we will first analyse trends in three months' worth of data using three main techniques: exploratory, time series, and geospatial analysis.

With these analyses done, we hope to give our sponsor a clearer picture of the reasons for failed deliveries, so that the company can avoid similar pitfalls in the future.

Objectives

There are 4 main objectives we aim to achieve:

  1. Understand the patterns and trends across shipment routes in different countries
  2. Identify patterns, such as the locations and timing, of shipments with frequent issues
  3. Conduct time series analysis to determine the presence of seasonality in shipments
  4. Build a dashboard that gives a single view of all data statistics for a particular country and makes KPIs easy to measure. The sponsors we are working with are focused only on marketing simplicity and efficiency. The functions we hope to show include the following:


Design Specification

  • Show data records of parcels that have been picked up but not replied to
  • Show a visual summary of shipments and their current status
  • View failed deliveries at a single glance, with a detailed breakdown at a single click, including tracking by reference number for both inbound and outbound shipments
  • Show the peaks in failure points from the time series analysis
  • Provide simple-to-understand bar charts and histograms that represent KPIs


The dashboard will go through multiple iterations to improve its usability for our sponsor. We will hold frequent feedback sessions with our supervisor and sponsor to ensure that the dashboard provides the data statistics and KPIs most useful for decision making.

Restrictions in Hong Kong

Due to limitations in the Hong Kong postal code system, as well as inconsistencies in how addresses are recorded in the database, it is difficult for our team to conduct geospatial analysis for Hong Kong until these data issues are resolved.

Alternative Ways to Geocoding - Japan

Besides its postal code system, Japan also has a geocoding system known as the JIS code (市区町村コード). Postal codes may be updated when addresses change, and may be assigned separately to commercial entities regardless of geolocation. JIS codes, by contrast, are bound to addresses by geolocation. Unlike postal codes, JIS codes are also used as identifiers in geospatial data files for Japan, which makes them more convenient and compatible to use as an identifier.

Figure 5: Sample of JIS Codes

The JIS code system is administered by Japan’s Ministry of Internal Affairs and Communications. It assigns a unique number to each geographic unit, based on the geographical classifications used in the country. For example, JIS code 131041 is the Shinjuku ward of Tokyo. Because geospatial data files such as shapefiles tend to have polygons at the same level of detail, JIS codes are more compatible with them. Hence, we have used the JIS code for our geospatial analysis.
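
As a rough illustration of why these codes work well as join keys, the sketch below splits a six-digit code into its parts and checks its final digit. It assumes the usual layout of the code (two prefecture digits, three municipality digits and one check digit computed with weights 6 to 2 modulo 11). The function names are hypothetical and are not taken from our project code.

```r
# Hypothetical helpers (not from the project code) for working with
# six-digit JIS municipality codes such as 131041 (Shinjuku ward, Tokyo).

# Restore leading zeros that are lost when codes are stored as numbers.
normalise_jis <- function(code) {
  sprintf("%06d", as.integer(code))
}

# Split a code into prefecture, municipality and check digit.
split_jis <- function(code) {
  code <- normalise_jis(code)
  data.frame(
    jis_code     = code,
    prefecture   = substr(code, 1, 2),
    municipality = substr(code, 3, 5),
    check_digit  = substr(code, 6, 6),
    stringsAsFactors = FALSE
  )
}

# Assumed check-digit rule: weight the first five digits by 6, 5, 4, 3, 2,
# take the sum modulo 11, subtract it from 11 and keep the last digit.
jis_check_digit <- function(code) {
  code <- normalise_jis(code)
  vapply(code, function(x) {
    digits <- as.integer(strsplit(substr(x, 1, 5), "")[[1]])
    as.integer((11 - sum(digits * 6:2) %% 11) %% 10)
  }, integer(1), USE.NAMES = FALSE)
}

# Example: the second code lost its leading zero when stored as a number.
parts <- split_jis(c(131041, 11002))
parts$valid <- jis_check_digit(parts$jis_code) == as.integer(parts$check_digit)
print(parts)
```

Normalising the codes to six-character strings before joining avoids mismatches between shipment records and the identifiers stored in the geospatial files.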

Determining Different End Points in a Shipment

One major difference between App1 and App2 is the list of endpoints recorded by each system. App2 has the advantage of specific starting and ending points for a shipment, which makes it easy to calculate a shipment's turnaround time. However, fixed starting and ending points leave less flexibility for understanding the shipment process in more detail. App1 incorporates this flexibility through a list of stage codes, any of which can be categorised as an ending point.


However, this flexibility makes it harder to identify the endpoint or the first delivery attempt of a shipment, and thus harder to calculate its turnaround time, as seen in Figure 6. Our team's solution is to let our sponsor choose which stage codes count as endpoints, using check boxes provided in the dashboard. The dashboard takes the checked endpoints into account and calculates the turnaround time for each shipment from start to end.

Figure 6: Sample of Delivery Statuses
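
The endpoint selection described above can be sketched roughly as follows. This is a minimal illustration in base R, not the dashboard's actual code: the column names (shipment_id, stage_code, event_time), the stage codes and the function name are all assumptions, and the endpoint_codes argument stands in for the boxes checked in the dashboard.

```r
# Hypothetical event log: one row per recorded stage of a shipment.
events <- data.frame(
  shipment_id = c("A", "A", "A", "B", "B"),
  stage_code  = c("PICKED_UP", "DELIVERY_ATTEMPT", "DELIVERED",
                  "PICKED_UP", "IN_TRANSIT"),
  event_time  = as.POSIXct(c("2017-03-01 09:00", "2017-03-02 09:00",
                             "2017-03-03 15:00", "2017-03-01 10:00",
                             "2017-03-02 10:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Turnaround = time from the first recorded stage to the first stage whose
# code is among the selected endpoints. Shipments that have not reached any
# selected endpoint get NA.
turnaround_by_shipment <- function(events, endpoint_codes) {
  pieces <- lapply(split(events, events$shipment_id), function(ev) {
    ev <- ev[order(ev$event_time), ]
    hit <- which(ev$stage_code %in% endpoint_codes)
    hours <- if (length(hit) > 0) {
      as.numeric(difftime(ev$event_time[hit[1]], ev$event_time[1],
                          units = "hours"))
    } else {
      NA_real_
    }
    data.frame(shipment_id = ev$shipment_id[1], turnaround_hours = hours,
               stringsAsFactors = FALSE)
  })
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

# Only "DELIVERED" checked, then the first delivery attempt checked as well.
delivered_only <- turnaround_by_shipment(events, "DELIVERED")
with_attempts  <- turnaround_by_shipment(events,
                                         c("DELIVERY_ATTEMPT", "DELIVERED"))
```

Changing the set of checked codes changes the measured turnaround for the same shipment, which is the flexibility the check boxes are meant to provide.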

Packages

Graph

To prepare the data for visualisation, we used several R packages. First, ggplot2, a popular plotting system for R for making professional-looking plots, was used to create and display the different graphs. In addition, the plotly package for R makes graphs interactive, which helped us create tooltips on hover as well as drill-down charts and tables for further insights.
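
A minimal sketch of how the two packages fit together, assuming a small made-up data frame of weekly shipment counts (the column names and values are illustrative only): the chart is built with ggplot2 and then converted into an interactive widget with ggplotly(), which supplies the hover tooltips.

```r
library(ggplot2)
library(plotly)

# Hypothetical weekly shipment counts; not the project's real data.
weekly <- data.frame(
  week   = rep(1:4, times = 2),
  status = rep(c("PASSED", "FAILED"), each = 4),
  count  = c(120, 135, 128, 140, 12, 9, 15, 11)
)

# Static ggplot2 bar chart; the 'text' aesthetic is used only for the tooltip.
p <- ggplot(weekly, aes(x = factor(week), y = count, fill = status,
                        text = paste0("Week ", week, ": ", count, " ", status))) +
  geom_col(position = "dodge") +
  labs(x = "Week", y = "Number of shipments", fill = "Status")

# Convert to an interactive plotly widget that shows the custom hover text.
interactive_plot <- ggplotly(p, tooltip = "text")
```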


Table

The data table uses the DT package, which provides an R interface to the JavaScript library DataTables. It allows R data objects such as data frames to be displayed as tables on HTML pages, including R Shiny applications, with additional features such as filtering, pagination and sorting for manipulating the tables.
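
As a small illustration, a data frame can be turned into an interactive table with a single call to datatable(). The data below is made up, and the options shown are only one possible configuration, not necessarily the one used in our dashboard.

```r
library(DT)

# Hypothetical shipment records; not the project's real data.
shipments <- data.frame(
  reference   = c("REF001", "REF002", "REF003"),
  destination = c("Sydney", "Tokyo", "Osaka"),
  status      = c("PASSED", "FAILED", "PASSED"),
  stringsAsFactors = FALSE
)

# Build an HTML table widget with per-column filters and ten rows per page.
shipment_table <- datatable(
  shipments,
  rownames = FALSE,
  filter   = "top",
  options  = list(pageLength = 10)
)
```

In a Shiny application the same table would typically be rendered with DT::renderDataTable() on the server side and DT::dataTableOutput() in the user interface.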


In preparing the data for the data table, we used the packages dplyr, plyr, timeDate and bizdays to perform data cleaning and calculations. dplyr, in particular, allows data frames to be manipulated with SQL-like operations, which made it much easier to clean up the data and perform data table functions.
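
To show what SQL-like means in practice, the sketch below summarises a made-up shipment table by destination, much as a GROUP BY query would. The column names and values are assumptions for illustration.

```r
library(dplyr)

# Hypothetical shipment records; not the project's real data.
shipments <- data.frame(
  destination = c("Sydney", "Sydney", "Tokyo", "Tokyo", "Tokyo"),
  status      = c("PASSED", "FAILED", "PASSED", "PASSED", "FAILED"),
  stringsAsFactors = FALSE
)

# Roughly equivalent to:
#   SELECT destination, COUNT(*) AS total, ... FROM shipments
#   GROUP BY destination ORDER BY total DESC
summary_by_city <- shipments %>%
  group_by(destination) %>%
  summarise(
    total     = n(),
    passed    = sum(status == "PASSED"),
    failed    = sum(status == "FAILED"),
    pass_rate = passed / total
  ) %>%
  arrange(desc(total))
```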


Map

To perform accurate and meaningful geospatial analysis, we used the following R packages. tmap is a thematic maps package for visualising spatial data distributions on geographical maps. We used it for its ability to create multiple flexible choropleth maps. The tmaptools package provides a set of tools for reading and processing spatial data, and was used together with tmap to map the data onto the relevant polygons. The leaflet package provided the base layer, which is a basic, visually pleasing geographical map of the world. leaflet also improves usability, as users can zoom in and out by scrolling the mouse. The rgdal package let us write and save the shapefiles efficiently with straightforward methods. The publicly available online shapefiles were rather large to read into the application and caused a lot of loading issues. To improve the loading time, we used the gSimplify function from the rgeos package to reduce the level of detail of the polygons while still retaining their overall shape.


# Simplifying the shp files for faster loading
library(raster)  # provides getData()
library(rgeos)   # provides gSimplify()

# Online shp file: download the level-2 GADM boundaries for Australia
au_map <- getData('GADM', country='AU', level=2)

# The method cannot work with NA values
au_map <- au_map[,-(10)]  # removes the CCN_2 column because of NA values

# Simplify the object
au_map_simpl <- gSimplify(au_map, tol=0.01, topologyPreserve=TRUE)

# The returned object is just the geometry, not the attributes, so you have to
# construct a new SpatialPolygonsDataFrame with the simplified geometry and the
# attribute data from the original
final_map <- SpatialPolygonsDataFrame(au_map_simpl, data=au_map@data)

# Save file
saveRDS(final_map, "australia.rds")
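
Once the simplified map is saved, shipment statistics have to be attached to its attribute table before a choropleth can be drawn. The sketch below shows one possible way to do this in base R. A plain data frame stands in for final_map@data, NAME_2 is assumed to be the region name column in the GADM attributes, and the pass rates are made up.

```r
# Stand-in for final_map@data: one row per polygon, in polygon order.
# NAME_2 is assumed to be the level-2 region name column in the GADM data.
map_data <- data.frame(
  NAME_2 = c("Region A", "Region B", "Region C"),
  stringsAsFactors = FALSE
)

# Hypothetical pass rates computed from the shipment data.
stats <- data.frame(
  region    = c("Region C", "Region A"),
  pass_rate = c(0.80, 0.95),
  stringsAsFactors = FALSE
)

# match() keeps the rows in polygon order; regions without shipments get NA.
map_data$pass_rate <- stats$pass_rate[match(map_data$NAME_2, stats$region)]
print(map_data)
```

We use match() here rather than merge() because merge() may reorder the rows, and the attribute rows must stay aligned with the polygons. With the real map, the same assignment would be made on final_map@data before plotting the pass_rate column.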