Hiryuu Methodology

From Analytics Practicum

Latest revision as of 21:21, 22 April 2017


Introduction

The main aim of this practicum is to give our sponsor insight into delivery patterns in the different countries it manages, focusing on Australia and Japan as these 2 countries have posed the most problems. To do so, we will first analyse the trends in 3 months' worth of data using 3 main techniques: exploratory, time series, and geospatial analysis.

With these analyses done, we hope to give our sponsor a clearer picture of the reasons for failed deliveries, to help the company avoid similar pitfalls in the future.

Objectives

There are 5 main objectives we aim to achieve:

  1. Understand the patterns and trends across shipment routes in different countries
  2. Identify patterns such as the locations and timing of shipments with frequent issues
  3. Conduct time series analysis to determine the presence of seasonality in shipments
  4. Build a dashboard that gives a single view of all data statistics for a particular country and makes KPIs easy to measure. The sponsors we are working with are focused only on simplicity and efficiency for marketing. The functions we hope to show include the following:


Design Specification

  • Show data records of parcels that were picked up but not replied to
  • Show a visual summary of shipments and their current status
  • View failed deliveries at a single glance and a detailed breakdown at a single click, including tracking by reference number for both inbound and outbound shipments
  • Show the peaks of failure points from the time series analysis
  • Provide simple-to-understand bar charts and histograms that represent KPIs


Multiple iterations of the dashboard will be carried out to increase its usability for our sponsor. We will hold frequent feedback sessions with our supervisor and sponsor to ensure that the dashboard is equipped with the data statistics and KPIs most useful for decision making.

Restrictions in Hong Kong

Due to the limitations of the postal code system in Hong Kong, as well as inconsistencies in how addresses are recorded in the database, it is difficult for our team to conduct geospatial analysis until further data resolution is carried out in the future.

Alternative Ways to Geocoding - Japan

Other than the postal code system, Japan also has a geocoding system known as the JIS Code (市区町村コード). Postal codes may be updated following changes in addresses, and may be assigned separately to commercial entities regardless of geolocation, whereas JIS codes are bound to addresses by geolocation. Unlike postal codes, JIS codes are also used as identifiers in geospatial data files for Japan, which makes them more convenient and compatible as an identifier.

Figure5.png
Fig: Sample of JIS Codes

The JIS code system is maintained by Japan’s Ministry of Internal Affairs and Communications. It assigns a unique number to each geolocation based on the geographical classifications used in the country. For example, JIS code 131041 is Shinjuku ward of Tokyo. This makes it more compatible with geospatial data files such as shapefiles, which tend to have polygons at the same level of detail. Hence we have used the JIS code for our geospatial analysis.
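
As a minimal sketch of how JIS codes can serve as the join key for geospatial analysis, the base R example below aggregates shipments per code. The data frame, its column names and every code other than 131041 are illustrative assumptions, not taken from the project data. It relies on the usual structure of the six-digit code: two digits for the prefecture, three for the municipality and a final check digit.

```r
# Hypothetical shipment records; column names and values are illustrative only.
shipments <- data.frame(
  shipment_id = c("S1", "S2", "S3", "S4"),
  jis_code    = c("131041", "131041", "011002", "271004"),
  failed      = c(TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep the codes as character strings so that leading zeros are not lost.
shipments$prefecture_code   <- substr(shipments$jis_code, 1, 2)  # e.g. "13" for Tokyo
shipments$municipality_code <- substr(shipments$jis_code, 1, 5)  # code without check digit

# Count shipments and failed shipments per JIS code. The result can then be
# joined to polygons that carry the same identifier.
shipments$n <- 1L
by_area <- aggregate(cbind(n, failed) ~ jis_code, data = shipments, FUN = sum)
print(by_area)
```

Some geospatial datasets carry only the five-digit form without the check digit, so it is worth checking which form the polygon file uses before joining.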

Determining Different End Points in a Shipment

One major difference between App1 and App2 is the list of endpoints recorded by each system. App2 has the advantage of specific starting and ending points for a shipment, which makes it easy to calculate the turnaround time of a shipment in App2. However, fixed starting and ending points leave less flexibility for understanding the process of a shipment in more detail. App1 provides this flexibility: it contains a list of stage codes for categorising an ending point.


However, with flexibility comes complexity in identifying an endpoint or the first delivery attempt of a shipment, and thus difficulty in calculating the turnaround time for a shipment, as seen in figure 6. Our team's solution is to let our sponsors choose the endpoints for shipments using check boxes in the dashboard. The dashboard takes the checked endpoints into account and calculates the turnaround time for each shipment from start to end.

Figure6.png
Fig: Sample of Delivery Statuses
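
The endpoint selection described above could be sketched roughly as follows in base R. The status names, column names and the `turnaround_hours` helper are hypothetical and stand in for the real stage codes and dashboard logic. The idea is that the set of checked end statuses is passed in as a vector, and the turnaround time runs from the start status to the earliest event matching any checked end status.

```r
# Hypothetical status events, one row per status update of a shipment.
events <- data.frame(
  shipment_id = c("A", "A", "A", "B", "B", "B"),
  status      = c("PICKED_UP", "DELIVERY_ATTEMPTED", "DELIVERED",
                  "PICKED_UP", "IN_TRANSIT", "DELIVERED"),
  timestamp   = as.POSIXct(c("2017-03-01 09:00", "2017-03-02 15:00",
                             "2017-03-03 10:00", "2017-03-01 12:00",
                             "2017-03-02 08:00", "2017-03-04 12:00"),
                           tz = "UTC"),
  stringsAsFactors = FALSE
)

# Turnaround time in hours from the start status to the earliest event whose
# status is among the selected end statuses. Returns NA for a shipment that
# has no start event or no matching end event.
turnaround_hours <- function(events, start_status, end_statuses) {
  per_shipment <- split(events, events$shipment_id)
  sapply(per_shipment, function(ev) {
    starts <- ev$timestamp[ev$status == start_status]
    ends   <- ev$timestamp[ev$status %in% end_statuses]
    if (length(starts) == 0 || length(ends) == 0) return(NA_real_)
    as.numeric(difftime(min(ends), min(starts), units = "hours"))
  })
}

# Only "DELIVERED" is checked as an endpoint.
delivered_only <- turnaround_hours(events, "PICKED_UP", "DELIVERED")

# The first delivery attempt is also checked as an endpoint.
with_attempts <- turnaround_hours(events, "PICKED_UP",
                                  c("DELIVERY_ATTEMPTED", "DELIVERED"))
```

In a Shiny dashboard, the `end_statuses` argument would typically come from a check box input, so the calculation reruns whenever the selection changes.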

Packages

Graph

To prepare the data for visualisation, numerous R packages have been used. Firstly, ggplot2, a popular R plotting system for making professional-looking plots, has been used to create and display the different graphs. Additionally, the plotly package for R produces interactive graphs, which helped us add tooltips on hover as well as drilldown charts and tables for further insights.
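
As an illustration of how the two packages fit together, the sketch below builds a static ggplot2 bar chart and converts it into an interactive plotly widget with hover tooltips. The daily failure counts are made-up values used only for the example.

```r
library(ggplot2)
library(plotly)

# Hypothetical daily counts of failed deliveries.
daily <- data.frame(
  date   = as.Date("2017-03-01") + 0:6,
  failed = c(12, 9, 15, 7, 11, 4, 6)
)

# Static bar chart built with ggplot2.
p <- ggplot(daily, aes(x = date, y = failed)) +
  geom_col(fill = "#6C7A89") +
  labs(title = "Failed deliveries per day", x = "Date", y = "Failed shipments")

# ggplotly() converts the ggplot object into an interactive plotly widget,
# which shows the underlying values in a tooltip on hover.
interactive_p <- ggplotly(p)
```

Printing `interactive_p` in an R session opens the widget in the viewer or browser.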


Table

The data table uses the DT package, which provides an R interface to the JavaScript library DataTables. R data objects can be displayed as tables on HTML pages, including in R Shiny, with additional features such as filtering, pagination and sorting.
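
A minimal sketch of a DT table is shown below. The shipment data frame is hypothetical, and the options chosen here are only one plausible configuration, not necessarily the one used in the dashboard.

```r
library(DT)

# Hypothetical shipment records to display.
shipments <- data.frame(
  reference = c("REF001", "REF002", "REF003"),
  country   = c("AU", "JP", "JP"),
  status    = c("DELIVERED", "FAILED", "DELIVERED"),
  stringsAsFactors = FALSE
)

# datatable() wraps the data frame in an HTML widget backed by DataTables.
# filter = "top" adds a per-column filter box above each column.
tbl <- datatable(
  shipments,
  filter   = "top",
  rownames = FALSE,
  options  = list(pageLength = 10)
)
```

Inside a Shiny app, the same widget would be rendered on the server with `DT::renderDataTable()` and placed in the UI with `DT::dataTableOutput()`.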


In preparing the data for the data table, we used the packages dplyr, plyr, timeDate and bizdays to perform data cleaning and calculations. dplyr, in particular, allows data frames to be manipulated with SQL-like operations, which made it much easier to clean the data and perform data table functions.
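
To show what the SQL-like operations look like in practice, here is a short dplyr sketch on hypothetical shipment data. The column names and values are assumptions, and turnaround is counted in calendar days for simplicity rather than in business days.

```r
library(dplyr)

# Hypothetical shipment records.
shipments <- data.frame(
  country   = c("AU", "AU", "JP", "JP", "JP"),
  status    = c("DELIVERED", "FAILED", "DELIVERED", "DELIVERED", "FAILED"),
  picked_up = as.Date(c("2017-03-01", "2017-03-01", "2017-03-02",
                        "2017-03-03", "2017-03-03")),
  closed    = as.Date(c("2017-03-04", "2017-03-06", "2017-03-03",
                        "2017-03-07", "2017-03-08")),
  stringsAsFactors = FALSE
)

summary_tbl <- shipments %>%
  mutate(turnaround_days = as.numeric(closed - picked_up)) %>%  # computed column
  filter(status == "DELIVERED") %>%                             # like WHERE
  group_by(country) %>%                                         # like GROUP BY
  summarise(n_shipments     = n(),
            mean_turnaround = mean(turnaround_days)) %>%        # aggregates
  arrange(country)                                              # like ORDER BY
```

When plyr and dplyr are used in the same session, loading plyr first avoids plyr masking dplyr functions such as `summarise()`.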


Map

In order to perform accurate and meaningful geospatial analysis, we utilised the following R packages. tmap is the thematic maps package, which produces geographical maps that visualise spatial data distributions; we used it for its ability to create multiple flexible choropleth maps. The tmaptools package provides a set of tools for reading and processing spatial data, and was used together with tmap to map data onto the relevant polygons. The leaflet package provided the base layer, a simple and visually pleasing map of the world. leaflet also improves usability, as users can zoom in and out by scrolling the mouse. The rgdal package was used to write and save the shapefiles, which it does efficiently with straightforward methods. The publicly available online shapefiles were rather large to read into the application and caused a lot of loading issues. To improve the loading time, we used the gSimplify method from the rgeos package to reduce the level of detail of the polygons while still retaining their shape and accuracy.


# Simplifying the shp files for faster loading
library(raster)  # provides getData()
library(rgeos)   # provides gSimplify()

# Online shp file (GADM level-2 boundaries for Australia)
au_map <- getData('GADM', country='AU', level=2)

# gSimplify cannot work with NA values
au_map <- au_map[,-(10)] # removes the CCN_2 column because of NA values

# Simplify the object
au_map_simpl <- gSimplify(au_map, tol=0.01, topologyPreserve=TRUE)

# The returned object is just the geometry, not the attributes, so a new
# SpatialPolygonsDataFrame has to be constructed from the simplified geometry
# and the attribute data from the original
final_map <- SpatialPolygonsDataFrame(au_map_simpl, data=au_map@data)

# Save file
saveRDS(final_map, "australia.rds")
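
The choropleth step described in the Map section, where shipment data is mapped onto the relevant polygons, might look roughly like the sketch below. To stay self-contained it uses two toy square polygons built with the sf package instead of the simplified Australian boundaries, and the area codes and counts are invented for illustration. Using sf rather than the sp classes above is a choice made only for this example.

```r
library(sf)
library(tmap)

# Build a unit square with its lower-left corner at (x0, y0).
square <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

# Two toy polygons standing in for real administrative boundaries.
areas <- st_sf(
  area_code = c("A01", "A02"),
  geometry  = st_sfc(square(0, 0), square(1, 0), crs = 4326)
)

# Hypothetical failed-shipment counts keyed by the same area code.
counts <- data.frame(area_code = c("A02", "A01"), failed = c(3, 8))

# Map the data onto the relevant polygons by matching on the shared code.
areas$failed <- counts$failed[match(areas$area_code, counts$area_code)]

# "view" mode gives an interactive Leaflet-based map; "plot" gives a static one.
tmap_mode("view")
choropleth <- tm_shape(areas) + tm_polygons("failed")
```

Printing `choropleth` draws the map, with each polygon shaded by its `failed` value.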