Difference between revisions of "Hiryuu Methodology"

From Analytics Practicum
(Created page with "<!-- LOGO --> <!--MAIN HEADER --> {|style="background-color:#F5A9A9;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" | | style="font-family:Roboto; fon...")
 
 
(19 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
<!-- LOGO -->
 
<!-- LOGO -->
 +
[[Image:HomeButtonRounded400.png|50px |link=https://wiki.smu.edu.sg/ANLY482/ANLY482_AY2016-17_Term_2 |alt=Current Project]]
 +
[[File:Logo Hiryuu.png|250px|center]]<br>
 +
 
<!--MAIN HEADER -->
 
<!--MAIN HEADER -->
{|style="background-color:#F5A9A9;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
+
{|style="background-color:#A9F5F2;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
+
| style="font-family:Roboto; font-size:100%; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #0174DF; background:#848484; text-align:center;" width="15%" |  
| style="font-family:Roboto; font-size:15px; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #8A4B08; background:#848484; text-align:center;" width="10%" |  
 
 
;
 
;
[[Hiryuu_Home| <font color="#F7FE2E">Home</font>]]
+
[[Hiryuu_Home| <font color="#FFFFFF">Home</font>]]
  
| style="font-family:Roboto; font-size:15px; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #8A4B08; background:#848484; text-align:center;" width="10%" |  
+
| style="font-family:Roboto; font-size:100%; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #0174DF; background:#848484; text-align:center;" width="15%" |  
 
;
 
;
[[Hiryuu_About_Us| <font color="#F7FE2E">About Us</font>]]
+
[[Hiryuu_About_Us| <font color="#FFFFFF">About Us</font>]]
  
| style="font-family:Roboto; font-size:15px; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #FF0040; background:#848484; text-align:center;" width="20%" |  
+
| style="font-family:Roboto; font-size:100%; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #00BFFF; background:#A4A4A4; text-align:center;" width="20%" |  
 
;
 
;
[[Hiryuu_Project_Overview| <font color="#F7FE2E">Project Overview</font>]]
+
[[Hiryuu_Project_Overview| <font color="#FFFFFF">Project Overview</font>]]
  
| style="font-family:Roboto; font-size:15px; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #8A4B08; background:#848484; text-align:center;" width="10%" |  
+
| style="font-family:Roboto; font-size:100%; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #0174DF; background:#848484; text-align:center;" width="15%" |  
 
;
 
;
[[Hiryuu_Findings| <font color="#F7FE2E">Findings</font>]]
+
[[Hiryuu_Findings| <font color="#FFFFFF">Findings</font>]]
  
| style="font-family:Roboto; font-size:15px; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #8A4B08; background:#848484; text-align:center;" width="20%" |  
+
| style="font-family:Roboto; font-size:100%; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #0174DF; background:#848484; text-align:center;" width="20%" |  
 
;
 
;
[[Hiryuu_Project_Management| <font color="#F7FE2E">Project Management</font>]]
+
[[Hiryuu_Project_Management| <font color="#FFFFFF">Project Management</font>]]
  
| style="font-family:Roboto; font-size:15px; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #8A4B08; background:#848484; text-align:center;" width="15%" |  
+
| style="font-family:Roboto; font-size:100%; font-weight: bold; solid #F5A9A9; border-bottom:7px solid #0174DF; background:#848484; text-align:center;" width="15%" |  
 
;
 
;
[[Hiryuu_Documentations| <font color="#F7FE2E">Documentations</font>]]
+
[[Hiryuu_Documentations| <font color="#FFFFFF">Documentations</font>]]
 
|}
 
|}
<!-- End of MAIN HEADER -->
 
  
 
<!---------------START of sub menu ---------------------->
 
<!---------------START of sub menu ---------------------->
{| style="background-color:#ffffff; margin: 3px auto 0 auto" width="55%"
+
{| style="background-color:#ffffff; margin: 3px auto 0 auto" width="55%"|-  
|-  
 
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[Hiryuu_Project Overview| <span style="color:#3d3d3d">Background</span>]]
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[Hiryuu_Project Overview| <span style="color:#3d3d3d">Background</span>]]
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
Line 46: Line 46:
 
==<div style="background: #95A5A6; line-height: 0.3em; font-family:Roboto;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#ffffff"><strong>Introduction</strong></font></div></div>==
 
==<div style="background: #95A5A6; line-height: 0.3em; font-family:Roboto;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#ffffff"><strong>Introduction</strong></font></div></div>==
  
<p>The main aim of this practicum is to give our sponsor an insight into the delivery patterns in the different countries managed, focusing on Australia and Japan as these 2 countries have posed the most problems. To do so we will first analyse the trends from 3 months worth of data use 4 main techniques, Exploratory, Clustering, Time Series, and Geospatial. </p>
+
<p>The main aim of this practicum is to give our sponsor an insight into the delivery patterns in the different countries managed, focusing on Australia and Japan as these 2 countries have posed the most problems. To do so we will first analyse the trends from 3 months worth of data using 3 main techniques, Exploratory, Time Series, and Geospatial. </p>
 
<p>With these analysis done, we hope to give our sponsors a clearer picture as to the reasons of failed deliveries so that it will aid the company in avoiding similar pitfalls in the future. </p>
 
<p>With these analysis done, we hope to give our sponsors a clearer picture as to the reasons of failed deliveries so that it will aid the company in avoiding similar pitfalls in the future. </p>
  
==<div style="background: #95A5A6; line-height: 0.3em; font-family:Roboto;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#ffffff"><strong>Tools Used</strong></font></div></div>==
+
==<div style="background: #95A5A6; line-height: 0.3em; font-family:Roboto;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#ffffff"><strong>Objectives</strong></font></div></div>==
 +
 
 +
<p>There are 5 main objectives we aim to achieve:
 +
# Understanding the patterns and trends across shipment routes in different countries
 +
# Identify patterns such as the locations and timing for shipments with frequent issues
 +
# Conducting time series analysis to determine the presence of seasonality in shipments
 +
# Build a dashboard for single view of all data statistics for a particular country that can measure KPI easily. The sponsors we’re working with are focused on the marketing simplicity and efficiency only. The functions we hope to show includes the following:
 +
<br>Design Specification
 +
* Showing data records of parcels picked up but not replied
 +
* Show visual summary of shipments and current status
 +
* View failed deliveries at a single glance and detailed breakdown at a single click, including track by reference number for both inbound and outbound
 +
* Peak of the failure points when time series analysis
 +
* Simple to understand bar charts and histograms that represents KPI
 +
<br>Multiple iterations of the dashboard will be conducted to increase the usability for our sponsor. We will conduct frequent feedbacks with our supervisor and sponsor to ensure that the dashboard is equipped with the data statistics and KPI most readily useful for the decision making. </p>
 +
 
 +
==<div style="background: #95A5A6; line-height: 0.3em; font-family:Roboto;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#ffffff"><strong>Restrictions in Hong Kong</strong></font></div></div>==
 +
 
 +
<p>Due to the limitations in the postal code system in Hong Kong as well as the inconsistencies in recording addresses in the database, it is thus difficult for our team to conduct geospatial analysis until further data resolutions are conducted in the future. </p>
 +
 
 +
==<div style="background: #95A5A6; line-height: 0.3em; font-family:Roboto;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#ffffff"><strong>Alternative Ways to Geocoding - Japan</strong></font></div></div>==
 +
 
 +
Other than the postal code system in Japan, they also have a geocoding system known as the JIS Code (市区町村コード). Unlike postal codes that may see updates following changes in addresses and how postal codes may be assigned separately for the commercial entities beyond geolocation specifications, JIS codes are bound to the addresses by geolocation. JIS codes are also used as identifiers in geospatial data files for Japan unlike postal codes, allowing for greater convenience and compatibility in using them as an identifier.
 +
 
 +
[[File:Figure5.png|500px|center]]
 +
<center>Fig: Sample of JIS Codes</center>
 +
 
 +
The JIS code system is handled by Japan’s Ministry of Internal Affairs and Communications. The JIS code system assigns a unique number to identify a specific geolocation based on geographical classifications used in the country. For example, JIS code 131041 is Shinjuku ward of Tokyo. This makes it more compatible with geospatial data files such as shapefiles that tends to have polygons on the same level of detail. Hence we have used the JIS code for our geospatial analysis.
 +
 
 +
==<div style="background: #95A5A6; line-height: 0.3em; font-family:Roboto;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#ffffff"><strong>Determining Different End Points in a Shipment</strong></font></div></div>==
 +
 
 +
<p>One major difference between App1 and App2 is the list of endpoints recorded for each system. App2 provides an advantage is having specific starting and ending points to a shipment. This thus allows the ease in calculating the turnaround time of a shipment in App2. However, with specific starting and ending points, this results in less flexibility in further understanding the process of a shipment. As such, this is incorporated into App1, which contains a list of stage codes in categorising an ending point.</p><br>
 +
 
 +
<p>However with flexibility, there exists complexity is easily identifying an endpoint or the first delivery attempt of a shipment and thus difficulty in calculating the turnaround time for a shipment as seen in figure 6. As such, our team have come up with the solution in providing flexibility for our sponsors in determining the endpoints for shipments from a check box provided in the dashboard. The dashboard takes into account the endpoints checked and calculates the turnaround time for each shipment from the start till the end.</p>
 +
 
 +
[[File:Figure6.png|300px|center]]
 +
<center>Fig: Sample of Delivery Statuses</center>
 +
 
 +
==<div style="background: #95A5A6; line-height: 0.3em; font-family:Roboto;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#ffffff"><strong>Packages</strong></font></div></div>==
 +
 
 +
<h4>Graph</h4>
 +
<p>To prepare the data for visualisation, numerous packages in R have been used. Firstly, ggplot2 which is a popular plotting system used in Python and R for making professional looking plots have been used to create and display different graphs. Additionally, plotly R allows for making interactive quality graphs which have helped to create tooltips upon hover as well as create drilldown charts and tables for further insights. </p><br>
 +
 
 +
<h4>Table</h4>
 +
<p>The data table uses the DT package for R Shiny, which provides an R interface to using the JavaScript library DataTables, creating R data objects that can be displayed as tables on HTML pages with other features for higher degree of manipulating the data tables. </p><br>
 +
 
 +
<p>In preparing the data for the data table, we have used the packages <i>dplyr</i>, <i>plyr</i>, <i>timeDate</i>, <i>bizdays</i> to perform data cleaning and calculations. <i>Dplyr</i>, in particular, allowed for manipulating data frames with operations like SQL functions which made it a lot easier in cleaning up the data and performing data table functions. </p><br>
  
<p>We’ll be manually extracting the data we need from the raw data sheets provided. There is also the need to combine the data from both company’s applications (App1 and App2). After which we will proceed with the analysis using JMPro and Power BI to perform exploratory analysis, clustering, and time series. We agree that JMPro is a more powerful too but the reason for using Power BI is because our sponsors are familiar with the software so we want to get familiarise with its display as well so that we can have a better idea how to construct our final web app. QGIS will be our main application for the Geospatial analysis.</p>
+
<h4>Map</h4>
<p>Eventually we will display our findings on a single display (most probably Javascript) as per requested by the sponsor.</p>
+
<p>In order to perform accurate and meaningful geospatial analysis, we utilised the following R packages. <i>Tmap</i> is the thematic maps package which provides geographical maps in which spatial data distributions are visualized. This package was used with its key ability to create multiple flexible choropleth maps. The <i>tmaptools</i> package provides a set of tools for reading and processing spatial data. This package was utilised together with <i>tmap</i> to map data over into the relevant polygons. The <i>leaflet</i> package provided the base layer map, which is a basic geographical and visually pleasing map of the world. <i>Leaflet</i> also allows better usability for users as zooming in and out is enabled with the scrolling of the mouse. To write and save the shapefiles, the <i>rgdal</i> package allowed this to be done efficiently with straightforward methods. The publicly available online shapefiles were rather large to read into the application and caused a lot of loading issues. So, to improve the loading time, we used the gSimplify method from the <i>rgeos</i> package to reduce the quality of the polygons but still retaining the shape and accuracy.</p><br>
  
==<div style="background: #95A5A6; line-height: 0.3em; font-family:Roboto;  border-left: #6C7A89 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#ffffff"><strong>Analysis</strong></font></div></div>==
+
<i>simplifying the shp files for faster loading</i><br>
 +
library(rgeos)<br><br>
  
<p><h3>1. Exploratory Analysis</h3></p>
+
<i>online shp file</i><br>
An exploratory analysis will be conducted first to analyse the shipping behaviour of different customers in different countries.
+
<b>au_map <- getData('GADM', country='AU', level=2)</b><br><br>
* Determine the average turnaround time from the first to the last stage.
 
* Determine the average turnaround time for the statuses closure
 
* Identify patterns between destinations and shipment issues.
 
* Identify types of shipments with frequent shipment issues.
 
  
<p><h3>2. Geospatial Analysis</h3></p>
+
<i>method cannot work with NA values</i><br>
Shipping patterns and behaviour can be identified using geospatial analysis. The analysis will be narrowed down to the country, state/city and postal code. We will seek to answer the following questions:
+
<b>au_map <- au_map[,-(10)] #removes the CCN_2 column because of NA values</b><br><br>
* Where different customers lie on the map and hopefully identify the more popular areas and their reasons
 
* How different locations and proximity to the warehouses can affect shipment time and procedures.
 
* Identify and flag out destinations with high probability of shipment issues.
 
* Track different shipping routes from the start to the final to determine the average time required.
 
* Track different shipment status gap to determine partner’s performance in data provision/updates
 
  
<p><h3>3. Clustering</h3></p>
+
<i>simplify the object</i><br>
We plan to cluster our data based on type of customer, shipping history, activity level and any other potential classifications which we may identify in the future. Each customer/vendor will then be assigned a cluster number.
+
<b>au_map_simpl <- gSimplify(au_map, tol=0.01, topologyPreserve=TRUE)</b><br><br>
  
<p><h3>4. Time Series Analysis</h3></p>
+
<i>The returned object is just the geometry, not the attributes, so you have to construct a new SpatialPolygonsDataFrame with the simplified geometry and the attribute data from the original</i><br>
As the data could be organised by the date, a time series analysis could be conducted. The time series analysis would be broken down into time periods of weeks and month to analyse and identify patterns and trends in the shipment and customer data.
+
<b>final_map <- SpatialPolygonsDataFrame(au_map_simpl, data=au_map@data)</b> <br><br>
<p>We will also attempt to determine if there are seasonality trends in shipment patterns across different countries for different shipments.</p>
 
  
 +
<i>save file</i><br>
 +
<b>saveRDS(final_map, "australia.rds")</b><br><br>
 
<!--------------- Body End ---------------------->
 
<!--------------- Body End ---------------------->

Latest revision as of 21:21, 22 April 2017


Introduction

The main aim of this practicum is to give our sponsor insight into delivery patterns in the different countries managed, focusing on Australia and Japan, as these 2 countries have posed the most problems. To do so, we will first analyse the trends from 3 months' worth of data using 3 main techniques: exploratory, time series, and geospatial analysis.

With these analyses done, we hope to give our sponsors a clearer picture of the reasons for failed deliveries, which should help the company avoid similar pitfalls in the future.

Objectives

The main objectives we aim to achieve are:

  1. Understand the patterns and trends across shipment routes in different countries
  2. Identify patterns such as the locations and timing of shipments with frequent issues
  3. Conduct time series analysis to determine the presence of seasonality in shipments
  4. Build a dashboard that gives a single view of all data statistics for a particular country and makes KPIs easy to measure. The sponsors we are working with are focused only on marketing simplicity and efficiency. The functions we hope to show include the following:


Design Specification

  • Show data records of parcels picked up but not replied
  • Show a visual summary of shipments and their current status
  • View failed deliveries at a single glance and a detailed breakdown at a single click, including tracking by reference number for both inbound and outbound shipments
  • Show the peaks in failure points from the time series analysis
  • Provide simple-to-understand bar charts and histograms that represent KPIs


The dashboard will go through multiple iterations to increase its usability for our sponsor. We will hold frequent feedback sessions with our supervisor and sponsor to ensure that the dashboard presents the data statistics and KPIs most useful for decision making.

Restrictions in Hong Kong

Due to limitations in the postal code system in Hong Kong, as well as inconsistencies in how addresses are recorded in the database, it is difficult for our team to conduct geospatial analysis until these data issues are resolved in the future.

Alternative Ways of Geocoding - Japan

Besides its postal code system, Japan also has a geocoding system known as the JIS Code (市区町村コード). Postal codes may be updated when addresses change and may be assigned separately to commercial entities regardless of geolocation, whereas JIS codes are bound to addresses by geolocation. Unlike postal codes, JIS codes are also used as identifiers in geospatial data files for Japan, which makes them more convenient and compatible as an identifier.

Figure5.png
Fig: Sample of JIS Codes

The JIS code system is managed by Japan’s Ministry of Internal Affairs and Communications. It assigns a unique number to each geolocation based on the geographical classifications used in the country. For example, JIS code 131041 is Shinjuku ward of Tokyo. This makes it more compatible with geospatial data files such as shapefiles, which tend to have polygons at the same level of detail. Hence we have used the JIS code for our geospatial analysis.
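
As a rough illustration of using the JIS code as a join key, the sketch below aggregates shipment records by code and attaches the result to a polygon attribute table. The data frames, column names, and values are hypothetical and are not taken from the project data. One assumption worth noting: codes for prefectures such as Hokkaido begin with 0, so codes read as numbers are padded back to 6 characters before joining.

```r
# Hypothetical shipment records keyed by JIS code (values invented for illustration)
shipments <- data.frame(
  jis_code = c(131041, 131041, 11002, 131041),  # leading zero lost when read as numeric
  failed   = c(TRUE, FALSE, TRUE, FALSE)
)

# Restore the 6-character form so that codes match the shapefile identifiers
normalise_jis <- function(code) sprintf("%06d", as.integer(code))
shipments$jis_code <- normalise_jis(shipments$jis_code)

# The first two digits identify the prefecture (13 = Tokyo, 01 = Hokkaido)
shipments$prefecture <- substr(shipments$jis_code, 1, 2)

# Failure rate per JIS code
failure_by_area <- aggregate(failed ~ jis_code, data = shipments, FUN = mean)

# Hypothetical attribute table of a shapefile, one row per polygon
polygons_attr <- data.frame(
  jis_code = c("011002", "131041", "131016"),
  stringsAsFactors = FALSE
)

# Left join: polygons without shipments keep an NA failure rate
joined <- merge(polygons_attr, failure_by_area, by = "jis_code", all.x = TRUE)
print(joined)
```

Keeping the code as a character string throughout avoids silent mismatches between the shipment data and the polygon identifiers.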

Determining Different End Points in a Shipment

One major difference between App1 and App2 is the list of endpoints recorded by each system. App2 has the advantage of specific starting and ending points for a shipment, which makes it easy to calculate the turnaround time of a shipment in App2. However, fixed starting and ending points leave less flexibility for understanding the process of a shipment in more detail. That flexibility is instead provided by App1, which uses a list of stage codes to categorise an ending point.


With flexibility, however, comes complexity in identifying an endpoint or the first delivery attempt of a shipment, and thus difficulty in calculating the turnaround time for a shipment, as seen in Figure 6. As such, our team's solution is to let our sponsors choose the endpoints for shipments using check boxes provided in the dashboard. The dashboard takes the checked endpoints into account and calculates the turnaround time for each shipment from start to end.

Figure6.png
Fig: Sample of Delivery Statuses
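
The endpoint selection described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the dashboard's actual code: the event log, the stage code names (PICKED_UP, DELIVERY_ATTEMPT, DELIVERED), and the function turnaround_days are all hypothetical. Turnaround is taken here as the time from a shipment's earliest event to its earliest event whose stage code is among the selected endpoints.

```r
# Hypothetical event log: one row per status update of a shipment
events <- data.frame(
  shipment_id = c("S1", "S1", "S1", "S2", "S2", "S2"),
  stage_code  = c("PICKED_UP", "DELIVERY_ATTEMPT", "DELIVERED",
                  "PICKED_UP", "DELIVERY_ATTEMPT", "DELIVERED"),
  timestamp   = as.POSIXct(c("2017-01-02 09:00", "2017-01-04 09:00",
                             "2017-01-05 09:00", "2017-01-03 08:00",
                             "2017-01-06 08:00", "2017-01-10 08:00"),
                           tz = "UTC"),
  stringsAsFactors = FALSE
)

# Turnaround in days per shipment, given the stage codes chosen as endpoints
turnaround_days <- function(events, endpoint_codes) {
  per_shipment <- split(events, events$shipment_id)
  sapply(per_shipment, function(ev) {
    start <- min(ev$timestamp)
    ends  <- ev$timestamp[ev$stage_code %in% endpoint_codes]
    if (length(ends) == 0) return(NA_real_)  # no selected endpoint reached yet
    as.numeric(difftime(min(ends), start, units = "days"))
  })
}

# Endpoint = final delivery
print(turnaround_days(events, "DELIVERED"))
# Endpoint = first delivery attempt
print(turnaround_days(events, "DELIVERY_ATTEMPT"))
```

In a Shiny dashboard, endpoint_codes could plausibly be supplied by a checkboxGroupInput, so that the turnaround figures are recalculated whenever the checked endpoints change.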

Packages

Graph

To prepare the data for visualisation, numerous R packages have been used. Firstly, ggplot2, a popular plotting system for R for making professional-looking plots, has been used to create and display different graphs. Additionally, the plotly package for R produces interactive graphs, which helped us create tooltips on hover as well as drill-down charts and tables for further insights.
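
A minimal sketch of combining the two packages is shown below, assuming ggplot2 and plotly are installed. The data frame and its values are invented for illustration. The chart is built with ggplot2 and then converted with ggplotly, with a custom tooltip passed through the text aesthetic.

```r
library(ggplot2)
library(plotly)

# Hypothetical summary of shipment statuses (values invented for illustration)
status_counts <- data.frame(
  status    = c("Delivered", "Failed", "In transit"),
  shipments = c(120, 15, 40)
)

# Static bar chart; the "text" aesthetic carries the tooltip label for plotly
p <- ggplot(status_counts,
            aes(x = status, y = shipments,
                text = paste0(status, ": ", shipments, " shipments"))) +
  geom_col(fill = "#0174DF") +
  labs(x = "Status", y = "Number of shipments")

# Convert to an interactive widget that shows only the custom tooltip on hover
interactive_plot <- ggplotly(p, tooltip = "text")
```

Printing interactive_plot in an R session, or returning it from renderPlotly in a Shiny app, displays the chart with hover tooltips.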


Table

The data table uses the DT package, which provides an R interface to the JavaScript library DataTables. R data objects such as data frames can be displayed as tables on HTML pages, including in R Shiny apps, with additional features such as filtering, sorting, and pagination for manipulating the tables.
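
As an illustration, a data frame can be turned into an interactive table with a single call to datatable. The sketch below assumes the DT package is installed; the shipment records and column names are hypothetical.

```r
library(DT)

# Hypothetical shipment records (values invented for illustration)
shipments <- data.frame(
  reference       = c("REF001", "REF002", "REF003"),
  country         = c("AU", "JP", "JP"),
  status          = c("Delivered", "Failed", "Delivered"),
  turnaround_days = c(3, 7, 2)
)

# Interactive table with per-column filters, sorted by turnaround time.
# Column indices in "order" are 0-based, so 3 refers to turnaround_days
# when row names are hidden.
shipment_table <- datatable(
  shipments,
  rownames = FALSE,
  filter   = "top",
  options  = list(pageLength = 10, order = list(list(3, "desc")))
)
```

In a Shiny app, the same widget would typically be rendered with DT::renderDT on the server side and DT::DTOutput in the UI.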


In preparing the data for the data table, we used the packages dplyr, plyr, timeDate, and bizdays to perform data cleaning and calculations. dplyr, in particular, allowed us to manipulate data frames with SQL-like operations, which made it much easier to clean up the data and perform data table functions.
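
The SQL-like style of dplyr can be sketched as follows, assuming the package is installed. The data and column names are hypothetical. The pipeline is roughly equivalent to a SQL query with GROUP BY, aggregate functions, and ORDER BY.

```r
library(dplyr)

# Hypothetical cleaned shipment records (values invented for illustration)
shipments <- data.frame(
  country         = c("AU", "AU", "JP", "JP", "JP"),
  status          = c("delivered", "failed", "delivered", "failed", "failed"),
  turnaround_days = c(3, 7, 2, 5, 6)
)

# Roughly: SELECT country, COUNT(*), AVG(failed), AVG(turnaround_days)
#          FROM shipments GROUP BY country ORDER BY failure_rate DESC
country_summary <- shipments %>%
  group_by(country) %>%
  summarise(
    n_shipments    = n(),
    failure_rate   = mean(status == "failed"),
    avg_turnaround = mean(turnaround_days)
  ) %>%
  arrange(desc(failure_rate))

print(country_summary)
```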


Map

In order to perform accurate and meaningful geospatial analysis, we utilised the following R packages. tmap is the thematic maps package, which produces geographical maps that visualise spatial data distributions; we used it for its key ability to create multiple flexible choropleth maps. The tmaptools package provides a set of tools for reading and processing spatial data, and was used together with tmap to map data onto the relevant polygons. The leaflet package provided the base layer map, a basic and visually pleasing geographical map of the world. leaflet also improves usability, as users can zoom in and out by scrolling the mouse. The rgdal package was used to write and save the shapefiles efficiently with straightforward methods. The publicly available online shapefiles were rather large to read into the application and caused many loading issues. So, to improve the loading time, we used the gSimplify method from the rgeos package to reduce the level of detail of the polygons while still largely retaining their shape.


# Simplifying the shp files for faster loading
library(rgeos)   # gSimplify
library(raster)  # getData; also attaches sp, which provides SpatialPolygonsDataFrame

# Online shp file
au_map <- getData('GADM', country='AU', level=2)

# The method cannot work with NA values
au_map <- au_map[,-(10)] # removes the CCN_2 column because of NA values

# Simplify the object
au_map_simpl <- gSimplify(au_map, tol=0.01, topologyPreserve=TRUE)

# The returned object is just the geometry, not the attributes, so a new
# SpatialPolygonsDataFrame has to be constructed with the simplified geometry
# and the attribute data from the original
final_map <- SpatialPolygonsDataFrame(au_map_simpl, data=au_map@data)

# Save file
saveRDS(final_map, "australia.rds")
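
The rgeos and rgdal packages have since been retired from CRAN, so readers reproducing this step today may need an alternative. The sketch below shows the same idea with st_simplify from the sf package. It is an assumption-laden illustration and not the method used in this project: the polygon is a small hand-made example rather than the GADM data, and the tolerance simply mirrors the value used above.

```r
library(sf)

# Hand-made polygon with two nearly collinear vertices (illustrative only)
coords <- rbind(c(0, 0), c(1, 0.001), c(2, 0), c(2, 2),
                c(1, 2.001), c(0, 2), c(0, 0))
demo_map <- st_sf(
  name     = "demo",
  geometry = st_sfc(st_polygon(list(coords)))
)

# Simplify while preserving topology. Unlike gSimplify, st_simplify applied to
# an sf object keeps the attribute columns, so no rebuild step is needed.
demo_simpl <- st_simplify(demo_map, preserveTopology = TRUE, dTolerance = 0.01)

# Compare vertex counts before and after
n_before <- nrow(st_coordinates(demo_map))
n_after  <- nrow(st_coordinates(demo_simpl))
cat("Vertices before:", n_before, "after:", n_after, "\n")

# Save the simplified object for fast loading in the application
saveRDS(demo_simpl, file.path(tempdir(), "demo_simplified.rds"))
```

For real country boundaries stored in longitude and latitude, the units of dTolerance depend on the coordinate reference system and the sf settings in use, so the tolerance would need to be checked against the data rather than copied from this example.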