Difference between revisions of "ANLY482 AY2017-18T2 Group 11 Project Overview"

From Analytics Practicum
 
(5 intermediate revisions by the same user not shown)
Line 25: Line 25:
 
{| style="background-color:#ffffff; margin: 3px auto 0 auto" width="55%"
 
{| style="background-color:#ffffff; margin: 3px auto 0 auto" width="55%"
 
|-  
 
|-  
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[ANLY482_AY2017-18T2_Group 11 Project Overview| <span style="color:#149de7">Updated</span>]]
+
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[ANLY482_AY2017-18T2_Group 11 Project Overview| <span style="color:#149de7">Final</span>]]
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #149de7" width="150px"| [[ANLY482_AY2017-18T2_Group11 Project Overview Old| <span style="color:#149de7">Previous</span>]]
+
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #149de7" width="150px"| [[ANLY482_AY2017-18T2_Group11 Project Overview Old| <span style="color:#149de7">Initial</span>]]
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 
|}
 
|}
Line 37: Line 37:
 
<br />
 
<br />
 
<div style="padding-left:70px; text-align: justify; width:980px;">
 
<div style="padding-left:70px; text-align: justify; width:980px;">
Technology has opened the floodgates for users globally to amass and transmit data, providing millions with unprecedented opportunities and benefits. This has given rise to the field of Data Analytics, which has become an important tool for companies to improve the efficiency in their operations. However, some companies are still lacking data analysis capabilities and often find it time-consuming to visualise, modify and interrogate data – especially when data is arising from more than one source such as machinery sensor, mobile devices, wearables, weblogs etc.  
+
Many SMEs in the logistics sector do not use Vehicle Routing Software (VRS) to plan their delivery routes. A likely reason is the high cost of purchasing such software. To illustrate, Paragon, a VRS provider in the United Kingdom, revealed that a system designed for a Delivery Service Company with a fleet of 100 vehicles can carry license fees of up to £50,000. This listed price excludes additional costs, such as the training and maintenance fees that usually come with the use of the software. The example shows that the cost of a VRS is out of line with the purchasing power of a typical SME. Thus, the team believes that it is crucial to identify and create low-cost solutions so that SMEs, too, have vehicle routing tools at their fingertips.
 
<br/><br/>
 
<br/><br/>
While big data presents opportunities for many companies to leverage on, it requires a certain level of technical skill in order to successfully capitalise on this opportunity. The lack of technical capabilities within the company to derive operational solutions from data is a problem that resonates strongly with the sponsor company, much like many others. With the company slowly gaining foothold in the industry, it is imperative for the company to enhance itself by analysing its data and obtain solutions to counter its high operational costs.
+
Another underlying motivation for this research is the lack of relevant skill sets in the current job market. Specifically, developing a VRS requires domain expertise in both the geospatial and computer science fields. The lack of low-cost solutions in the market suggests that such skills are scarce. Given the team's dynamics, the team believes it is well placed to develop a VRS for SMEs.
 
</div>
 
</div>
  
 
<!--- End Motivation Content -->
 
<!--- End Motivation Content -->
  
<!-- Start of Data Provided Content-->
+
<!-- Start of Project Objective & Goal Content -->
 
{| width=1080px cellspacing="0" cellpadding="7"  
 
{| width=1080px cellspacing="0" cellpadding="7"  
| style="background-color:#235778; font-weight: bold; text-indent: 15px; border-left: 15px solid #4BB9FF" | <font color="#FFF" size=3 > Data Provided </font>
+
| style="background-color:#235778; font-weight: bold; text-indent: 15px; border-left: 15px solid #4BB9FF" | <font color="#FFF" size=3 >Project Objective & Goal </font>
 
|}
 
|}
 
<br />
 
<br />
<div style="padding-left:70px; text-align: justify; width:980px;">
+
<div style="padding-left:70px; text-align: left; width:980px;">
The data provided is obtained from Company ABC’s database server and this database is updated whenever parcels are being received and delivered. The data given will be 2 years’ worth of delivery data in the Central and Western parts of Singapore and this adds up to approximately 750,000 rows of data points. It will contain details such as the delivery address, quality, date, etc…
+
Like all business models, the company's business model is not without its flaws. It leaves the company susceptible to the integrity and capabilities of external parties, namely its Temporary Drivers. As a result, even if the company knows the number of parcels to deliver on the following day, it might still be unable to determine accurately the number of Temporary Drivers needed. While the company has existing solutions to this issue, it is still interested in exploring alternatives that will improve its capabilities in this aspect.
 +
<br /><br />
 +
In response, the team suggests that the company adopt a systematic method for deciding the number of Temporary Drivers to employ. This could be accomplished with a “Logistics Application”, namely Vehicle Routing Software (VRS), which provides delivery routing details such as the time and route taken when a predetermined number of drivers is hired to complete a day’s worth of deliveries. Armed with these details, the company would no longer be at the mercy of its external contractors.
 
</div>
 
</div>
  
<!--- End Data Provided Content -->
+
<!--- End Objective Content -->
  
<!-- Start of Project Objective & Goal Content -->
+
<!-- Start of System Architecture Content -->
 
{| width=1080px cellspacing="0" cellpadding="7"  
 
{| width=1080px cellspacing="0" cellpadding="7"  
| style="background-color:#235778; font-weight: bold; text-indent: 15px; border-left: 15px solid #4BB9FF" | <font color="#FFF" size=3 >Project Objective & Goal </font>
+
| style="background-color:#235778; font-weight: bold; text-indent: 15px; border-left: 15px solid #4BB9FF" | <font color="#FFF" size=3 >System Architecture Design</font>
 
|}
 
|}
 
<br />
 
<br />
 
<div style="padding-left:70px; text-align: left; width:980px;">
 
<div style="padding-left:70px; text-align: left; width:980px;">
Based on the problem statement our sponsor has given us, we have derived 3 main objectives for this project. The 3 objectives are:
+
<p style="padding-left: 30px">
# Identify other possible ways to minimise operational costs for the company
+
This system architecture shows the interaction between the various software and libraries used in deriving the model. The details of each component are explored in the next section.
# Identify the optimal number of Drivers that Company ABC would require
+
</p>
# Minimise failed delivery by identifying erroneous forms before goods are being dispatched
+
[[Image:TWO System architecture.png | 800px | center]]
The objectives and problems listed can be summarised as following :
+
 
[[Image:T.W.O Objective.JPG | 700px ]]
 
 
</div>
 
</div>
  
<!--- End Objective Content -->
+
<!--- End System Architecture Content -->
  
 
<!-- Start of Methodology Content -->
 
<!-- Start of Methodology Content -->
 
{| width=1080px cellspacing="0" cellpadding="7"  
 
{| width=1080px cellspacing="0" cellpadding="7"  
| style="background-color:#235778; font-weight: bold; text-indent: 15px; border-left: 15px solid #4BB9FF" | <font color="#FFF" size=3 >Methodology </font>
+
| style="background-color:#235778; font-weight: bold; text-indent: 15px; border-left: 15px solid #4BB9FF" | <font color="#FFF" size=3 >Methodology - Framework</font>
 
|}
 
|}
 
<br />
 
<br />
 +
[[Image:TWO Framework.png | 800px | center]]
 
<div style="padding-left:70px; text-align: justify; width:980px;">
 
<div style="padding-left:70px; text-align: justify; width:980px;">
To provide operational recommendations from the given dataset, we will thoroughly examine the dataset via the following four-step approach:
+
1.  <b>Inputs</b> <br/>
<div style="padding-left: 30px">
+
<p>
1.  <b>Data Exploratory</b> <br/>
+
<u>OpenStreetMap</u> - This is a collaborative project dedicated to creating a free and editable map of the world. Its main motivation is to address the unavailability of map information across the world, which it does by tapping on the advent of inexpensive portable satellite navigation devices. The open-source map data is collected from scratch by volunteers worldwide using tools such as handheld GPS units, voice recorders and notebooks. In addition to pedestrian mapping, it has expanded to include road and cycling networks, which have proved useful to many. The data can be viewed in applications such as OsmAnd and Maps.me. Moreover, the data can be downloaded to a user’s workspace and accessed offline. This gives flexibility to less financially capable SMEs that have no constant access to the internet, and is a strong reason for adopting this mapping tool.
<p style="padding-left: 30px">
+
More critically, OpenStreetMap uses actual vector data, which makes routing possible. These underlying vectors make it extremely pertinent in building a routing application.
As the dataset is provided in Excel format, little data preparation is required by the team. Following which, the team would use methods such as summary statistics, to determine if there are any inconsistencies, missing and invalid values in the dataset.  
 
 
</p>
 
</p>
2.  <b>Data Cleaning</b> <br/>
+
<p>
<p style="padding-left: 30px">
+
<u>oneMap </u> - This is an integrated online geospatial platform that provides location-based information. Its open-source API has many functions, among them the ability to geocode locations based on postal code. Besides being open-source, this API is developed by the Singapore Land Authority, which adds credibility to its capabilities.
As errors such as outliers and invalid values could lead to inaccurate results, the data must be cleaned to ensure that it is suitable for further analysis. Based on the dataset, the two most probable data errors are <u>inconsistent data</u> and <u>missing or invalid values</u>.
 
 
</p>
 
</p>
  
3.  <b>Data Analysis</b><br/>
+
2.  <b>Database Tier</b> <br/>
 +
<p>
 +
<u>PostgreSQL</u> - Also known as Postgres, this is an object-relational, enterprise database management system whose primary functions are to store data securely and return it in response to requests from other software programs. This open-source software is developed by the PostgreSQL Global Development Group, a diverse group of individuals and companies. It is designed to handle workloads ranging from single-machine applications to large web-facing applications with many concurrent users. In addition, its conformance to the SQL specification and its data integrity features enforce strict rules on how the database is accessed, which appeals to security-conscious users. Owing to these capabilities, this database management system has been adopted in many industries, such as Telecom, Media and e-Commerce.
 +
</p>
 +
<p>
 +
<u>PostGIS</u> - This is an open-source geospatial extension of PostgreSQL. It follows the Simple Features for SQL specification from the Open Geospatial Consortium and turns PostgreSQL into a spatial database by adding 3 key features: 1) Spatial Types, 2) Spatial Indexes and 3) Spatial Functions. As these features are built on PostgreSQL, a key advantage of the extension is that it automatically inherits important “enterprise” features from PostgreSQL, such as crash recovery, hot backup and replication. Because of this advantage, numerous companies, such as Uber, rely heavily on this extension. PostGIS maintains a full list of case studies in which users have successfully managed to commercialise the use of PostGIS.
 +
</p>
  
<p style="padding-left: 30px">
+
3. <b>Application Tier</b> <br/>
After cleaning up the relevant data, an in-depth analysis will be performed on the data to gain meaningful insights. Based on preliminary discussion, we will be looking into these 4 analytical methods in analysing the data:
+
<p>
<div style="padding-left: 50px">a) <b>Time series analysis</b> – As the data provided contains time-series variables, the team will be performing Time Series Analysis on the data. This will allow the team to identify many trends such as those pertaining to the number of  Drivers. This would then aid the team in forecasting the optimal number of drivers required for future deliveries. </div><br/>
+
<u>pgRouting</u> - This is an extension to the PostGIS/PostgreSQL geospatial database that provides geospatial routing functionality based on cost metrics. Besides Dijkstra’s shortest path algorithm, pgRouting also contains other routing algorithms, such as the all-pairs shortest path algorithms of Johnson and Floyd-Warshall. These routing algorithms are jointly created by individuals and corporations and have been used extensively, for example in emergency management and traffic management.
<div style="padding-left: 50px">b) <b>Frequency distribution & Maximum likelihood </b> – During the data cleaning process, the causes for invalid values will be recorded. Frequency distribution will be used to determine the frequency of each causes. After which, maximum likelihood will be performed to identify which reasons contributed most significantly to the invalid values. These factors will be analysed in greater depth as they are the primary reasons for a failed delivery attempt. </div><br/>
 
<div style="padding-left: 50px">c) <b>Cluster Analysis </b> – By using variables such as delivery area, number of parcel and size of parcel, the team will be able to profile its consumer segments and analyse how it changes over time. With this information, the team can then better estimate the number of drivers required for each region. This will help the company to optimise the number of driver required for each area, and potentially reduce the operation costs. </div><br/>
 
<div style="padding-left: 50px">d) <b>Correlation & Regression </b> – Correlation Analysis will be performed to identify the relationship between explanatory variables. Besides correlating the variables, the team will also attempt to perform regression on explanatory variables against outcome variables. This analysis will allow the team to determine the relationship between the variables and derive many conclusions pertaining to operational costs. For example, the team will be able to understand how variables such as quantity and weight of parcel can significantly impact operation cost. </div><br/>
 
 
</p>
 
</p>
  
4.  <b>Data Visualisations</b><br/>
+
4.  <b>Client Interface Tier</b> <br/>
<p style="padding-left: 30px">
+
<p>
Lastly, we will also be looking at creating a dashboard with the following visualisations which will ultimately help the team present its recommendation. Some of the visualisations that will be derived are:
+
<u>QGIS</u> - This is an open-source geographic information system used to create, edit, visualise, analyse and publish geospatial information.
<div style="padding-left: 50px">a) <b>Spider Chart </b> – To visualise the reasons that contribute to data inconsistency  </div><br/>
+
</p>
<div style="padding-left: 50px">b) <b>Time Series Line Chart </b> – To forecast the optimal number of drivers needed in the future based on past data </div><br/>
+
<p>
<div style="padding-left: 50px">c) <b>Heatmap </b> – To identify the number of drivers needed at the various location </div><br/>
+
<u>R’s Shiny</u> - This is an open-source package for R with interactive capabilities that allow users to build web apps quickly and efficiently.
 
</p>
 
</p>
</div>
+
 
 
</div>
 
</div>
  
 
<!--- End Methodology Content -->
 
<!--- End Methodology Content -->
  
<!-- Start of Technology Used Content -->
 
{| width=1080px cellspacing="0" cellpadding="7"
 
| style="background-color:#235778; font-weight: bold; text-indent: 15px; border-left: 15px solid #4BB9FF" | <font color="#FFF" size=3 >Technology Used </font>
 
|}
 
<br />
 
<div style="padding-left:70px; text-align: left; width:980px;">
 
[[Image:T.W.O Toolused.png | 900px | center]]
 
</div>
 
 
<!--- End Technology Content -->
 
 
</center>
 
</center>

Latest revision as of 00:20, 16 April 2018



Motivation


Many SMEs in the logistics sector do not use Vehicle Routing Software (VRS) to plan their delivery routes. A likely reason is the high cost of purchasing such software. To illustrate, Paragon, a VRS provider in the United Kingdom, revealed that a system designed for a Delivery Service Company with a fleet of 100 vehicles can carry license fees of up to £50,000. This listed price excludes additional costs, such as the training and maintenance fees that usually come with the use of the software. The example shows that the cost of a VRS is out of line with the purchasing power of a typical SME. Thus, the team believes that it is crucial to identify and create low-cost solutions so that SMEs, too, have vehicle routing tools at their fingertips.

Another underlying motivation for this research is the lack of relevant skill sets in the current job market. Specifically, developing a VRS requires domain expertise in both the geospatial and computer science fields. The lack of low-cost solutions in the market suggests that such skills are scarce. Given the team's dynamics, the team believes it is well placed to develop a VRS for SMEs.


Project Objective & Goal


Like all business models, the company's business model is not without its flaws. It leaves the company susceptible to the integrity and capabilities of external parties, namely its Temporary Drivers. As a result, even if the company knows the number of parcels to deliver on the following day, it might still be unable to determine accurately the number of Temporary Drivers needed. While the company has existing solutions to this issue, it is still interested in exploring alternatives that will improve its capabilities in this aspect.

In response, the team suggests that the company adopt a systematic method for deciding the number of Temporary Drivers to employ. This could be accomplished with a “Logistics Application”, namely Vehicle Routing Software (VRS), which provides delivery routing details such as the time and route taken when a predetermined number of drivers is hired to complete a day’s worth of deliveries. Armed with these details, the company would no longer be at the mercy of its external contractors.


System Architecture Design


This system architecture shows the interaction between the various software and libraries used in deriving the model. The details of each component are explored in the next section.

TWO System architecture.png


Methodology - Framework


TWO Framework.png

1. Inputs

OpenStreetMap - This is a collaborative project dedicated to creating a free and editable map of the world. Its main motivation is to address the unavailability of map information across the world, which it does by tapping on the advent of inexpensive portable satellite navigation devices. The open-source map data is collected from scratch by volunteers worldwide using tools such as handheld GPS units, voice recorders and notebooks. In addition to pedestrian mapping, it has expanded to include road and cycling networks, which have proved useful to many. The data can be viewed in applications such as OsmAnd and Maps.me. Moreover, the data can be downloaded to a user’s workspace and accessed offline. This gives flexibility to less financially capable SMEs that have no constant access to the internet, and is a strong reason for adopting this mapping tool. More critically, OpenStreetMap uses actual vector data, which makes routing possible. These underlying vectors make it extremely pertinent in building a routing application.

oneMap - This is an integrated online geospatial platform that provides location-based information. Its open-source API has many functions, among them the ability to geocode locations based on postal code. Besides being open-source, this API is developed by the Singapore Land Authority, which adds credibility to its capabilities.

2. Database Tier

PostgreSQL - Also known as Postgres, this is an object-relational, enterprise database management system whose primary functions are to store data securely and return it in response to requests from other software programs. This open-source software is developed by the PostgreSQL Global Development Group, a diverse group of individuals and companies. It is designed to handle workloads ranging from single-machine applications to large web-facing applications with many concurrent users. In addition, its conformance to the SQL specification and its data integrity features enforce strict rules on how the database is accessed, which appeals to security-conscious users. Owing to these capabilities, this database management system has been adopted in many industries, such as Telecom, Media and e-Commerce.

PostGIS - This is an open-source geospatial extension of PostgreSQL. It follows the Simple Features for SQL specification from the Open Geospatial Consortium and turns PostgreSQL into a spatial database by adding 3 key features: 1) Spatial Types, 2) Spatial Indexes and 3) Spatial Functions. As these features are built on PostgreSQL, a key advantage of the extension is that it automatically inherits important “enterprise” features from PostgreSQL, such as crash recovery, hot backup and replication. Because of this advantage, numerous companies, such as Uber, rely heavily on this extension. PostGIS maintains a full list of case studies in which users have successfully managed to commercialise the use of PostGIS.
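
As a rough illustration of the three kinds of feature named above, the following Python sketch models a spatial type, a spatial function and a spatial index in memory. It is not PostGIS code: the names Point, distance and GridIndex, the 500 m cell size and the sample parcels are all hypothetical, and coordinates are assumed to be projected (in metres) so that straight-line distance is meaningful.

```python
import math
from collections import defaultdict
from typing import NamedTuple


class Point(NamedTuple):
    """Spatial type: a location in projected coordinates (metres)."""
    x: float
    y: float


def distance(a: Point, b: Point) -> float:
    """Spatial function: straight-line distance in metres."""
    return math.hypot(a.x - b.x, a.y - b.y)


class GridIndex:
    """Spatial index: buckets points into square cells so that a radius
    query inspects only nearby cells instead of every stored point."""

    def __init__(self, cell_size: float = 500.0):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, p: Point):
        return (math.floor(p.x / self.cell_size),
                math.floor(p.y / self.cell_size))

    def insert(self, name: str, p: Point) -> None:
        self.cells[self._cell(p)].append((name, p))

    def within(self, centre: Point, radius: float):
        """Return the sorted names of points within radius of centre."""
        cx, cy = self._cell(centre)
        reach = math.ceil(radius / self.cell_size)  # cells to scan each way
        found = []
        for ix in range(cx - reach, cx + reach + 1):
            for iy in range(cy - reach, cy + reach + 1):
                for name, p in self.cells.get((ix, iy), []):
                    if distance(centre, p) <= radius:
                        found.append(name)
        return sorted(found)


# Hypothetical delivery points, in metres from an arbitrary origin.
index = GridIndex(cell_size=500.0)
index.insert("parcel-1", Point(100.0, 100.0))
index.insert("parcel-2", Point(900.0, 100.0))
index.insert("parcel-3", Point(5000.0, 5000.0))
nearby = index.within(Point(0.0, 0.0), 1000.0)
```

Here nearby holds parcel-1 and parcel-2 only, and the query never looks at the cell containing parcel-3. A real spatial database applies the same idea with more sophisticated index structures and geometry types.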

3. Application Tier

pgRouting - This is an extension to the PostGIS/PostgreSQL geospatial database that provides geospatial routing functionality based on cost metrics. Besides Dijkstra’s shortest path algorithm, pgRouting also contains other routing algorithms, such as the all-pairs shortest path algorithms of Johnson and Floyd-Warshall. These routing algorithms are jointly created by individuals and corporations and have been used extensively, for example in emergency management and traffic management.
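
To make the idea of cost-based routing concrete, here is a minimal Python sketch of Dijkstra's algorithm on a small, made-up road graph. It only illustrates the algorithm and is not pgRouting's implementation; the function name dijkstra, the ROADS graph and its travel times are assumptions for the example.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (total_cost, path) for the cheapest route from source to target.

    graph maps each node to a list of (neighbour, cost) pairs. Costs must be
    non-negative, for example travel time in minutes.
    """
    best = {source: 0.0}      # cheapest known cost to reach each node
    previous = {}             # predecessor of each node on its cheapest route
    queue = [(0.0, source)]   # min-heap of (cost so far, node)
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbour, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return float("inf"), []   # target cannot be reached
    # Walk the predecessors back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical road network: nodes are junctions, costs are minutes.
ROADS = {
    "depot": [("A", 4), ("B", 2)],
    "A": [("customer", 5)],
    "B": [("A", 1), ("customer", 8)],
    "customer": [],
}

cost, route = dijkstra(ROADS, "depot", "customer")
```

On this graph the cheapest route is depot, B, A, customer at a cost of 8 minutes, which beats the more direct depot, A, customer route (9 minutes). Changing the cost metric, for instance from time to distance, changes the chosen route without changing the algorithm.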

4. Client Interface Tier

QGIS - This is an open-source geographic information system used to create, edit, visualise, analyse and publish geospatial information.

R’s Shiny - This is an open-source package for R with interactive capabilities that allow users to build web apps quickly and efficiently.