Difference between revisions of "Group08 proposal"

From ISSS608-Visual Analytics and Applications
Line 1: Line 1:
 +
<!----------- template based on : https://wiki.smu.edu.sg/1718t3isss608/Group08_Proposal ------------>
 
<!----------- Main Header ------------>  
 
<!----------- Main Header ------------>  
 +
 
<div style="background:#E4EBF0; padding:24px; text-align:center;">  
 
<div style="background:#E4EBF0; padding:24px; text-align:center;">  
 
<font size = 8; color="#176585"><span style="font-family:Segoe UI Light;">Re-imagining Bus Transport Network in Singapore</span></font>
 
<font size = 8; color="#176585"><span style="font-family:Segoe UI Light;">Re-imagining Bus Transport Network in Singapore</span></font>
 
</div>
 
</div>
 
  
 
<!------------ Navigation bar ------------>  
 
<!------------ Navigation bar ------------>  
Line 27: Line 28:
  
  
<font size="5">'''Overview'''</font> <br/>
+
== <big>Overview</big> ==
Corn or Maize (as called in some countries) was first grown in ancient Central America. Corn has become a staple in many parts of the world, providing not only substances that we fill our belly with, but also act as the raw ingredient for corn ethanol, animal feed etc. The United States accounts for about 40% of production of corn in the world<sup>1</sup>, which makes it the largest corn producer. The major portion of production is found in the Midwestern states, such as Illinois, Iowa, Nebraska and Minnesota – these states were grouped and eventually became known as the ‘Corn Belt’. The Corn Belt has about 96,000,000 acres of land just for corn production. The states that make up the Corn Belt were selected due to leveled land, fertile and highly organic soils<sup>2</sup>. <br/>
+
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<font size = 3; color ="#0F334D">
 +
Bus rides in Singapore are just really slow, aren’t they? Have you ever taken a bus ride that was supposed to be short and quick but took far longer than expected? Are you frustrated that the bus stops at every stop even when nobody is boarding or alighting? And why do we have so many bus stops that almost nobody uses?
 +
 
 +
What if we could reimagine the public bus network in Singapore through data?
 +
 
 +
In this project, we will use data visualization techniques to map out all transportation nodes in Singapore and propose a different way of organizing our bus services, covering bus stops, bus routes, and connectivity both within each subregion and between subregions.
 +
</font></div></div>
 +
 
 +
== <big>Scope</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">  
 +
<div style="font-family:Segoe UI;">
 +
<font size = 3; color ="#0F334D">
 +
The scope of the project is limited to public buses in Singapore.
 +
 
 +
In order to map out the pattern of transportation in Singapore, we will mainly be using datasets from LTA datamall (https://www.mytransport.sg/content/mytransport/home/dataMall.html). In addition, we may supplement the data with other relevant datasets, such as geographical socioeconomic data, land use data (industrial area, commercial area, residential area), population density, or weather data.  
 +
</font></div></div></div>
 +
 
 +
== <big>Approach</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<div align="justify">
 +
<font size = 3; color ="#0F334D
 +
">
 +
The app aims to provide policy makers with the following information:
 +
 
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Transportation Flow</font></div>===
 +
* Visualise the flow of people across the bus stops/planning subzones and the criticality of the bus stops/planning subzones to Singapore’s public bus transport network.
 +
 
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Travel Demand</font></div>===
 +
Estimate the travel demand: Volume of people expected to travel between a particular origin and destination via a particular route and mode of travel (switch bus/direct bus)
 +
 
 +
The app should also be able to:
 +
 
 +
* Show the Impediment Value of subregion of bus stops and planning subzones
 +
* Show the Degree Centrality: the number of regions a region is connected to via bus services
 +
* Show the Closeness Centrality of every bus stop to identify how easily each bus stop can reach every other bus stop
 +
* Show the Betweenness Centrality of a bus stop, that is, how often it acts as a connector or bridge between other locations
 +
* Show the Connectivity of a bus stop based on the regions it is connected to (e.g. a higher frequency of buses per hour means higher connectivity)
 +
</font></div></div>
 +
 
 +
== <big>Outcome</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<div align="justify">
 +
<font size = 3; color ="#0F334D
 +
">
 +
The visualisation could have practical implications for decision makers, informing policy decisions in order to:
 +
 
 +
* Optimise bus routes to improve route utilisation, and reduce the number of bus stops a service calls at to reduce congestion
 +
* Optimise bus routes which could help to reduce congestion along certain bus stops
 +
* Plan the placement of bus stops in order to maximize overall utility
 +
* Plan the frequency of buses at certain times to minimize bus wait time and maximize throughput
 +
* Advise city planners with regard to transportation flow in congested areas
 +
 
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Data Source</font></div>===
 +
We will primarily be using data from LTA DataMall. The data is not publicly available, but access is granted upon written request. For this project, we will need to write a script that makes API calls to extract the data we need.
 +
 
 +
Data includes Live data as well as Historical data.
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Bus Arrival</font></div>===
 +
Live data. Returns real-time Bus Arrival information for Bus Services at a queried Bus Stop, including: Estimated Time of Arrival (ETA), Estimated Location, Load information (how crowded the bus is).
 +
 
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Bus Services</font></div>===
 +
Returns detailed service information for all buses currently in operation, including: first stop, last stop, peak / off peak frequency of dispatch.
 +
 
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Bus Route</font></div>===
 +
Returns detailed route information for all services currently in operation, including: all bus stops along each route, first/last bus timings for each stop.
 +
 
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Bus Stops</font></div>===
 +
Returns detailed information for all bus stops currently being serviced by buses, including: Bus Stop Code, location coordinates.
 +
 
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Passenger Volume by Bus Stops</font></div>===
 +
Returns tap in and tap out passenger volume by weekdays and weekends for individual bus stop.
 +
 
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Passenger Volume by Origin Destination Bus Stops</font></div>===
 +
Returns number of trips by weekdays and weekends from the origin to destination bus stops.
 +
</font></div></div></div>
  
Corn has been known to be able to grow in a wide range of climatic conditions, hence it would be a challenge to set precise conditions for corn production. However, there is still a limit of this wide window of conditions, such as corn is grown mostly in tropical latitudes, corn has a cold limit of 19°C, corn grows the best in warm temperatures between 21°C to 27°C and the growing season to grow hovers between 120 -180 days<sup>3</sup>. Hence, breeders have been experimenting with various types of corn hybrids, each of them specifically created to have high yield despite the environment it is planted in. Over the years, the farmers have been using trial and error method to identify the best hybrids to plant by planting each of these hybrids in different locations with different environmental factors; this process has been proven to be slow and not very effective<sup>4</sup>. This project aims to explore the meteorology and geographical factors that makes a corn, the a-maize-ing crop that we know today, which would benefit the corn breeder greatly. <br> <br>
+
== <big>Visualization Feature</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<div align="justify">
 +
<font size = 3; color ="#0F334D
 +
">
 +
The visualization that we are trying to build is <b>graphical</b> and <b>geospatial</b> in nature. Each bus stop will become a node, and the bus routes will become the edges between nodes.  
  
=Scope=
+
Examples:
The scope of the project is limited to the corn produced in USA, specifically from the Corn Belt regions. Due to time constraint, we will analyse at the ‘Environment’ level (aggregated) instead of ‘Hybrid’ level. Each Environment can be treated as a plantation, where each Environment has various numbers of hybrids being planted. All years are taken into consideration, with a focus on the individual growing season of each environment (around Mar/April to September). We will also limit the environmental factors to the following: <br>
 
# Precipitation <br>
 
# Exposure length to sun (Radiation) <br>
 
# Average Temperature <br>
 
# Location of where the hybrid is planted  <br>
 
  
=Objective=
+
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Geospatial Flow Chart</font></div>===
The first objective of this project is to visualise our dataset:  
+
image here
# '''Weather Data''': Precipitation, Average Temperature,Length of (sun) Radiation over the years <br>  
 
# '''Performance Data''': Average Yield by State, by Plantation. However after skimming through our data, about 45% of our plantations are only used once. Hence we do not have enough data for us to do a time-series analysis. Instead, we will be doing cross-sectional analysis, where we will analyse and visualise our data year by year. <br>
 
  
The second objective is to predict the yield of a plantation, given the soil conditions and topography of the plantation. The model that we are going to implement is the Geo-weighted Regression (GWR) Model. This will be further elaborated on in the Methodology section. <br>
+
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Visualizing Connectivity</font></div>===
 +
image here
  
=Data Source=
+
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Finding centrality</font></div>===
This data from the 'Syngenta Crop Challenge 2019'. [https://www.ideaconnection.com/syngenta-crop-challenge/challenge.php#datasets Click here to see the data.]<br>
+
image here
  
==Performance Data==
+
</font></div></div></div>
This dataset is the main dataset: we have both the individual hybrid yield, as well as the average yield (of several hybrids) of a particular environment. We have '''5,382''' unique hybrids, with '''579''' unique environments. The table below shows the main variables that we will be using for our analysis: <br>
 
{| class="wikitable"
 
|-
 
! Variable !! Description
 
|-
 
| <b><i>YEAR</i></b>|| Year grown
 
|-
 
| <b><i>HYBRID_ID</i></b>|| Identifier for the tested hybrid
 
|-
 
| <b><i>ENV_ID</i></b>|| Identifier for the tested location and year
 
|-
 
| <b><i>YIELD</i></b>|| Yield of tested hybrid in tested location (quintiles/hectare)
 
|-
 
| <b><i>ENV_YIELD_MEAN</i></b>|| Average Yield of tested location
 
|-
 
| <b><i>LAT, LONG</i></b>|| Latitude and Longitude of tested location to nearest 0.1 degree
 
|-
 
| <b><i>ELEVATION</i></b>|| Elevation of field at tested location
 
|-
 
| <b><i>PLANT_DATE</i></b>|| Date the hybrid was planted
 
|-
 
| <b><i>HARVEST_DATE</i></b>|| Date the hybrid was harvested
 
|-
 
| <b><i>CLAY, SILT, SAND, AWC, pH, etc</i></b>|| Properties of soil at tested location
 
|}
 
==Weather Data==
 
This dataset is the supporting dataset: this has all the environmental data for each environment for 365 days (each). There are three important things to note: <br>
 
# We will only use the days that correspond to the growing season of each environment, not the data for all 365 days. <br>
 
# Same location (one set of Long-Lat coordinates) may have various ENV_IDs. For example, level of precipitation at Location A on 8<sup>th</sup> January 2013 and on 8<sup>th</sup> January 2015 would be different, hence different ENV_ID despite being at the same location. <br>
 
# One ENV_ID correspond to One plantation. In other words, one plantation could have multiple ENV_ID, but never at the same year. <br>
 
The table below shows the main variables that we will be using for our analysis: <br>  
 
{| class="wikitable"
 
|-
 
! Variable !! Description
 
  
|-
+
== <big>Methodology</big> ==
| <b><i>ENV_ID</i></b>|| Identifier for the tested location and year (same one as in Performance Data)
+
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">  
|-
+
<div style="font-family:Segoe UI;">
| <b><i>DAY_NUM</i></b>|| Day number within year of weather variables (365 days)
+
<div align="justify">
|-
+
<font size = 3; color ="#0F334D
| <b><i>DAYL</i></b>|| Day length (seconds)
+
">
|-
 
| <b><i>PREC</i></b>|| Precipitation (mm)
 
|-
 
| <b><i>SRAD</i></b>|| Solar radiation (W/m<sup>2</sup>)
 
|-
 
| <b><i>TMAX</i></b>|| Maximum temperature (degrees Celsius)
 
|-
 
| <b><i>TMIN</i></b>|| Minimum temperature (degrees Celsius)
 
|}
 
=Visualisations=
 
For our '''Weather Data''', we will aggregate the environments into the states that they are in, and then we will implement a geofacet time series. This would give an overview of what the weather is like over the years for that particular state. However, we know that Nature knows no boundary, hence we will plot Isohyetal (Precipitation) and Isothermal (Average Temperature) Maps to see what the distribution is like over the Corn Belt. <br>  <br>
 
For our '''Performance Data''', we will be implementing isoline graphs to visualise the yield by location, by plantation. This would give a good visualisation on which part of the corn belt has better yield. <br> <br>
 
Our prediction model will be built with the '''GWmodel''' package, which implements the GWR model discussed in the following section. Our GWR model will estimate the relationships between our dependent variable (yield) and our independent variables (soil, Drought-resistant, Longitude, Latitude, Elevation etc), which corn breeders could eventually use to predict the yield of a plantation, given a certain set of environmental factors. <br>
 
  
=Methodology: Geo-weighted Regression =
+
We started this project with a broad question in mind: ‘How can we improve the bus transportation system in Singapore?’ The methodology of the project is iterative in nature: we will build broad visualizations, identify areas for a deep dive, and propose solutions for the issues found.  
Geo-weighted Regression (GWR) explores spatially varying relationships between the dependent and independent variables at location 𝑖, where  𝑢<sub>𝑖</sub> and 𝑣<sub>𝑖</sub> are the coordinates of 𝑖. As data are geographically weighted, nearer observations have more influence in determining the local set of regression coefficients.
 
Any GWR model will return these 2 important parameters: the estimates and the corresponding t-values. From the estimate, we can see the direction of the relationship between that independent variable and the dependent variable: a positive value means they are positively related, and vice versa. We will convert the t-values to the corresponding p-values in R to assess significance. The t-value is the test statistic of a specific statistical test, whereas the p-value gives the statistical significance level. We take a 5% significance level for our analysis.<br>
 
  
There are two parameters that are important to GWR Model: Kernel and Bandwidth.
+
<b>Data Extraction </b> - Requesting access to data from LTA and building an API interface to extract data
 +
<b>Preliminary study</b> - Reading up on existing work on the Singapore bus transportation network and understanding transportation engineering models
 +
<b>First phase analysis and visualization</b> - Building of high level visualizations to clearly show status quo and areas for improvement
 +
<b>Second phase analysis and visualization</b> - Deep dive into issues and identify solutions
 +
<b>Implementation</b> - Build R-Shiny app and report
  
==Kernels for GWR==
+
Solutions may include graphical analysis and geospatial analysis in the realm of transport engineering, such as the Gravity Model, network analysis, and centrality modelling.  
[[File:Kernels GWR.png|250px|thumb|]]
+
</font></div></div>
There are 6 kernel functions available: <br>  
 
  
The global GWR model gives equal weightage to all observations, which is actually just a normal linear regression. The other kernel functions determine the weightage of an observation from the distance between 2 observations. Gaussian and Exponential kernels are continuous functions of that distance. The weightage is at its maximum at a GW model calibration point (d=0) and decreases according to the kernel function. These two functions give lesser weightage to observations beyond the bandwidth. Boxcar, Bisquare and Tricube kernels are discontinuous functions, giving zero weightage to observations farther away than the bandwidth. Boxcar gives equal weightage to observations within the bandwidth, whereas Bisquare and Tricube give decreasing weightage up to the bandwidth. <br><br>  
+
== <big>Team Members</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<div align="justify">
 +
<font size = 3; color ="#0F334D
 +
">
  
We will build both Global and Local GWR models in this project.
+
* Chan Jia Yi - https://www.linkedin.com/in/jiayi-chan123456/
 +
* Koh Yong Shan - https://www.linkedin.com/in/yongshan-koh/
 +
* Lee Meng Yong - https://www.linkedin.com/in/mylee1/
 +
</font></div></div>
  
==Bandwidth==
+
== <big>Tools & Packages</big> ==
The bandwidth gives the range where the kernel will be applied on the each observation. The smaller the bandwidth, the smaller the range.
+
<div style="font-family:Segoe UI Light; font-size:100%; padding: 20px 0px 0px 15px;">  
=Tools & Packages=
 
These are some of the tools that we will be using: <br>
 
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Package!! Usage of Package
+
| Land Transport Datamall Documentation - ''published by LTA'' || https://www.mytransport.sg/content/mytransport/home/dataMall.html
 
 
 
|-
 
|-
| <b><i>lubridate</i></b>|| Cleaning Data
+
| Spatial Network Analysis of Public Transport Systems - ''published by Data2X'' || https://www.researchgate.net/publication/254746478_Spatial_Network_Analysis_of_Public_Transport_Systems_Developing_a_Strategic_Planning_Tool_to_Assess_the_Congruence_of_Movement_and_Urban_Structure_in_Australian_Cities
 
|-
 
|-
| <b><i>tidyverse</i></b>|| Cleaning Data
+
| Weighted complex network analysis of travel route in Singapore public transport system- ''published by NUS'' || https://www.comp.nus.edu.sg/~wongls/psZ/xiuju-lta10.pdf
 
|-
 
|-
| <b><i>ggplot2</i></b>|| General Graph Plots
+
| Graphical visualisation of flows - ''published by Flows Mag'' || https://www.flowsmag.com/2017/05/09/the-graphic-visualisation-of-flows/
 
|-
 
|-
| <b><i>geofacet</i></b>|| Visualising both Weather Data and Performance Data
+
| Rerouting Buses using Data Science - Part I- ''published by Govtech Singapore'' || https://blog.data.gov.sg/rerouting-buses-using-data-science-part-i-4d6c9d4f1f
 
|-
 
|-
| <b><i>tmap, gstat, sp, sf, rgdal, rgeos:raster</i></b>|| Visualising isolines for both Weather Data and Performance Data
+
| Modelling the public transport network - Part II- ''published by Govtech Singapore'' || https://blog.data.gov.sg/modelling-the-public-transport-network-part-ii-a6da2f3bd28c
 
|-
 
|-
| <b><i>GWmodel</i></b>|| GWR model for Performance Data
+
| How Govtech simulates four million bus rides a day- ''published by Govtech Singapore'' || https://www.tech.gov.sg/media/technews/how-govtech-simulates-four-million-bus-rides-a-day
 
|-
 
|-
| <b><i>shiny</i></b>|| Dashboard Design
+
| Journey to the end of the line - SMU MITB Project, Group 2 T17/18- ''published by SMU MITB'' || https://wiki.smu.edu.sg/1718t3isss608/Group25_Analysis
 
|-
 
|-
| <b><i>shinydashboard</i></b>|| Dashboard Design
+
| Interactive Web Maps with R- ''published by R Studio'' || https://blog.rstudio.com/2015/06/24/leaflet-interactive-web-maps-with-r/
|-
+
 
| <b><i>shinythemes</i></b>|| Dashboard Design
 
 
|}
 
|}
 
+
</div>
=Reference=
 
The image for the banner was taken from https://iegvu.agribusinessintelligence.informa.com/CO215920/South-Africa-corn-planting-plummets. <br>  
 
Kernel image is from Gollini, I., Lu, B., Charlton, M., Brunsdon, C., & Harris, P. (2013). GWmodel: an R package for exploring spatial heterogeneity using geographically weighted models. arXiv preprint arXiv:1306.0413. <br>
 
 
 
[1] Olson, R. A., & Sander, D. H. (1988). Corn production. Corn and corn improvement, (cornandcornimpr), 639-686. <br>
 
[2] Smith, C. W. (2004). Corn: origin, history, technology, and production (Vol. 4). John Wiley & Sons.<br>
 
[3] Shaw, R. H. (1988). Climate requirement. Corn and corn improvement, (cornandcornimpr), 609-638. <br>
 
[4] https://www.ideaconnection.com/syngenta-crop-challenge/challenge.php
 

Revision as of 12:46, 27 February 2020


Re-imagining Bus Transport Network in Singapore



Overview

Bus rides in Singapore are just really slow, aren’t they? Have you ever taken a bus ride that was supposed to be short and quick but took far longer than expected? Are you frustrated that the bus stops at every stop even when nobody is boarding or alighting? And why do we have so many bus stops that almost nobody uses?

What if we could reimagine the public bus network in Singapore through data?

In this project, we will use data visualization techniques to map out all transportation nodes in Singapore and propose a different way of organizing our bus services, covering bus stops, bus routes, and connectivity both within each subregion and between subregions.

Scope

The scope of the project is limited to public buses in Singapore.

To map out the pattern of transportation in Singapore, we will mainly use datasets from LTA DataMall (https://www.mytransport.sg/content/mytransport/home/dataMall.html). In addition, we may supplement them with other relevant datasets, such as geographical socioeconomic data, land use data (industrial, commercial, and residential areas), population density, or weather data.

Approach

The app aims to provide policy makers with the following information:

Transportation Flow

  • Visualise the flow of people across the bus stops/planning subzones and the criticality of the bus stops/planning subzones to Singapore’s public bus transport network.

Travel Demand

Estimate the travel demand: the volume of people expected to travel between a particular origin and destination via a particular route and mode of travel (a direct bus or a trip that involves switching buses).

The app should also be able to:

  • Show the Impediment Value of each subregion of bus stops and planning subzones
  • Show the Degree Centrality: the number of regions a region is connected to via bus services
  • Show the Closeness Centrality of every bus stop to identify how easily each bus stop can reach every other bus stop
  • Show the Betweenness Centrality of a bus stop, that is, how often it acts as a connector or bridge between other locations
  • Show the Connectivity of a bus stop based on the regions it is connected to (e.g. a higher frequency of buses per hour means higher connectivity)
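
As a rough illustration of the three centrality measures listed above, the sketch below computes them on a made-up five-stop network. The network, the stop names, and the choice to treat links as undirected and unweighted are all assumptions for illustration; the real analysis would run on the LTA route data, most likely with an R network package rather than this hand-rolled Python.

```python
from collections import deque

# Hypothetical toy network: each key is a bus stop, each value is the set of
# stops one hop away on some service (links treated as undirected).
NETWORK = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}


def hop_distances(graph, source):
    """Breadth-first search: hops from source to every reachable stop."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degree_centrality(graph):
    """Share of the other stops that each stop is directly linked to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable stops divided by the total hop distance to them."""
    result = {}
    for v in graph:
        dist = hop_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm, unweighted and unnormalised, for undirected graphs."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                        # stops in order of discovery
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        sigma = {v: 0 for v in graph}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):         # accumulate from the farthest stop back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each end.
    return {v: b / 2 for v, b in score.items()}
```

In this toy network, stops B and D come out as the most central on all three measures, which matches the intuition that they are the interchange points every longer trip must pass through.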

Outcome

The visualisation could have practical implications for decision makers, informing policy decisions in order to:

  • Optimise bus routes to improve route utilisation, and reduce the number of bus stops a service calls at to reduce congestion
  • Optimise bus routes which could help to reduce congestion at certain bus stops
  • Plan the placement of bus stops in order to maximize overall utility
  • Plan the frequency of buses at certain times to minimize bus wait time and maximize throughput
  • Advise city planners with regard to transportation flow in congested areas

Data Source

We will primarily be using data from LTA DataMall. The data is not publicly available, but access is granted upon written request. For this project, we will need to write a script that makes API calls to extract the data we need.
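
A minimal sketch of such an extraction script is shown below. The base URL, the `AccountKey` header, the 500-record page size, and the `$skip` paging parameter are assumptions based on how the DataMall API is commonly described; they should be checked against the current LTA DataMall user guide before use. The helper names `http_get_json` and `fetch_all` are hypothetical.

```python
import json
import urllib.request

# Assumed values: verify against the current LTA DataMall user guide.
BASE_URL = "http://datamall2.mytransport.sg/ltaodataservice"
PAGE_SIZE = 500  # assumed maximum number of records returned per call


def http_get_json(url, account_key):
    """Perform one authenticated GET request and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"AccountKey": account_key, "accept": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all(dataset, account_key, get_json=http_get_json):
    """Page through a dataset with $skip until a short page comes back."""
    records = []
    skip = 0
    while True:
        page = get_json(f"{BASE_URL}/{dataset}?$skip={skip}", account_key)["value"]
        records.extend(page)
        if len(page) < PAGE_SIZE:
            return records
        skip += PAGE_SIZE
```

With a valid key, a call such as `fetch_all("BusStops", "<your AccountKey>")` would be expected to return the records as a list of dictionaries. Passing the request function in as `get_json` keeps the paging logic testable without network access.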

The data includes both live and historical data.

Bus Arrival

Live data. Returns real-time Bus Arrival information for Bus Services at a queried Bus Stop, including: Estimated Time of Arrival (ETA), Estimated Location, Load information (how crowded the bus is).

Bus Services

Returns detailed service information for all buses currently in operation, including: first stop, last stop, peak / off peak frequency of dispatch.

Bus Route

Returns detailed route information for all services currently in operation, including: all bus stops along each route, first/last bus timings for each stop.

Bus Stops

Returns detailed information for all bus stops currently being serviced by buses, including: Bus Stop Code, location coordinates.

Passenger Volume by Bus Stops

Returns tap-in and tap-out passenger volume by weekdays and weekends for each individual bus stop.

Passenger Volume by Origin Destination Bus Stops

Returns number of trips by weekdays and weekends from the origin to destination bus stops.

Visualization Feature

The visualization that we are trying to build is graphical and geospatial in nature. Each bus stop will become a node, and the bus routes will become the edges between nodes.
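
One way to turn route records into such a node-and-edge structure is sketched below. The field names (`ServiceNo`, `Direction`, `StopSequence`, `BusStopCode`) are assumed to match the Bus Route dataset, and the sample rows are invented purely for illustration.

```python
from collections import defaultdict


def build_edges(route_rows):
    """Link consecutive stops of each service and direction into directed edges.

    Returns a dict mapping (from_stop, to_stop) to the set of services
    that travel along that link.
    """
    by_route = defaultdict(list)
    for row in route_rows:
        by_route[(row["ServiceNo"], row["Direction"])].append(row)

    edges = defaultdict(set)
    for (service, _direction), rows in by_route.items():
        # Sort a copy so the caller's rows are left untouched.
        ordered = sorted(rows, key=lambda r: r["StopSequence"])
        for here, there in zip(ordered, ordered[1:]):
            edges[(here["BusStopCode"], there["BusStopCode"])].add(service)
    return dict(edges)


# Made-up rows for illustration only (deliberately out of order).
SAMPLE_ROWS = [
    {"ServiceNo": "10", "Direction": 1, "StopSequence": 2, "BusStopCode": "S2"},
    {"ServiceNo": "10", "Direction": 1, "StopSequence": 1, "BusStopCode": "S1"},
    {"ServiceNo": "10", "Direction": 1, "StopSequence": 3, "BusStopCode": "S3"},
    {"ServiceNo": "14", "Direction": 1, "StopSequence": 1, "BusStopCode": "S1"},
    {"ServiceNo": "14", "Direction": 1, "StopSequence": 2, "BusStopCode": "S2"},
]
```

Keeping the set of services on each edge means the number of services per link can later serve as an edge weight, for example when drawing thicker lines for busier links.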

Examples:

Geospatial Flow Chart

image here

Visualizing Connectivity

image here

Finding centrality

image here

Methodology

We started this project with a broad question in mind: ‘How can we improve the bus transportation system in Singapore?’ The methodology of the project is iterative in nature: we will build broad visualizations, identify areas for a deep dive, and propose solutions for the issues found.

  • Data Extraction - Requesting access to data from LTA and building an API interface to extract data
  • Preliminary study - Reading up on existing work on the Singapore bus transportation network and understanding transportation engineering models
  • First phase analysis and visualization - Building high-level visualizations to clearly show the status quo and areas for improvement
  • Second phase analysis and visualization - Deep diving into issues and identifying solutions
  • Implementation - Building the R-Shiny app and report

Solutions may include graphical analysis and geospatial analysis in the realm of transport engineering, such as the Gravity Model, network analysis, and centrality modelling.
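
For reference, the simplest (unconstrained) textbook form of the Gravity Model estimates the flow from zone i to zone j as k × O_i × D_j / d_ij^β, where O_i is the number of trips produced at i, D_j the number attracted to j, and d_ij the distance or travel cost between them. The sketch below implements that form only; the scaling constant `k`, the decay exponent `beta`, and the sample figures are assumptions that would have to be calibrated against the origin-destination data, and the project may end up using a constrained variant instead.

```python
def gravity_flows(origins, destinations, distance, k=1.0, beta=2.0):
    """Estimate flows with the unconstrained gravity model.

    origins:      dict zone -> trips produced (O_i)
    destinations: dict zone -> trips attracted (D_j)
    distance:     dict (zone_i, zone_j) -> distance or travel cost (d_ij)
    k, beta:      assumed scaling constant and distance-decay exponent
    """
    flows = {}
    for i, produced in origins.items():
        for j, attracted in destinations.items():
            if i == j:
                continue  # trips within a zone are ignored in this sketch
            flows[(i, j)] = k * produced * attracted / distance[(i, j)] ** beta
    return flows


# Made-up figures for two hypothetical subzones.
EXAMPLE = gravity_flows(
    origins={"A": 100, "B": 50},
    destinations={"A": 40, "B": 80},
    distance={("A", "B"): 2.0, ("B", "A"): 2.0},
)
```

A larger `beta` makes estimated flows fall off faster with distance, so comparing the estimates with the observed origin-destination trips is one way to choose it.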

Team Members

Tools & Packages

Land Transport Datamall Documentation - published by LTA https://www.mytransport.sg/content/mytransport/home/dataMall.html
Spatial Network Analysis of Public Transport Systems - published by Data2X https://www.researchgate.net/publication/254746478_Spatial_Network_Analysis_of_Public_Transport_Systems_Developing_a_Strategic_Planning_Tool_to_Assess_the_Congruence_of_Movement_and_Urban_Structure_in_Australian_Cities
Weighted complex network analysis of travel route in Singapore public transport system- published by NUS https://www.comp.nus.edu.sg/~wongls/psZ/xiuju-lta10.pdf
Graphical visualisation of flows - published by Flows Mag https://www.flowsmag.com/2017/05/09/the-graphic-visualisation-of-flows/
Rerouting Buses using Data Science - Part I- published by Govtech Singapore https://blog.data.gov.sg/rerouting-buses-using-data-science-part-i-4d6c9d4f1f
Modelling the public transport network - Part II- published by Govtech Singapore https://blog.data.gov.sg/modelling-the-public-transport-network-part-ii-a6da2f3bd28c
How Govtech simulates four million bus rides a day- published by Govtech Singapore https://www.tech.gov.sg/media/technews/how-govtech-simulates-four-million-bus-rides-a-day
Journey to the end of the line - SMU MITB Project, Group 2 T17/18- published by SMU MITB https://wiki.smu.edu.sg/1718t3isss608/Group25_Analysis
Interactive Web Maps with R- published by R Studio https://blog.rstudio.com/2015/06/24/leaflet-interactive-web-maps-with-r/