Difference between revisions of "Group08 proposal"

From ISSS608-Visual Analytics and Applications
 
(49 intermediate revisions by 3 users not shown)
Line 1: Line 1:
<div style=background:#FFCC99 border:#FFCC99>
+
<!----------- template based on : https://wiki.smu.edu.sg/1718t3isss608/Group08_Proposal ------------>
[[File:Long bus.jpg|400px]]
+
<!----------- Main Header ------------>
<font size = 6; color="#000000">      Re-imagining Bus Transport Network in Singapore </font>
+
 
 +
<div style=background:#ffffff  border:#A3BFB1>
 +
[[File:Group_logo2.png|1000px|frameless|center]]
 
</div>
 
</div>
  
<!--NAV BAR -->  
+
<!------------ Navigation bar ------------>
{|style= background-color:"#FFCC99"; width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" border-width="10"|  
+
{|style="background-color:#ffffff;" width="100%" |
+
 
| style="font-family:’Helvetica’, font-size:100%; solid #0433ff; background:#FFCC99; text-align:center;" width="16%" |
+
| style="font-family:Segoe UI Semibold; font-size:100%; text-align:center;border-bottom:solid #176585" width="16.6%" |
;
+
[[Group08_proposal | <font size = 4; color="#176585">Proposal</font>]]
[[ISSS608_Group07_Proposal|<font color="#000000">Proposal</font>]]  
+
 
 +
| style="font-family:Segoe UI Light; font-size:100%; text-align:center;border-bottom:solid #BDD1DE" width="16.6%" |
 +
[[Group08_poster | <font size = 4; color="#4180AB">Poster</font>]]
 +
 
 +
| style="font-family:Segoe UI Light; font-size:100%; text-align:center;border-bottom:solid #BDD1DE" width="16.6%" |
 +
[[Group08_application| <font size = 4; color="#4180AB">Application & User Guide</font>]]
  
| style="font-family:’Helvetica’, font-size:100%; solid #0433ff; background:#FFDEAD; text-align:center;" width="16%" |
+
| style="font-family:Segoe UI Light; font-size:100%; text-align:center;border-bottom:solid #BDD1DE" width="16.6%" |
;
+
[[Group08_research_paper | <font size = 4; color="#4180AB">Research Paper</font>]]
[[ISSS608_Group07_Poster| <font color="#000000">Poster</font>]]  
 
  
| style="font-family:’Helvetica’, font-size:100%; solid #0433ff; background:#FFDEAD; text-align:center;" width="16%" | 
+
| style="font-family:Segoe UI Light; font-size:100%; text-align:center;border-bottom:solid #BDD1DE" width="16.6%" |
;
+
[[Project_Groups | <font size = 4; color="#4180AB">Back to Main ↗</font>]]
[[ISSS608_Group07_Application| <font color="#000000">Application</font>]]
 
 
| style="font-family:’Helvetica’, font-size:100%; solid #0433ff; background:#FFDEAD; text-align:center;" width="16%" |
 
;
 
[[ISSS608_Group07_Report| <font color="#000000">Report</font>]]  
 
  
| style="font-family:’Helvetica’, font-size:100%; solid #0433ff; background:#FFDEAD; text-align:center;" width="16%" | 
 
;
 
[[Project_Groups| <font color="#000000">Return to All Projects</font>]]
 
 
 
|}
 
|}
 +
<!------------ End of navi bar ------------>
 +
 +
 +
== <big>Overview</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<font size = 3; color ="#0F334D">
 +
Singapore's public transport use rose to hit a record high in 2018, with a total of 7.54 million trips made on buses or trains each day.<ref>https://www.budgetdirect.com.sg/car-insurance/research/public-transport-singapore</ref>
 +
 +
Some questions may have crossed your mind: Have you ever taken a bus ride that was supposed to be short and quick but took far longer than expected? Are you frustrated that the bus stops at every stop even when nobody is boarding or alighting? And why do we have so many bus stops that almost nobody uses?
 +
 +
What if we could reimagine the public bus network in Singapore through data?
 +
 +
In this project, we will use data visualization techniques to map out all transportation nodes in Singapore and propose a different way of organizing our bus services, covering bus stops, bus routes, and connectivity both within a subregion and between subregions.
 +
</font></div></div>
 +
 +
== <big>Scope</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<font size = 3; color ="#0F334D">
 +
The scope of the project is limited to public buses in Singapore.
 +
 +
To map out the pattern of transportation in Singapore, we will mainly use datasets from LTA DataMall (https://www.mytransport.sg/content/mytransport/home/dataMall.html). In addition, we may supplement the data with other relevant datasets, such as geographical socioeconomic data, land use data (industrial, commercial, and residential areas), population density, or weather data.
 +
</font></div></div></div>
 +
 +
== <big>Approach</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<font size = 3; color ="#0F334D
 +
">
 +
We will use a primal approach to the analysis of movement networks, by treating intersections as nodes and street segments as edges.<ref>https://www.researchgate.net/publication/254746478_Spatial_Network_Analysis_of_Public_Transport_Systems_Developing_a_Strategic_Planning_Tool_to_Assess_the_Congruence_of_Movement_and_Urban_Structure_in_Australian_Cities</ref>
 +
 +
The app aims to provide policy makers with the following information:
 +
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Transportation Flow</font></div>===
 +
* Visualise the flow of people across the bus stops/planning subzones and the criticality of the bus stops/planning subzones to Singapore’s public bus transport network.
 +
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Travel Demand</font></div>===
 +
Estimate the travel demand: the volume of people expected to travel between a particular origin and destination via a particular route and mode of travel (direct bus).
 +
 +
The app should also be able to:
 +
 +
* Show the Degree Centrality: the number of regions a region is directly connected to via bus services
 +
* Show the Closeness Centrality of every bus stop to identify how easily each bus stop can reach other bus stops
 +
* Show the Betweenness Centrality of a bus stop: how often it lies on the shortest paths between other locations
 +
* Show the Connectivity of a bus stop based on the regions it is connected to (e.g. the higher the frequency of buses per hour, the higher the connectivity)
 +
 +
We have a few recommendations for the network assessment task:
 +
* Out of all possible paths between 2 points in a network, the model we propose needs to favour paths that require the minimum number of transfers between public buses.
 +
* When considering transfers along a path between 2 points in a network, the model needs to define consistent criteria for which nodes are recommended as interchange nodes.
 +
* Wherever a pair of nodes is connected by at least two edges, the path with the lowest cumulative distance is chosen, regardless of the number of transfers required.
 +
* The model should help diffuse some travel demand away from trunk routes, thereby achieving more geographically balanced passenger flows with fewer capacity squeeze points.
 +
 +
</font></div></div>
 +
 +
== <big>Outcome</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<font size = 3; color ="#0F334D
 +
">
 +
The visualisation could have practical implications for decision makers, informing policy decisions in order to:
 +
 +
* Optimise bus routes to improve route utilisation, and reduce the number of bus stops a service calls at to reduce congestion
 +
* Optimise bus routes to help reduce congestion at certain bus stops
 +
* Plan the placement of bus stops to maximize overall utility
 +
* Plan the frequency of buses at certain times to minimize bus wait time and maximize throughput
 +
* Advise city planners with regard to transportation flow in congested areas
 +
</font></div></div>
 +
 +
== <big>Data Source</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<font size = 3; color ="#0F334D
 +
">
  
<br/>
+
We will primarily be using data from LTA DataMall <ref>https://www.mytransport.sg/content/mytransport/home/dataMall/dynamic-data.html</ref>. The data is not publicly available but can be obtained upon written request. For this project, we will need to write a script that makes API calls to extract the data we need. The data includes both live and historical data.
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Bus Arrival</font></div>===
 +
Live data. Returns real-time Bus Arrival information for Bus Services at a queried Bus Stop, including: Estimated Time of Arrival (ETA), Estimated Location, and Load information (how crowded the bus is).
  
 +
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Bus Services</font></div>===
 +
Returns detailed service information for all buses currently in operation, including: first stop, last stop, peak / off peak frequency of dispatch.
  
<font size="5">'''Overview'''</font> <br/>
+
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Bus Route</font></div>===
Corn or Maize (as called in some countries) was first grown in ancient Central America. Corn has become a staple in many parts of the world, providing not only substances that we fill our belly with, but also act as the raw ingredient for corn ethanol, animal feed etc. The United States accounts for about 40% of production of corn in the world<sup>1</sup>, which makes it the largest corn producer. The major portion of production is found in the Midwestern states, such as Illinois, Iowa, Nebraska and Minnesota – these states were grouped and eventually became known as the ‘Corn Belt’. The Corn Belt has about 96,000,000 acres of land just for corn production. The states that make up the Corn Belt were selected due to leveled land, fertile and highly organic soils<sup>2</sup>. <br/>
+
Returns detailed route information for all services currently in operation, including: all bus stops along each route, first/last bus timings for each stop.
  
Corn has been known to be able to grow in a wide range of climatic conditions, hence it would be a challenge to set precise conditions for corn production. However, there is still a limit of this wide window of conditions, such as corn is grown mostly in tropical latitudes, corn has a cold limit of 19°C, corn grows the best in warm temperatures between 21°C to 27°C and the growing season to grow hovers between 120 -180 days<sup>3</sup>. Hence, breeders have been experimenting with various types of corn hybrids, each of them specifically created to have high yield despite the environment it is planted in. Over the years, the farmers have been using trial and error method to identify the best hybrids to plant by planting each of these hybrids in different locations with different environmental factors; this process has been proven to be slow and not very effective<sup>4</sup>. This project aims to explore the meteorology and geographical factors that makes a corn, the a-maize-ing crop that we know today, which would benefit the corn breeder greatly. <br> <br>
+
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Bus Stops</font></div>===
 +
Returns detailed information for all bus stops currently being serviced by buses, including: Bus Stop Code, location coordinates.
  
=Scope=
+
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Passenger Volume by Bus Stops</font></div>===
The scope of the project is limited to the corn produced in USA, specifically from the Corn Belt regions. Due to time constraint, we will analyse at the ‘Environment’ level (aggregated) instead of ‘Hybrid’ level. Each Environment can be treated as a plantation, where each Environment has various numbers of hybrids being planted. All years are taken into consideration, with a focus on the individual growing season of each environment (around Mar/April to September). We will also limit the environmental factors to the following: <br>
+
Returns tap-in and tap-out passenger volume by weekdays and weekends for each individual bus stop.
# Precipitation <br>
 
# Exposure length to sun (Radiation) <br>
 
# Average Temperature <br>
 
# Location of where the hybrid is planted  <br>
 
  
=Objective=
+
===<div style="font-family:Segoe UI Semibold;"><font size = 3; color="#176585">Passenger Volume by Origin Destination Bus Stops</font></div>===
The first objective of this project is to visualise our dataset:  
+
Returns number of trips by weekdays and weekends from the origin to destination bus stops.
# '''Weather Data''': Precipitation, Average Temperature,Length of (sun) Radiation over the years <br>  
+
</font></div></div></div>
# '''Performance Data''': Average Yield by State, by Plantation. However after skimming through our data, about 45% of our plantations are only used once. Hence we do not have enough data for us to do a time-series analysis. Instead, we will be doing cross-sectional analysis, where we will analyse and visualise our data year by year. <br>
 
  
The second objective is to predict the yield of a plantation, given the soil conditions and topography of the plantation. The model that we are going to implement is Geo-weighted Regression (GWR) Model. This is will be further elaborated on in the Methodology section. <br>
+
== <big>Visualization Feature</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<font size = 3; color ="#0F334D
 +
">
 +
The visualization that we are trying to build is <b>graphical</b> and <b>geospatial</b> in nature. Each bus stop will become a node, and the bus routes will become the edges between nodes.
  
=Data Source=
+
Examples:
This data from the 'Syngenta Crop Challenge 2019'. [https://www.ideaconnection.com/syngenta-crop-challenge/challenge.php#datasets Click here to see the data.]<br>
 
  
==Performance Data==
+
<b>Geospatial Flow Chart</b><br />
This dataset is the main dataset: we have both the individual hybrid yield, as well as the average yield (of several hybrids) of a particular environment. We have '''5,382''' unique hybrids, with '''579''' unique environments. The table below shows the main variables that we will be using for our analysis: <br>
+
[[Image:Group8-geospatial flowchart.JPG|300px]]
{| class="wikitable"
+
 
|-
+
<b>Visualizing Connectivity</b><br />
! Variable !! Description
+
[[Image:Group8-visualizing connectivity.JPG|300px]]
|-
+
 
| <b><i>YEAR</i></b>|| Year grown
+
<b>Finding centrality</b><br />
|-
+
[[Image:Group8-centrality.JPG|300px]]
| <b><i>HYBRID_ID</i></b>|| Identifier for the tested hybrid
+
 
|-
+
</font></div></div></div>
| <b><i>ENV_ID</i></b>|| Identifier for the tested location and year
+
 
|-
+
== <big>Methodology</big> ==
| <b><i>YIELD</i></b>|| Yield of tested hybrid in tested location (quintiles/hectare)
+
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
|-
+
<div style="font-family:Segoe UI;">
| <b><i>ENV_YIELD_MEAN</i></b>|| Average Yield of tested location
+
<font size = 3; color ="#0F334D
|-
+
">
| <b><i>LAT, LONG</i></b>|| Latitude and Longitude of tested location to nearest 0.1 degree
 
|-
 
| <b><i>ELEVATION</i></b>|| Elevation of field at tested location
 
|-
 
| <b><i>PLANT_DATE</i></b>|| Date the hybrid was planted
 
|-
 
| <b><i>HARVEST_DATE</i></b>|| Date the hybrid was harvested
 
|-
 
| <b><i>CLAY, SILT, SAND, AWC, pH, etc</i></b>|| Properties of soil at tested location
 
|}
 
==Weather Data==
 
This dataset is the supporting dataset: this has all the environmental data for each environment for 365 days (each). There are three important things to note: <br>  
 
# We will only use the days that corresponding each of the growing season of that environment, not the data for all 365 days. <br>
 
# Same location (one set of Long-Lat coordinates) may have various ENV_IDs. For example, level of precipitation at Location A on 8<sup>th</sup> January 2013 and on 8<sup>th</sup> January 2015 would be different, hence different ENV_ID despite being at the same location. <br>  
 
# One ENV_ID correspond to One plantation. In other words, one plantation could have multiple ENV_ID, but never at the same year. <br>
 
The table below shows the main variables that we will be using for our analysis: <br>
 
{| class="wikitable"
 
|-
 
! Variable !! Description
 
  
|-
+
We started this project with a broad question in mind: ‘How can we improve the bus transportation system in Singapore?’ The methodology of the project is iterative in nature: we will build broad visualizations, identify areas for a deep dive, and propose solutions for the issues found.
| <b><i>ENV_ID</i></b>|| Identifier for the tested location and year (same one as in Performance Data)
 
|-
 
| <b><i>DAY_NUM</i></b>|| Day number within year of weather variables (365 days)
 
|-
 
| <b><i>DAYL</i></b>|| Day length (seconds)
 
|-
 
| <b><i>PREC</i></b>|| Precipitation (mm)
 
|-
 
| <b><i>SRAD</i></b>|| Solar radiation (W/m<sup>2</sup>)
 
|-
 
| <b><i>TMAX</i></b>|| Maximum temperature (degrees Celsius)
 
|-
 
| <b><i>TMIN</i></b>|| Minimum temperature (degrees Celsius)
 
|}
 
=Visualisations=
 
For our '''Weather Data''', we will aggregate the environments into the states that they are in, and then we will be implementing geofacet time series. This would give an overview of how the weather is like over the years for that particular state. However, we know that Nature knows no boundary, hence we will plot Isohyetal (Preciptation) and Isothermal (Average Temperature) Maps to see how the distribution is like over the Corn Belt. <br>  <br>
 
For our '''Performance Data''', we will be implementing isoline graphs to visualise the yield by location, by plantation. This would give a good visualisation on which part of the corn belt has better yield. <br> <br>
 
Our prediction model would be done using '''GWmodel''' package to implement GWR model, which will be discussed in the following section. Our GWR model would estimate the relationships between our independent variable (yield) with our dependent variables (soil, Drought-resistant, Longitude, Latitide, Elevation etc), which eventually could be used by corn breeders in predicting the yield of a plantation, given a certain set of environmental factors. <br>
 
  
=Methodology: Geo-weighted Regression =
+
* <b>Data Extraction </b> - Requesting access to data from LTA and building an API interface to extract the data
Geo-weighted Regression (GWR) explores spatially varying relationships between the dependent and independent variables at location 𝑖, where  𝑢<sub>𝑖</sub> and 𝑣<sub>𝑖</sub> are the coordinates of 𝑖. As data are geographically weighted, nearer observations have more influence in determining the local set of regression coefficients.
+
* <b>Preliminary study</b> - Reading up on existing work on the Singapore bus transportation network and understanding transportation engineering models
Any GWR model will return these 2 important parameters: Estimates and corresponding t-value. From the estimate value, we can see the correlation between that observation with the dependent variable. If the value is positive, means it is positively correlated, and vice versa. With the given t-values, we will convert to the corresponding p-values in R to see significance. The t-value is specific thing for a specific statistical test, whereas the p-value gives the statistical significance level. We take 5% significant level for our analysis.<br>
+
* <b>First phase analysis and visualization</b> - Building high-level visualizations to clearly show the status quo and areas for improvement
 +
* <b>Second phase analysis and visualization</b> - Deep-diving into issues and identifying solutions
 +
* <b>Implementation</b> - Building the R-Shiny app and writing the report
  
There are two parameters that are important to GWR Model: Kernel and Bandwidth.  
+
Solutions may include graphical and geospatial analysis techniques from transport engineering, such as the Gravity Model (a multiple regression model), network analysis, and centrality modelling.
  
==Kernels for GWR==
+
[[File:GravityModel.png|thumb|Gravity Model from <ref>https://www.researchgate.net/publication/331690357_A_Multiple_Regression_Approach_for_Traffic_Flow_Estimation</ref>]]
[[File:Kernels GWR.png|250px|thumb|]]
+
</font></div></div></div>
There are 6 kernel functions available: <br>  
 
  
The global GWR model gives equal weightage to all observations, which is actually just a normal linear regression. The other kernel functions are used to determine the weightage of an observation by calculating the distance between 2 observations. Gaussian and Exponential kernels are continuous functions of the distance between 2 observations. The weightage will be maximum at a GW model calibrated point (d=0), and decrease accordingly to its function. These two functions give lesser weightage to observations beyond the bandwidth. Boxcar, Bisquare and Tricube kernels are discontinuous functions, giving zero weightage to observations greater than the bandwidth. Boxplot gives equal weightage to observations within the bandwidth, whereas Bisqaure and Tricube give decreasing weightage until the bandwidth. <br><br>  
+
== <big>Limitations</big> ==
 +
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
 +
<div style="font-family:Segoe UI;">
 +
<font size = 3; color ="#0F334D
 +
">
  
We will build both Global and Local GWR models in this project.
+
* We won't be assessing Information Centrality because we assume we won't be removing any bus stops.
 +
* Due to data limitations and security, we won't be assessing actual waiting time, frequency and actual travelling time.
 +
* As our project focuses on buses, we will ignore the connectivity provided by trains and other forms of public transport.
 +
</font></div></div></div>
  
==Bandwidth==
+
== <big>Team Members</big> ==
The bandwidth gives the range where the kernel will be applied on the each observation. The smaller the bandwidth, the smaller the range.
+
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
=Tools & Packages=
+
<div style="font-family:Segoe UI;">
These are some of the tools that we will be using: <br>
+
<font size = 3; color ="#0F334D
{| class="wikitable"
+
">
|-
+
* [https://www.linkedin.com/in/jiayi-chan123456/ Chan Jia Yi]
! Package!! Useage of Package
+
* [https://www.linkedin.com/in/yongshan-koh/ Koh Yongshan]
 +
* [https://www.linkedin.com/in/mylee1/ Lee Meng Yong]
  
|-
+
<div style=background:#ffffff  border:#A3BFB1>
| <b><i>lubridate</i></b>|| Cleaning Data
+
[[File:Teammates.JPG|700px|frameless|center]]
|-
+
</div>
| <b><i>tidyverse</i></b>|| Cleaning Data
+
</font></div></div></div>
|-
 
| <b><i>ggplot2</i></b>|| General Graph Plots
 
|-
 
| <b><i>geofacet</i></b>|| Visualising both Weather Data and Performance Data
 
|-
 
| <b><i>tmap, gstat, sp, sf, rgdal, rgeos:raster</i></b>|| Visualising isolines for both Weather Data and Performance Data
 
|-
 
| <b><i>GWmodel</i></b>|| GWR model for Performance Data
 
|-
 
| <b><i>shiny</i></b>|| Dashboard Design
 
|-
 
| <b><i>shinydashboard</i></b>|| Dashboard Design
 
|-
 
| <b><i>shinythemes</i></b>|| Dashboard Design
 
|}
 
  
=Reference=
+
== <big>Tools and Packages</big> ==
The image for the banner was tken from https://iegvu.agribusinessintelligence.informa.com/CO215920/South-Africa-corn-planting-plummets. <br>
+
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
Kernel image is from Gollini, I., Lu, B., Charlton, M., Brunsdon, C., & Harris, P. (2013). GWmodel: an R package for exploring spatial heterogeneity using geographically weighted models. arXiv preprint arXiv:1306.0413. <br>
+
<div style="font-family:Segoe UI;">
 +
<font size = 3; color ="#0F334D
 +
">
 +
* [https://cran.r-project.org/web/packages/flows/vignettes/flows.html Flow]
 +
* [https://cran.r-project.org/web/packages/ggraph/index.html GGraph]
 +
* [https://shiny.rstudio.com shiny]
 +
* [https://cran.r-project.org/web/packages/shinydashboard shinydashboard]
 +
* [https://cran.r-project.org/web/packages/ggplot2 ggplot2]
 +
* [https://plot.ly/r plotly]
 +
* [https://www.tidyverse.org tidyverse]
 +
* [https://neo4j.com/developer/r/ Neo4j]
 +
* [https://cran.r-project.org/web/packages/leaflet/index.html leaflet]
 +
* [https://cran.r-project.org/web/packages/ggcorrplot/index.html ggcorrplot]
 +
* [https://cran.r-project.org/web/packages/tidygraph/index.html tidygraph]
 +
* [https://cran.r-project.org/web/packages/heatmaply/index.html heatmaply]
 +
* [https://cran.r-project.org/web/packages/MASS/index.html MASS]
 +
* [https://cran.r-project.org/web/packages/ERSA/index.html ERSA]
 +
* [https://cran.r-project.org/web/packages/CARS/index.html CAR]
 +
* [https://cran.r-project.org/web/packages/rgdal/index.html RGDAL]
 +
</font></div></div></div>
  
[1] Olson, R. A., & Sander, D. H. (1988). Corn production. Corn and corn improvement, (cornandcornimpr), 639-686. <br>
+
== <big>Reference</big> ==
[2] Smith, C. W. (2004). Corn: origin, history, technology, and production (Vol. 4). John Wiley & Sons.<br>
+
<div style="font-family:Segoe UI Light; font-size:100%; padding: 0px 0px 0px 15px;">
[3] Shaw, R. H. (1988). Climate requirement. Corn and corn improvement, (cornandcornimpr), 609-638. <br>
+
<div style="font-family:Segoe UI;">
[4] https://www.ideaconnection.com/syngenta-crop-challenge/challenge.php
+
<font size = 3; color ="#0F334D
 +
">
 +
* [https://www.mytransport.sg/content/mytransport/home/dataMall.html Land Transport Datamall Documentation - published by LTA]
 +
*[https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?referer=https://www.google.com/&httpsredir=1&article=3099&amp;context=sis_research Time-Series Data Mining in Transportation: A Case Study on Singapore Public Train Commuter Travel Patterns]
 +
* [https://www.researchgate.net/publication/254746478_Spatial_Network_Analysis_of_Public_Transport_Systems_Developing_a_Strategic_Planning_Tool_to_Assess_the_Congruence_of_Movement_and_Urban_Structure_in_Australian_Cities Spatial Network Analysis of Public Transport Systems - ''published by Data2X'']
 +
* [https://www.comp.nus.edu.sg/~wongls/psZ/xiuju-lta10.pdf Weighted complex network analysis of travel route in Singapore public transport system- ''published by NUS'']
 +
* [https://www.flowsmag.com/2017/05/09/the-graphic-visualisation-of-flows/ Graphical visualisation of flows - ''published by Flows Mag'']
 +
* [https://blog.data.gov.sg/rerouting-buses-using-data-science-part-i-4d6c9d4f1f Rerouting Buses using Data Science - Part I- ''published by Govtech Singapore'']
 +
* [https://blog.data.gov.sg/modelling-the-public-transport-network-part-ii-a6da2f3bd28c Modelling the public transport network - Part II- ''published by Govtech Singapore'' ]
 +
* [https://www.tech.gov.sg/media/technews/how-govtech-simulates-four-million-bus-rides-a-day How Govtech simulates four million bus rides a day- ''published by Govtech Singapore'' ]
 +
* [https://wiki.smu.edu.sg/1718t3isss608/Group25_Analysis Journey to the end of the line - SMU MITB Project, Group 2 T17/18- ''published by SMU MITB'']
 +
* [https://blog.rstudio.com/2015/06/24/leaflet-interactive-web-maps-with-r/ Interactive Web Maps with R- ''published by R Studio'']
 +
* [https://www.budgetdirect.com.sg/car-insurance/research/public-transport-singapore/ Public Transport Singapore 2020- ''Budget Direct Singapore'']
 +
* [https://www.researchgate.net/publication/254746478_Spatial_Network_Analysis_of_Public_Transport_Systems_Developing_a_Strategic_Planning_Tool_to_Assess_the_Congruence_of_Movement_and_Urban_Structure_in_Australian_Cities/ Spatial Network Analysis of Public Transport Systems: Developing a Strategic Planning Tool to Assess the Congruence of Movement and Urban Structure in Australian Cities- ''Research Gate'']
 +
* [https://www.mytransport.sg/content/mytransport/home/dataMall/dynamic-data.html/ LTA public transport data- ''LTA datamall'']
 +
* [https://www.researchgate.net/publication/331690357_A_Multiple_Regression_Approach_for_Traffic_Flow_Estimation/ A Multiple Regression Approach for Traffic Flow Estimation- ''Research Gate'']
 +
</font></div></div>

Latest revision as of 17:03, 26 April 2020



Overview

Singapore's public transport use rose to hit a record high in 2018, with a total of 7.54 million trips made on buses or trains each day.[1]

Some questions may have crossed your mind: Have you ever taken a bus ride that was supposed to be short and quick but took far longer than expected? Are you frustrated that the bus stops at every stop even when nobody is boarding or alighting? And why do we have so many bus stops that almost nobody uses?

What if we could reimagine the public bus network in Singapore through data?

In this project, we will use data visualization techniques to map out all transportation nodes in Singapore and propose a different way of organizing our bus services, covering bus stops, bus routes, and connectivity both within a subregion and between subregions.

Scope

The scope of the project is limited to public buses in Singapore.

To map out the pattern of transportation in Singapore, we will mainly use datasets from LTA DataMall (https://www.mytransport.sg/content/mytransport/home/dataMall.html). In addition, we may supplement the data with other relevant datasets, such as geographical socioeconomic data, land use data (industrial, commercial, and residential areas), population density, or weather data.

Approach

We will use a primal approach to the analysis of movement networks, by treating intersections as nodes and street segments as edges.[2]

The app aims to provide policy makers with the following information:

Transportation Flow

  • Visualise the flow of people across the bus stops/planning subzones and the criticality of the bus stops/planning subzones to Singapore’s public bus transport network.

Travel Demand

Estimate the travel demand: the volume of people expected to travel between a particular origin and destination via a particular route and mode of travel (direct bus).

The app should also be able to:

  • Show the Degree Centrality: the number of regions a region is directly connected to via bus services
  • Show the Closeness Centrality of every bus stop to identify how easily each bus stop can reach other bus stops
  • Show the Betweenness Centrality of a bus stop: how often it lies on the shortest paths between other locations
  • Show the Connectivity of a bus stop based on the regions it is connected to (e.g. the higher the frequency of buses per hour, the higher the connectivity)
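
As a rough illustration of the three centrality measures listed above, the sketch below computes them on a tiny made-up network. The stop codes and links are hypothetical, the graph is treated as unweighted and undirected, and Python is used only for illustration (the app itself is planned in R). It is a sketch under these assumptions, not the project's implementation.

```python
from collections import deque

# Hypothetical toy network: stop codes and links are illustrative only.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")]

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (stop, stop) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """Number of directly connected stops for each stop."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def hop_distances(adj, source):
    """Breadth-first search: fewest hops from source to every reachable stop."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(adj):
    """(reachable stops) / (sum of hop distances); higher means closer to the rest."""
    result = {}
    for node in adj:
        dist = hop_distances(adj, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    score = {node: 0.0 for node in adj}
    for s in adj:
        order = []                        # stops in order of non-decreasing distance
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in score.items()}

adjacency = build_adjacency(EDGES)
print(degree_centrality(adjacency))
print(closeness_centrality(adjacency))
print(betweenness_centrality(adjacency))
```

In this toy network, stops B and D sit on every shortest path that crosses the middle of the graph, which is the kind of "critical" stop the betweenness measure is meant to surface.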

We have a few recommendations for the network assessment task:

  • Out of all possible paths between 2 points in a network, the model we propose needs to favour paths that require the minimum number of transfers between public buses.
  • When considering transfers along a path between 2 points in a network, the model needs to define consistent criteria for which nodes are recommended as interchange nodes.
  • Wherever a pair of nodes is connected by at least two edges, the path with the lowest cumulative distance is chosen, regardless of the number of transfers required.
  • The model should help diffuse some travel demand away from trunk routes, thereby achieving more geographically balanced passenger flows with fewer capacity squeeze points.
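
One way to make the "minimum number of transfers" criterion concrete is a breadth-first search over bus services rather than over stops. The sketch below assumes made-up service numbers and stop codes, and it simplifies by ignoring direction of travel (a service is assumed to connect any two of its stops). It is an illustration in Python, not the model the project will necessarily use.

```python
from collections import deque

# Hypothetical services and stop codes, for illustration only.
SERVICES = {
    "10": ["S1", "S2", "S3"],
    "20": ["S3", "S4", "S5"],
    "30": ["S5", "S6"],
    "40": ["S1", "S6"],
}

def min_transfers(services, origin, destination):
    """Fewest bus changes needed from origin to destination, or None if unreachable."""
    if origin == destination:
        return 0
    stops_of = {svc: set(stops) for svc, stops in services.items()}
    # Start on every service that calls at the origin (0 transfers so far).
    queue = deque((svc, 0) for svc, stops in stops_of.items() if origin in stops)
    seen = {svc for svc, _ in queue}
    while queue:
        svc, transfers = queue.popleft()
        if destination in stops_of[svc]:
            return transfers
        # Any unvisited service sharing a stop with the current one costs one transfer.
        for other, stops in stops_of.items():
            if other not in seen and stops_of[svc] & stops:
                seen.add(other)
                queue.append((other, transfers + 1))
    return None

print(min_transfers(SERVICES, "S2", "S5"))
```

Because the search expands services level by level, the first service found that calls at the destination is guaranteed to need the fewest changes.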

Outcome

The visualisation could have practical implications for decision makers, informing policy decisions in order to:

  • Optimise bus routes to improve route utilisation, and reduce the number of bus stops a service calls at to reduce congestion
  • Optimise bus routes to help reduce congestion at certain bus stops
  • Plan the placement of bus stops to maximize overall utility
  • Plan the frequency of buses at certain times to minimize bus wait time and maximize throughput
  • Advise city planners with regard to transportation flow in congested areas

Data Source

We will primarily be using data from LTA DataMall [3]. The data is not publicly available but can be obtained upon written request. For this project, we will need to write a script that makes API calls to extract the data we need. The data includes both live and historical data.
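
A minimal sketch of such an extraction script is shown below. The base URL, the `AccountKey` header, the `$skip` paging parameter, the 500-record page size, and the `value` field in the response are all assumptions that should be checked against the current DataMall API documentation. The function names are hypothetical, and Python is used here only for illustration.

```python
import json
import urllib.request

# Assumed values; verify against the current LTA DataMall API user guide.
BASE_URL = "http://datamall2.mytransport.sg/ltaodataservice"
PAGE_SIZE = 500  # assumed maximum number of records returned per call

def fetch_page(endpoint, skip, account_key):
    """Request one page of records from a DataMall endpoint (network call)."""
    request = urllib.request.Request(
        f"{BASE_URL}/{endpoint}?$skip={skip}",
        headers={"AccountKey": account_key, "accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("value", [])

def fetch_all(endpoint, account_key, fetch=fetch_page, page_size=PAGE_SIZE):
    """Keep requesting pages until a short (or empty) page signals the end."""
    records, skip = [], 0
    while True:
        page = fetch(endpoint, skip, account_key)
        records.extend(page)
        if len(page) < page_size:
            return records
        skip += page_size
```

A call such as `fetch_all("BusStops", "YOUR_ACCOUNT_KEY")` would then collect every page for one endpoint. The `fetch` argument lets the paging logic be exercised with a stand-in function, without touching the network.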

Bus Arrival

Live data. Returns real-time Bus Arrival information for Bus Services at a queried Bus Stop, including: Estimated Time of Arrival (ETA), Estimated Location, and Load information (how crowded the bus is).

Bus Services

Returns detailed service information for all buses currently in operation, including: first stop, last stop, peak / off peak frequency of dispatch.

Bus Route

Returns detailed route information for all services currently in operation, including: all bus stops along each route, first/last bus timings for each stop.

Bus Stops

Returns detailed information for all bus stops currently being serviced by buses, including: Bus Stop Code, location coordinates.

Passenger Volume by Bus Stops

Returns tap-in and tap-out passenger volume by weekdays and weekends for each individual bus stop.

Passenger Volume by Origin Destination Bus Stops

Returns number of trips by weekdays and weekends from the origin to destination bus stops.
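
To show how origin-destination records like these could feed the flow visualisation, the sketch below rolls stop-level trips up to flows between planning subzones. The column names (`DAY_TYPE`, `ORIGIN_PT_CODE`, `DESTINATION_PT_CODE`, `TOTAL_TRIPS`), the day-type labels, the sample rows, and the stop-to-subzone lookup are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; column names and values are assumed, not real data.
OD_ROWS = [
    {"DAY_TYPE": "WEEKDAY", "ORIGIN_PT_CODE": "1001", "DESTINATION_PT_CODE": "2001", "TOTAL_TRIPS": 120},
    {"DAY_TYPE": "WEEKDAY", "ORIGIN_PT_CODE": "1002", "DESTINATION_PT_CODE": "2001", "TOTAL_TRIPS": 30},
    {"DAY_TYPE": "WEEKENDS/HOLIDAY", "ORIGIN_PT_CODE": "1001", "DESTINATION_PT_CODE": "2001", "TOTAL_TRIPS": 50},
    {"DAY_TYPE": "WEEKDAY", "ORIGIN_PT_CODE": "2001", "DESTINATION_PT_CODE": "1001", "TOTAL_TRIPS": 80},
    {"DAY_TYPE": "WEEKDAY", "ORIGIN_PT_CODE": "1001", "DESTINATION_PT_CODE": "9999", "TOTAL_TRIPS": 5},
]

# Hypothetical lookup from bus stop code to planning subzone.
STOP_TO_SUBZONE = {"1001": "Subzone A", "1002": "Subzone A", "2001": "Subzone B"}

def aggregate_flows(rows, stop_to_zone, day_type):
    """Sum trips between subzones for one day type, skipping unmapped stops."""
    flows = defaultdict(int)
    for row in rows:
        if row["DAY_TYPE"] != day_type:
            continue
        origin = stop_to_zone.get(row["ORIGIN_PT_CODE"])
        destination = stop_to_zone.get(row["DESTINATION_PT_CODE"])
        if origin is None or destination is None:
            continue
        flows[(origin, destination)] += int(row["TOTAL_TRIPS"])
    return dict(flows)

print(aggregate_flows(OD_ROWS, STOP_TO_SUBZONE, "WEEKDAY"))
```

Each resulting (origin subzone, destination subzone) pair with its trip count corresponds to one weighted, directed edge in a flow map.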

Visualization Feature

The visualization that we are trying to build is graphical and geospatial in nature. Each bus stop will become a node, and the bus routes will become the edges between nodes.
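
The node-and-edge idea can be sketched as follows: consecutive stops on the same service and direction are linked by a directed edge, and each edge records which services use it. The field names (`ServiceNo`, `Direction`, `StopSequence`, `BusStopCode`) are assumed to match the Bus Route data, and the rows are made up. This is a Python illustration rather than the project's code.

```python
from collections import defaultdict

# Hypothetical route rows; field names are assumed and values are made up.
ROUTE_ROWS = [
    {"ServiceNo": "10", "Direction": 1, "StopSequence": 2, "BusStopCode": "S2"},
    {"ServiceNo": "10", "Direction": 1, "StopSequence": 1, "BusStopCode": "S1"},
    {"ServiceNo": "10", "Direction": 1, "StopSequence": 3, "BusStopCode": "S3"},
    {"ServiceNo": "20", "Direction": 1, "StopSequence": 1, "BusStopCode": "S2"},
    {"ServiceNo": "20", "Direction": 1, "StopSequence": 2, "BusStopCode": "S3"},
]

def build_edges(rows):
    """Map each directed (from_stop, to_stop) edge to the set of services using it."""
    by_route = defaultdict(list)
    for row in rows:
        by_route[(row["ServiceNo"], row["Direction"])].append(row)
    edges = defaultdict(set)
    for (service, _direction), stops in by_route.items():
        # Order the stops along the route before pairing neighbours.
        stops.sort(key=lambda r: r["StopSequence"])
        for current, following in zip(stops, stops[1:]):
            edges[(current["BusStopCode"], following["BusStopCode"])].add(service)
    return dict(edges)

print(build_edges(ROUTE_ROWS))
```

The number of services on an edge gives a simple edge weight, which could later be replaced by frequency or passenger volume.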

Examples:

Geospatial Flow Chart
Group8-geospatial flowchart.JPG

Visualizing Connectivity
Group8-visualizing connectivity.JPG

Finding centrality
Group8-centrality.JPG

Methodology

We started this project with a broad question in mind: ‘How can we improve the bus transportation system in Singapore?’ The methodology of the project is iterative in nature: we will build broad visualizations, identify areas for a deep dive, and propose solutions for the issues found.

  • Data Extraction - Requesting access to data from LTA and building an API interface to extract the data
  • Preliminary study - Reading up on existing work on the Singapore bus transportation network and understanding transportation engineering models
  • First phase analysis and visualization - Building high-level visualizations to clearly show the status quo and areas for improvement
  • Second phase analysis and visualization - Deep-diving into issues and identifying solutions
  • Implementation - Building the R-Shiny app and writing the report

Solutions may include graphical and geospatial analysis techniques from transport engineering, such as the Gravity Model (a multiple regression model), network analysis, and centrality modelling.

Gravity Model from [4]
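
For readers unfamiliar with the gravity model, a generic textbook form is given below. It is not necessarily the exact formulation shown in the cited figure, and the symbols are assumptions introduced here: T_ij is the flow from origin i to destination j, P_i and A_j are measures of origin "push" and destination "pull" (for example population and jobs), d_ij is the distance or travel cost, and k, α, β, γ are parameters to be estimated.

```latex
T_{ij} = k \, \frac{P_i^{\alpha} \, A_j^{\beta}}{d_{ij}^{\gamma}}
\qquad\Longleftrightarrow\qquad
\ln T_{ij} = \ln k + \alpha \ln P_i + \beta \ln A_j - \gamma \ln d_{ij}
```

Taking logarithms turns the model into a linear equation in the unknown parameters, which is why it can be fitted as a multiple regression.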

Limitations

  • We won't be assessing Information Centrality because we assume we won't be removing any bus stops.
  • Due to data limitations and security, we won't be assessing actual waiting time, frequency and actual travelling time.
  • As our project focuses on buses, we will ignore the connectivity provided by trains and other forms of public transport.

Team Members

Tools and Packages

Reference