Difference between revisions of "AY1516 T2 Team CommuteThere Methodology"

From Analytics Practicum
 
(23 intermediate revisions by 2 users not shown)
Line 11: Line 11:
 
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #0091b3" width="210px" |   
 
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #0091b3" width="210px" |   
 
[[AY1516_T2_Team_CommuteThere_Overview|<font color="#3c3c3c"><strong>PROJECT OVERVIEW</strong></font>]]
 
[[AY1516_T2_Team_CommuteThere_Overview|<font color="#3c3c3c"><strong>PROJECT OVERVIEW</strong></font>]]
 +
 +
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #f5f5f5" width="230px" | 
 +
[[AY1516_T2_Team_CommuteThere_Project_Data_Preparation|<font color="#3c3c3c"><strong>ANALYSIS & FINDINGS</strong></font>]]
  
 
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #f5f5f5" width="230px" |   
 
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #f5f5f5" width="230px" |   
Line 18: Line 21:
 
[[AY1516_T2_Team_CommuteThere_Main Deliverables|<font color="#3c3c3c"><strong>DOCUMENTATION</strong></font>]]
 
[[AY1516_T2_Team_CommuteThere_Main Deliverables|<font color="#3c3c3c"><strong>DOCUMENTATION</strong></font>]]
  
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #f5f5f5" width="230px" | 
 
[[AY1516_T2_Team_CommuteThere_Analysis_Findings|<font color="#3c3c3c"><strong>ANALYSIS & FINDINGS</strong></font>]]
 
 
|}
 
|}
 
</center>
 
</center>
Line 46: Line 47:
 
<!-- Body -->
 
<!-- Body -->
 
==<div style="font-family:Open Sans, Arial, sans-serif; background: #ffffff; padding: 17px; line-height: 0.1em;  text-indent: 10px; font-size:17px; border-left:8px solid #0091b3"><font color= #000000><strong>Analyse Commuter Patterns</strong></font></div>==
 
==<div style="font-family:Open Sans, Arial, sans-serif; background: #ffffff; padding: 17px; line-height: 0.1em;  text-indent: 10px; font-size:17px; border-left:8px solid #0091b3"><font color= #000000><strong>Analyse Commuter Patterns</strong></font></div>==
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Arial, sans-serif; border-radius: 7px; text-align:left">
+
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left">
<font face="Open Sans, Arial, sans-serif;">
 
This analysis aims to identify commuter patterns of each demographic groups - students, adults and elderly - as each group has differing interests and preferences in the places to frequent at. Analysis of commuter patterns is split into 4 segments:
 
<font face="Open Sans, Arial, sans-serif">
 
{| class="wikitable"
 
|-
 
! Segments !! Description
 
|-
 
  
|Island wide
+
This analysis aims to identify commuter patterns of each demographic groups - students, adults and elderly - as each group has differing interests and preferences in the places to frequent at. Data used for this methodology involves the ez-link and points of interests (POI) data. Given that the places that each demographic groups frequent at varies due to differing interests and preferences, to include points of interests (POI) in this analysis will be helpful to understand which places attract various groups of people at various periods of the week. With that, our team conclude that POI should be places that serve the primary needs of the people.
||Overall commuting activity for each demographic groups as a whole, regardless of place of origin. This will provide an overview of the commuters’ travelling pattern in Singapore.
 
|-
 
  
|Inter town
+
Analysing commuter patterns is further segregated to two sub-methods:
||Travelling patterns of the commuters whose trips originate from Tampines planning area and end in the East region i.e Bedok,Paya Lebar, Changi, Pasir Ris
+
====1. Identifying common destination points====
|-
+
An initial analysis will be conducted to find out the common destinations that commuters travel to given that each demographic groups will have different needs and hence different places they frequent to. A heatmap of the common points will be visualized using QGIS. Areas with a darker intensity of colour would show the areas where many commuters alight at.
  
|Intra town
+
====2. Identifying travel patterns====
||Travelling patterns of the commuters whose trips originate and end in Tampines planning area i.e Tampines, Simei
+
Travel patterns are categorized into four different segments: Island wide, inter town, intra town and most frequently travelled trips, where commuters may travel just within Tampines planning area, or within the east region, or island wide. To do so, we will use QGIS to map out.
|-
 
  
|Most frequent travelled trips
 
||Commuters who made the same trip for at least four times in a week can be categorised as such. The data for each demographic groups are analysed based on weekdays which has most of the activities reflected on
 
  
|}
+
Analysis of commuter patterns is split into 4 segments:
  
 +
<center>
 +
{| style="background-color:#ffffff ; margin: 3px 10px 3px 10px; font-size:15px" width="100%"
 +
|- style="background:#f2f4f4; font-size:17px"
  
Our data for this analysis consists of the following:
+
|-
 +
| style="font-family:Open Sans, Arial, sans-serif; font-size:16px; text-align: center; padding:5px; border-bottom:solid #0091b3" | <font color="#3c3c3c"><strong>Segments</strong></font>
 +
| style="font-family:Open Sans, Arial, sans-serif; font-size:16px; text-align: center; padding:5px; border-bottom:solid #0091b3" | <font color="#3c3c3c"><strong>Description</strong></font>
  
<u>1.Ez-link transactions</u><br>
+
|-
With support from LARC, we obtained ez-link transaction data from 20 to 26 January 2014. We selected a single week of data because travelling patterns are similar from week to week within a month, and the selected week contains neither public holidays nor school holidays. Even for a single week, however, the data contain millions of transactions. The analysis will therefore be scaled down further by grouping the transactions by demographic profile and then aggregating the transaction times, which are recorded to the second, into 15-minute intervals.
+
| style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | <strong>Island wide</strong>
 +
| style="font-family:Open Sans, Arial, sans-serif; text-align: left; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Overall commuting activity for each demographic groups as a whole, regardless of place of origin. This will provide an overview of the commuters’ travelling pattern in Singapore.
  
<u>2.Bus routes</u><br>
+
|-
Busrouters.sg is an online portal that displays Singapore's bus routes on a map. The developer has made the bus route data publicly available, and the routes are updated to the latest bus profiles provided by the Land Transport Authority (LTA). However, the bus routes are published in JSON format, so a conversion from JSON to CSV is required before we can conduct geospatial analysis using QGIS. Besides plotting the bus routes as lines in QGIS, we realised that it is also important to include the bus stops, with the bus stop points snapped to the route lines. Busrouters.sg provides the bus stops for each bus service. With that information, our team will incorporate the bus stops into the routes using PostGIS and QGIS.
+
| style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | <strong>Inter town</strong>
 +
| style="font-family:Open Sans, Arial, sans-serif; text-align: left; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Travelling patterns of the commuters whose trips originate from Tampines planning area and end in the East region i.e Bedok,Paya Lebar, Changi, Pasir Ris.
  
<u>3.Points of interests</u><br>
+
|-
Given that the places that each demographic groups frequent at varies due to differing interests and preferences, to include points of interests (POI) in this analysis will be helpful to understand which places attract various groups of people at various periods of the week. With that, our team conclude that POI should be places that serve the primary needs of the people. As such, POI include:<br>
+
| style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | <strong>Intra town</strong>
 +
| style="font-family:Open Sans, Arial, sans-serif; text-align: left; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Travelling patterns of the commuters whose trips originate and end in Tampines planning area i.e Tampines and Simei.
  
• MRT stations<br>
+
|-
• Schools (primary, secondary, pre-tertiary and tertiary education)<br>
+
| style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | <strong>Most frequent travelled trips</strong>
• Shopping malls<br>
+
| style="font-family:Open Sans, Arial, sans-serif; text-align: left; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Commuters who made the same trip for at least four times in a week can be categorised as such. The data for each demographic groups are analysed based on weekdays which has most of the activities reflected on.
• Sports complex<br>
+
|}
• Parks<br>
+
</center>
• Childcare<br>
+
</div>
• Community centers<br>
 
• Shapefiles for the identified POI can be retrieved from data.gov.sg, Openstreetmap, Onemap and LTA Data Mall<br>
 
 
 
===Part 2: Site Visit - Identify Gaps in Infrastructure===
 
The first analysis identifies areas with a high volume of commuters, as well as commuters who travel short distances. The ez-link data will show us, for example, places that attract more elderly commuters than students, prompting questions such as: “Do those places serve the elderly well enough?”
 
 
 
The second part of the analysis involves identifying gaps in the infrastructure within the Tampines planning area. Why are people commuting by bus instead of walking? Would safety be compromised if people chose to walk? Are there roads that hinder connectivity between the starting point and the destination? Singstat has published population statistics for June 2015. This information will help us understand how well the residential areas serve the community. We will also conduct site visits to understand the situation on the ground.<br>
 
 
 
 
 
Data for this analysis includes:<br>
 
  
<u>1. Statistics on Demographic Profile</u><br>
+
==<div style="font-family:Open Sans, Arial, sans-serif; background: #ffffff; padding: 17px; line-height: 0.1em;  text-indent: 10px; font-size:17px; border-left:8px solid #0091b3"><font color= #000000><strong>Analyse Multimodal Transportation Patterns </strong></font></div>==
+
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left">
[[File:Team_WalkThere_DemographicProfile1.png|360px|center]]<br>
 
[[File:Team_WalkThere_DemographicProfile.png|720px|center]]<br>
 
[[File:Team_WalkThere_DemographicProfile0.png|720px|center]]<br>
 
  
<p>Based on the charts above, the top 3 age ranges in all subzones fall in the older range, above 50 years old. This shows that Tampines is a mature estate, so it is important to have facilities and footpaths that cater to the elderly.</p>
+
===1. Distribution Analysis on Multimode Commuters===
 +
In order to analyse multimode commuters, we will join the MRT and Bus dataset together using card number attribute, time attribute and date attribute.  
  
<u>2. Linking of Pedestrian Walkways</u><br>
+
====1.1 Analyse Transfer Interval====
As a tropical country located near the equator, Singapore receives ample sunlight, and the heat and high humidity often discourage people from staying outdoors for long. People may therefore choose to commute by bus even for a short distance just to avoid the sun. Covered linkways and more trees may alleviate this by providing shade for pedestrians during the day, while lamp posts provide sufficient lighting for a safer walking experience at night. Areas where walking is hindered, for reasons such as insufficient lighting or shade, will be identified during the site visits. LTA data mall provides the following data:<br>
+
According to Transit Link, a transfer can be from:
• footpath<br>
+
*the MRT/LRT to a bus service,
• covered linkway<br>
+
*a bus service to another bus service, or
• lamp post<br>
+
*a bus service to the MRT/LRT
• road crossing<br>
 
• pedestrian overhead bridge and underpass<br>
 
  
<u>3. Pedestrian Network</u><br>
+
Transfer interval refers to the amount of time taken for the students to transfer from one mode of transportation to another mode of transportation.This is calculated using the difference between Bus entry time and MRT exit time (for MRT→Bus) and MRT entry time and Bus exit time(for Bus →MRT)
This data allows us to understand whether pedestrians can reach their destinations on foot. However, as the data is not publicly available, we have to construct the network ourselves. After conducting site visits, we will connect the road network and plot the pedestrian connections, including paths through void decks.<br>
 
  
 +
===2. Analyse Relationship Between Walking and Bus Commuting===
 +
=====2.1 Least Cost Walk Path Analysis=====
 +
Due to time constraint, our group will use the Student group as a proxy.  In order to analyse the relationship between walking and bus commuting, we will compare the time taken to walk with the bus travelling time. Unlike bus travelling time, the time taken to walk is not provided in the dataset. This will be calculated using the walking distance, which will be derived from least cost walk path analysis, and the average walking speed of students derived from prominent research papers.
  
</font>
+
Our group has derived two methods to construct the least cost walk path namely the Traditional method and the Euclidean Distance method. Traditional method involves the use of QGIS extension plugins such as GRASS and SAGA whereas the Euclidean Distance involves the use of Hub Lines in MMGIS Plugins.  
</div>
 
 
 
==<div style="font-family:Open Sans, Arial, sans-serif; background: #ffffff; padding: 17px; line-height: 0.1em;  text-indent: 10px; font-size:17px; border-left:8px solid #0091b3"><font color= #000000><strong>Work Scope</strong></font></div>==
 
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Arial, sans-serif; border-radius: 7px; text-align:left">
 
  
<font face="Open Sans, Arial, sans-serif;">
+
======Traditional Method======
=== Literature Study ===
+
To understand previous studies on walkability in Singapore and in other countries, and the types of infrastructures that can be introduced so as to be able to make recommendations to improve the connectivity between residential estates and points of interest.  
+
Firstly, “landtype”, “road” and “tampines planning area” shapefiles are assigned with impedance value to denote the amount effort that the pedestrian has to make. Higher impedance denotes greater amount of effort made by the pedestrian. For “landtype” shapefile, land that can be trespassed will have a value of 1 whereas those that cannot be trespassed will have a value of 100. For “road” shapefile, expressways will have a value of 100 and 1 otherwise.  For “tampines planning area” shapefile, it will have an impedance value of 0 to indicate a flat land.  
  
=== Software Learning ===
+
Secondly, all the above mentioned shapefiles are rasterized with pixel of 50m x 50m. “landtype” and “road” rasters are then merged using GRASS r.patch. Next, cumulative cost of moving from an origin of a particular route is calculated using GRASS r.walk.
Learn how to use the QGIS software, both on the laptop and on the mobile phone (to aid data collection).
 
  
=== Data Collection ===
+
With an output of cumulative cost raster layer generated by GRASS r.walk, we will be able to construct the least cost walk path using SAGA least cost paths function. As the resulting line layer did not have distance information, the distance will be calculated using the $length formula in field calculator.
Ez-link data will be provided by LARC while points of interests data sets are publicly available on Openstreetmap, Data.gov.sg, LTA data mall and Onemap. Pedestrian network will be manually mapped out through conducting site visits and with the integration of road network.
 
  
=== Data Exploration ===
+
======Euclidean Distance Method======  
One week of ez-link data will be segmented into 3 sections for analysis: student, adult and elderly. Each team member will identify trends and patterns for each profile group using analytics tools such as JMP and QGIS.
+
Walking distance derived from the Euclidean Distance Method is a straight-line distance from an origin to a destination. The walk paths are constructed using Hub lines function in MMQGIS.
  
=== Geospatial Analysis ===
+
=====2.2 Comparing Time Taken to Walk and Bus Transportation=====
Using QGIS, for the following:<br>
+
For the purpose of our analysis, we will be using the Euclidean Distance method as it is a more straightforward and less tedious method as compared to the Traditional method. We will construct 10 least cost paths using both Traditional and Euclidean Distance method, and compare the difference in the distance by subtracting the distance derived from the Euclidean Distance method from the Traditional method. The average and standard deviation of these differences will be used to calculate the error bound i.e Mean of Differences + 2 x(Standard  Deviation of Differences).  
• Commuters behaviours throughout the entire one week.<br>
 
• Map out paths that residents may take from their houses to identified points of interest<br>
 
• Understand the coverage of street lamps to analyse the safety of walking paths at night. Through measuring the radius of coverage and the height of the lamp post, we can understand how the distribution of the lamp post should be placed.<br>
 
  
 +
As the Traditional method is more representative of the actual path used by the pedestrian as compared to the Euclidean Distance Method, the upper error bound will be added to the distance derived from the Euclidean Distance method instead of taking into account both lower and upper error bound.  After which, we will compare the bus travelling time and the time taken to walk. If the bus travelling time is shorter than the time taken to walk, we are able to deduce that bus commuting and walking has a negative correlation.
  
</font>
 
 
</div>
 
</div>

Latest revision as of 10:30, 17 April 2016


Analyse Commuter Patterns

This analysis aims to identify the commuting patterns of each demographic group (students, adults and the elderly), as each group has different interests and preferences in the places it frequents. The data used for this methodology are the ez-link transactions and the points of interest (POI) data. Because the places each group frequents vary with its interests and preferences, including POI in the analysis helps us understand which places attract which groups at various periods of the week. Our team therefore defines POI as places that serve the primary needs of the people.

The analysis of commuter patterns is divided into two sub-methods:

1. Identifying common destination points

An initial analysis will be conducted to find the common destinations that commuters travel to, given that each demographic group has different needs and hence frequents different places. A heatmap of the common destination points will be visualised using QGIS. Areas with a darker colour intensity indicate where many commuters alight.

2. Identifying travel patterns

Travel patterns are categorised into four segments: island wide, inter town, intra town and most frequently travelled trips. Commuters may travel only within the Tampines planning area, within the East region, or island wide. We will use QGIS to map out these trips.


Analysis of commuter patterns is split into 4 segments:

{| class="wikitable"
|-
! Segments !! Description
|-
| Island wide || Overall commuting activity of each demographic group as a whole, regardless of place of origin. This provides an overview of commuters’ travelling patterns in Singapore.
|-
| Inter town || Travelling patterns of commuters whose trips originate in the Tampines planning area and end elsewhere in the East region, i.e. Bedok, Paya Lebar, Changi and Pasir Ris.
|-
| Intra town || Travelling patterns of commuters whose trips originate and end in the Tampines planning area, i.e. Tampines and Simei.
|-
| Most frequently travelled trips || Trips that a commuter makes at least four times in a week. The data for each demographic group are analysed on weekdays, when most of this activity takes place.
|}
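
The segment definitions above can be sketched in code. This is only an illustration: the area names come from the table, but the function names, the record layout `(card_id, origin, destination)` and the choice to label every other trip "island wide" are our assumptions, not part of the project's actual workflow.

```python
from collections import Counter

# Area groupings taken from the segment descriptions; treating them as
# plain name sets is an assumption made for this sketch.
TAMPINES_AREA = {"Tampines", "Simei"}
EAST_REGION = {"Bedok", "Paya Lebar", "Changi", "Pasir Ris"}


def classify_segment(origin, destination):
    """Label a trip by segment; anything else falls back to 'island wide'."""
    if origin in TAMPINES_AREA and destination in TAMPINES_AREA:
        return "intra town"
    if origin in TAMPINES_AREA and destination in EAST_REGION:
        return "inter town"
    return "island wide"


def frequent_trips(trips, min_count=4):
    """Return the (card_id, origin, destination) trips made at least min_count times.

    `trips` is assumed to hold one week of weekday trips.
    """
    counts = Counter(trips)
    return {trip: n for trip, n in counts.items() if n >= min_count}
```

For example, a card that makes the same Tampines to Simei trip on four weekdays would be returned by `frequent_trips`, and that trip would be classified as "intra town".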

Analyse Multimodal Transportation Patterns

1. Distribution Analysis on Multimode Commuters

To analyse multimode commuters, we will join the MRT and bus datasets using the card number, date and time attributes.
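
One possible way to picture this join is shown below. It is a rough sketch only: the field names (`card`, `date`, `entry_time`) are hypothetical stand-ins for the card number, date and time attributes, and grouping rides per card per day is our assumption about how the join could work.

```python
from collections import defaultdict


def join_by_card_and_date(mrt_rows, bus_rows):
    """Group MRT and bus rides made with the same card on the same date.

    Each row is a dict with the hypothetical keys 'card', 'date' and
    'entry_time' (an 'HH:MM:SS' string, so string order is time order).
    Only card-days that used both modes are returned.
    """
    journeys = defaultdict(list)
    for mode, rows in (("MRT", mrt_rows), ("BUS", bus_rows)):
        for row in rows:
            journeys[(row["card"], row["date"])].append({**row, "mode": mode})
    for rides in journeys.values():
        rides.sort(key=lambda ride: ride["entry_time"])
    return {
        key: rides
        for key, rides in journeys.items()
        if {ride["mode"] for ride in rides} == {"MRT", "BUS"}
    }
```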

1.1 Analyse Transfer Interval

According to Transit Link, a transfer can be from:

  • the MRT/LRT to a bus service,
  • a bus service to another bus service, or
  • a bus service to the MRT/LRT

Transfer interval refers to the time a student takes to transfer from one mode of transport to another. It is calculated as the bus entry time minus the MRT exit time (for MRT→Bus transfers), and as the MRT entry time minus the bus exit time (for Bus→MRT transfers).
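
The calculation can be illustrated with a small sketch. The timestamp format and the function name are assumptions made for the example; the actual ez-link fields may be laid out differently.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to match the real dataset.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def transfer_interval_minutes(exit_time, next_entry_time):
    """Minutes between leaving one mode and entering the next.

    For MRT→Bus pass (MRT exit time, bus entry time); for Bus→MRT pass
    (bus exit time, MRT entry time).
    """
    left = datetime.strptime(exit_time, TIME_FORMAT)
    entered = datetime.strptime(next_entry_time, TIME_FORMAT)
    return (entered - left).total_seconds() / 60
```

With an MRT exit at 08:10:00 and a bus entry at 08:17:30 on the same day, the transfer interval is 7.5 minutes.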

2. Analyse Relationship Between Walking and Bus Commuting

2.1 Least Cost Walk Path Analysis

Due to time constraints, our group will use the student group as a proxy. To analyse the relationship between walking and bus commuting, we will compare the time taken to walk with the bus travelling time. Unlike the bus travelling time, the time taken to walk is not provided in the dataset. It will be calculated from the walking distance, which is derived from the least cost walk path analysis, and the average walking speed of students, which is taken from published research papers.

Our group has devised two methods to construct the least cost walk path: the Traditional method and the Euclidean Distance method. The Traditional method uses QGIS extension plugins such as GRASS and SAGA, whereas the Euclidean Distance method uses the Hub Lines function in the MMQGIS plugin.

Traditional Method

First, the “landtype”, “road” and “tampines planning area” shapefiles are each assigned an impedance value that denotes the amount of effort a pedestrian has to make; a higher impedance denotes greater effort. In the “landtype” shapefile, land that can be crossed on foot has a value of 1, whereas land that cannot be crossed has a value of 100. In the “road” shapefile, expressways have a value of 100 and all other roads have a value of 1. The “tampines planning area” shapefile has an impedance value of 0 to indicate flat land.
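
The impedance rules can be written down as a small lookup. The values 0, 1 and 100 come from the paragraph above; the attribute names `walkable` and `is_expressway` are hypothetical placeholders, since the real shapefile fields are not listed here.

```python
# Impedance values quoted in the methodology.
IMPASSABLE = 100
PASSABLE = 1
FLAT = 0


def impedance(layer, attributes):
    """Return the impedance for one feature of the named layer.

    `attributes` is a dict of feature attributes; the keys 'walkable'
    and 'is_expressway' are assumed names for this sketch.
    """
    if layer == "landtype":
        return PASSABLE if attributes.get("walkable", False) else IMPASSABLE
    if layer == "road":
        return IMPASSABLE if attributes.get("is_expressway", False) else PASSABLE
    if layer == "tampines planning area":
        return FLAT
    raise ValueError(f"unknown layer: {layer}")
```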

Second, all the above-mentioned shapefiles are rasterised with a pixel size of 50 m x 50 m. The “landtype” and “road” rasters are then merged using GRASS r.patch. Next, the cumulative cost of moving from the origin of a particular route is calculated using GRASS r.walk.

From the cumulative cost raster layer generated by GRASS r.walk, we construct the least cost walk path using the SAGA least cost paths function. As the resulting line layer does not contain distance information, the distance is calculated using the $length expression in the field calculator.
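
To give a feel for what a least cost path over an impedance raster looks like, here is a simplified stand-in. It is not GRASS r.walk or the SAGA function: it uses plain Dijkstra search on a small grid with 4-neighbour moves, ignores slope, and assumes that stepping into a cell costs that cell's impedance. The function name and the grid layout are ours.

```python
import heapq


def least_cost_path(cost, start, goal):
    """Dijkstra over a 2D impedance grid.

    `cost` is a list of rows, `start` and `goal` are (row, col) tuples.
    Returns (path, accumulated_cost), where path is a list of cells.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0}
    previous = {}
    heap = [(0, start)]
    while heap:
        accumulated, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if accumulated > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                candidate = accumulated + cost[nr][nc]
                if candidate < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = candidate
                    previous[(nr, nc)] = cell
                    heapq.heappush(heap, (candidate, (nr, nc)))
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]
```

With 50 m x 50 m pixels, the length of such a path is the number of steps multiplied by 50 m; a path that detours around a column of impassable cells (impedance 100) is therefore longer than the straight line between its end points.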

Euclidean Distance Method

The walking distance derived from the Euclidean Distance method is the straight-line distance from an origin to a destination. The walk paths are constructed using the Hub Lines function in MMQGIS.
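
The straight-line distance itself is simple to compute. The sketch below assumes the origin and destination are given as (x, y) coordinates in a projected coordinate system measured in metres (for example SVY21); the function name is ours.

```python
import math


def euclidean_distance_m(origin, destination):
    """Straight-line distance in metres between two projected (x, y) points."""
    (x1, y1), (x2, y2) = origin, destination
    return math.hypot(x2 - x1, y2 - y1)
```

For instance, two points that are 300 m apart in the east-west direction and 400 m apart in the north-south direction are 500 m apart in a straight line.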

2.2 Comparing Time Taken to Walk and Bus Transportation

For our analysis, we will use the Euclidean Distance method because it is more straightforward and less tedious than the Traditional method. To quantify the error this introduces, we will construct 10 walk paths using both the Traditional and the Euclidean Distance methods and, for each path, subtract the distance derived from the Euclidean Distance method from the distance derived from the Traditional method. The mean and standard deviation of these differences will be used to calculate the error bound, i.e. Mean of Differences + 2 x (Standard Deviation of Differences).
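
The error bound formula can be sketched as follows. The use of the sample standard deviation is our assumption, as the text does not say whether the sample or population form is intended, and the function name is hypothetical.

```python
import statistics


def upper_error_bound(traditional_m, euclidean_m):
    """Mean of differences + 2 x standard deviation of differences.

    Both arguments are lists of path distances in metres, paired by path.
    The sample standard deviation (statistics.stdev) is an assumption.
    """
    differences = [t - e for t, e in zip(traditional_m, euclidean_m)]
    return statistics.mean(differences) + 2 * statistics.stdev(differences)
```

As a made-up example, differences of 10 m, 20 m and 30 m have a mean of 20 m and a sample standard deviation of 10 m, giving an error bound of 40 m.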

As the Traditional method is more representative of the actual path taken by a pedestrian than the Euclidean Distance method, only the upper error bound will be added to the distance derived from the Euclidean Distance method, rather than taking both the lower and upper error bounds into account. We will then compare the bus travelling time with the time taken to walk. If the bus travelling time is shorter than the time taken to walk, we deduce that bus commuting and walking are negatively correlated.
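
The comparison can be sketched as below. The function names are ours, and the walking speed is left as a parameter because the value taken from the research papers is not stated here.

```python
def walking_time_minutes(euclidean_m, upper_error_bound_m, speed_m_per_s):
    """Walking time for the Euclidean distance plus the upper error bound."""
    adjusted_distance_m = euclidean_m + upper_error_bound_m
    return adjusted_distance_m / speed_m_per_s / 60


def bus_is_faster(bus_minutes, walk_minutes):
    """True when the bus travelling time is shorter than the walking time."""
    return bus_minutes < walk_minutes
```

With a placeholder speed of 1.3 m/s (an illustrative figure only), a 740 m Euclidean distance and a 40 m error bound give a walking time of about 10 minutes, so a 6-minute bus ride would count as faster.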