Difference between revisions of "AY1516 T2 Team CommuteThere Methodology"

From Analytics Practicum
 
(19 intermediate revisions by 2 users not shown)
Line 11: Line 11:
 
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #0091b3" width="210px" |   
 
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #0091b3" width="210px" |   
 
[[AY1516_T2_Team_CommuteThere_Overview|<font color="#3c3c3c"><strong>PROJECT OVERVIEW</strong></font>]]
 
[[AY1516_T2_Team_CommuteThere_Overview|<font color="#3c3c3c"><strong>PROJECT OVERVIEW</strong></font>]]
 +
 +
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #f5f5f5" width="230px" | 
 +
[[AY1516_T2_Team_CommuteThere_Project_Data_Preparation|<font color="#3c3c3c"><strong>ANALYSIS & FINDINGS</strong></font>]]
  
 
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #f5f5f5" width="230px" |   
 
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #f5f5f5" width="230px" |   
Line 18: Line 21:
 
[[AY1516_T2_Team_CommuteThere_Main Deliverables|<font color="#3c3c3c"><strong>DOCUMENTATION</strong></font>]]
 
[[AY1516_T2_Team_CommuteThere_Main Deliverables|<font color="#3c3c3c"><strong>DOCUMENTATION</strong></font>]]
  
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #f5f5f5" width="230px" | 
 
[[AY1516_T2_Team_CommuteThere_Analysis_Findings|<font color="#3c3c3c"><strong>ANALYSIS & FINDINGS</strong></font>]]
 
 
|}
 
|}
 
</center>
 
</center>
Line 46: Line 47:
 
<!-- Body -->
 
<!-- Body -->
 
==<div style="font-family:Open Sans, Arial, sans-serif; background: #ffffff; padding: 17px; line-height: 0.1em;  text-indent: 10px; font-size:17px; border-left:8px solid #0091b3"><font color= #000000><strong>Analyse Commuter Patterns</strong></font></div>==
 
==<div style="font-family:Open Sans, Arial, sans-serif; background: #ffffff; padding: 17px; line-height: 0.1em;  text-indent: 10px; font-size:17px; border-left:8px solid #0091b3"><font color= #000000><strong>Analyse Commuter Patterns</strong></font></div>==
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Arial, sans-serif; border-radius: 7px; text-align:left">
+
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left">
<font face="Open Sans, Arial, sans-serif;">
+
 
 +
This analysis aims to identify commuter patterns of each demographic groups - students, adults and elderly - as each group has differing interests and preferences in the places to frequent at. Data used for this methodology involves the ez-link and points of interests (POI) data. Given that the places that each demographic groups frequent at varies due to differing interests and preferences, to include points of interests (POI) in this analysis will be helpful to understand which places attract various groups of people at various periods of the week. With that, our team conclude that POI should be places that serve the primary needs of the people.
 +
 
 +
Analysing commuter patterns is further segregated to two sub-methods:
 +
====1. Identifying common destination points====
 +
An initial analysis will be conducted to find out the common destinations that commuters travel to given that each demographic groups will have different needs and hence different places they frequent to. A heatmap of the common points will be visualized using QGIS. Areas with a darker intensity of colour would show the areas where many commuters alight at.
 +
 
 +
====2. Identifying travel patterns====
 +
Travel patterns are categorized into four different segments: Island wide, inter town, intra town and most frequently travelled trips, where commuters may travel just within Tampines planning area, or within the east region, or island wide. To do so, we will use QGIS to map out.
 +
 
 +
 
 +
Analysis of commuter patterns is split into 4 segments:
 +
 
 +
<center>
 +
{| style="background-color:#ffffff ; margin: 3px 10px 3px 10px; font-size:15px" width="100%"
 +
|- style="background:#f2f4f4; font-size:17px"  
  
This analysis aims to identify commuter patterns of each demographic groups - students, adults and elderly - as each group has differing interests and preferences in the places to frequent at. Analysis of commuter patterns is split into 4 segments:
 
<font face="Open Sans, Arial, sans-serif">
 
{| class="wikitable" style="width: 85%;margin: auto;"
 
 
|-
 
|-
! Segments !! Description
+
| style="font-family:Open Sans, Arial, sans-serif; font-size:16px; text-align: center; padding:5px; border-bottom:solid #0091b3" | <font color="#3c3c3c"><strong>Segments</strong></font>
 +
| style="font-family:Open Sans, Arial, sans-serif; font-size:16px; text-align: center; padding:5px; border-bottom:solid #0091b3" | <font color="#3c3c3c"><strong>Description</strong></font>
 +
 
 
|-
 
|-
 +
| style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | <strong>Island wide</strong>
 +
| style="font-family:Open Sans, Arial, sans-serif; text-align: left; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Overall commuting activity for each demographic groups as a whole, regardless of place of origin. This will provide an overview of the  commuters’ travelling pattern in Singapore.
  
|style="text-align:Center;"|Island wide
 
||Overall commuting activity for each demographic groups as a whole, regardless of place of origin. This will provide an overview of the  commuters’ travelling pattern in Singapore.
 
 
|-
 
|-
 +
| style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | <strong>Inter town</strong>
 +
| style="font-family:Open Sans, Arial, sans-serif; text-align: left; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Travelling patterns of the commuters whose trips originate from Tampines planning area and end in the East region i.e Bedok,Paya Lebar, Changi, Pasir Ris.
  
|style="text-align:Center;"|Inter town
 
||Travelling patterns of the commuters whose trips originate from Tampines planning area and end in the East region i.e Bedok,Paya Lebar, Changi, Pasir Ris
 
 
|-
 
|-
 +
| style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | <strong>Intra town</strong>
 +
| style="font-family:Open Sans, Arial, sans-serif; text-align: left; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Travelling patterns of the commuters whose trips originate and end in Tampines planning area i.e Tampines and Simei.
  
|style="text-align:Center;"|Intra town
 
||Travelling patterns of the commuters whose trips originate and end in Tampines planning area i.e Tampines, Simei
 
 
|-
 
|-
 
+
| style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | <strong>Most frequent travelled trips</strong>
|style="text-align:Center;"|Most frequent travelled trips
+
| style="font-family:Open Sans, Arial, sans-serif; text-align: left; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Commuters who made the same trip for at least four times in a week can be categorised as such. The data for each demographic groups are analysed based on weekdays which has most of the activities reflected on.
||Commuters who made the same trip for at least four times in a week can be categorised as such. The data for each demographic groups are analysed based on weekdays which has most of the activities reflected on  
 
 
 
 
|}
 
|}
 +
</center>
 +
</div>
  
 +
==<div style="font-family:Open Sans, Arial, sans-serif; background: #ffffff; padding: 17px; line-height: 0.1em;  text-indent: 10px; font-size:17px; border-left:8px solid #0091b3"><font color= #000000><strong>Analyse Multimodal Transportation Patterns </strong></font></div>==
 +
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left">
  
Our data for this analysis consists of the following:
+
===1. Distribution Analysis on Multimode Commuters===
Ez-Link Transactions
+
In order to analyse multimode commuters, we will join the MRT and Bus dataset together using card number attribute, time attribute and date attribute.  
With the support from LARC, we were able to obtain ez-link transactions data from 20 to 26 January 2014. We have selected just a week of data in January 2014 because the travelling patterns for each week in a month are similar and there are neither no public holidays nor school holidays in the selected week for analysis. However, regardless of scaling down the data into just a week’s period, there are still millions of transactions presented. As such, analysis of the data will be further scaled down to grouping the transactions based on demographic profiles, followed by aggregating the timings of transactions to every 15 minutes given that the timings presented come in seconds.
 
 
 
Points of interests
 
Given that the places that each demographic groups frequent at varies due to differing interests and preferences, to include points of interests (POI) in this analysis will be helpful to understand which places attract various groups of people at various periods of the week. With that, our team conclude that POI should be places that serve the primary needs of the people. As such, POI include:
 
MRT stations
 
Schools (primary, secondary, pre-tertiary and tertiary education)
 
Shopping malls
 
Sports complex
 
Parks
 
Childcare
 
Community centers
 
Shapefiles for the identified POIs can be retrieved from data.gov.sg, Openstreetmap, Onemap and LTA Data Mall.
 
 
 
Identify common destination points
 
We will conduct an initial analysis to find out the common routes that passengers take, within Tampines, and from Tampines to the other parts of the East Region. This will help us to understand the commuter patterns within Tampines and between Tampines and the East Region. To do so, we will use QGis to map out.  
 
  
 +
====1.1 Analyse Transfer Interval====
 +
According to Transit Link, a transfer can be from:
 +
*the MRT/LRT to a bus service,
 +
*a bus service to another bus service, or
 +
*a bus service to the MRT/LRT
  
 +
Transfer interval refers to the amount of time taken for the students to transfer from one mode of transportation to another mode of transportation.This is calculated using the difference between Bus entry time and MRT exit time (for MRT→Bus) and MRT entry time and Bus exit time(for Bus →MRT)
  
</font>
+
===2. Analyse Relationship Between Walking and Bus Commuting===
</div>
+
=====2.1 Least Cost Walk Path Analysis=====
 +
Due to time constraint, our group will use the Student group as a proxy.  In order to analyse the relationship between walking and bus commuting, we will compare the time taken to walk with the bus travelling time. Unlike bus travelling time, the time taken to walk is not provided in the dataset. This will be calculated using the walking distance, which will be derived from least cost walk path analysis, and the average walking speed of students derived from prominent research papers.
  
==<div style="font-family:Open Sans, Arial, sans-serif; background: #ffffff; padding: 17px; line-height: 0.1em;  text-indent: 10px; font-size:17px; border-left:8px solid #0091b3"><font color= #000000><strong>Work Scope</strong></font></div>==
+
Our group has derived two methods to construct the least cost walk path namely the Traditional method and the Euclidean Distance method. Traditional method involves the use of QGIS extension plugins such as GRASS and SAGA whereas the Euclidean Distance involves the use of Hub Lines in MMGIS Plugins.  
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Arial, sans-serif; border-radius: 7px; text-align:left">
 
  
<font face="Open Sans, Arial, sans-serif;">
+
======Traditional Method======
=== Literature Study ===
+
To understand previous studies on walkability in Singapore and in other countries, and the types of infrastructures that can be introduced so as to be able to make recommendations to improve the connectivity between residential estates and points of interest.  
+
Firstly, “landtype”, “road” and “tampines planning area” shapefiles are assigned with impedance value to denote the amount effort that the pedestrian has to make. Higher impedance denotes greater amount of effort made by the pedestrian. For “landtype” shapefile, land that can be trespassed will have a value of 1 whereas those that cannot be trespassed will have a value of 100. For “road” shapefile, expressways will have a value of 100 and 1 otherwise.  For “tampines planning area” shapefile, it will have an impedance value of 0 to indicate a flat land.  
  
=== Software Learning ===
+
Secondly, all the above mentioned shapefiles are rasterized with pixel of 50m x 50m. “landtype” and “road” rasters are then merged using GRASS r.patch. Next, cumulative cost of moving from an origin of a particular route is calculated using GRASS r.walk.
Learn how to use the QGis software, both on the laptop as well as on the mobile phone (to aid data collection)
 
  
=== Data Collection ===
+
With an output of cumulative cost raster layer generated by GRASS r.walk, we will be able to construct the least cost walk path using SAGA least cost paths function. As the resulting line layer did not have distance information, the distance will be calculated using the $length formula in field calculator.
Ez-link data will be provided by LARC while points of interests data sets are publicly available on Openstreetmap, Data.gov.sg, LTA data mall and Onemap. Pedestrian network will be manually mapped out through conducting site visits and with the integration of road network.
 
  
=== Data Exploration ===
+
======Euclidean Distance Method======  
Ez-link data of one week will be segmented into 3 sections for analysis: student, adult and elderly. Each team members has to identify trends and patterns for each profile groups with the use of analytics tools such as JMP and QGIS.
+
Walking distance derived from the Euclidean Distance Method is a straight-line distance from an origin to a destination. The walk paths are constructed using Hub lines function in MMQGIS.
  
=== Geospatial Analysis ===
+
=====2.2 Comparing Time Taken to Walk and Bus Transportation=====
Using QGIS, for the following:<br>
+
For the purpose of our analysis, we will be using the Euclidean Distance method as it is a more straightforward and less tedious method as compared to the Traditional method. We will construct 10 least cost paths using both Traditional and Euclidean Distance method, and compare the difference in the distance by subtracting the distance derived from the Euclidean Distance method from the Traditional method. The average and standard deviation of these differences will be used to calculate the error bound i.e Mean of Differences + 2 x(Standard  Deviation of Differences).  
• Commuters behaviours throughout the entire one week.<br>
 
• Map out paths that residents may take from their houses to identified points of interest<br>
 
• Understand the coverage of street lamps to analyse the safety of walking paths at night. Through measuring the radius of coverage and the height of the lamp post, we can understand how the distribution of the lamp post should be placed.<br>
 
  
 +
As the Traditional method is more representative of the actual path used by the pedestrian as compared to the Euclidean Distance Method, the upper error bound will be added to the distance derived from the Euclidean Distance method instead of taking into account both lower and upper error bound.  After which, we will compare the bus travelling time and the time taken to walk. If the bus travelling time is shorter than the time taken to walk, we are able to deduce that bus commuting and walking has a negative correlation.
  
</font>
 
 
</div>
 
</div>

Latest revision as of 10:30, 17 April 2016


Methodology

Analyse Commuter Patterns

This analysis aims to identify the commuter patterns of each demographic group (students, adults and the elderly), as each group has different interests and preferences in the places it frequents. The data used for this methodology are the ez-link transactions and the points of interest (POI) data. Including POI in the analysis helps us understand which places attract which groups of people at various periods of the week. Our team therefore concluded that POI should be places that serve the primary needs of the people.

Analysing commuter patterns is further divided into two sub-methods:

1. Identifying common destination points

An initial analysis will be conducted to find the common destinations that commuters travel to, given that each demographic group has different needs and therefore frequents different places. A heatmap of the common destination points will be visualised using QGIS. Areas with a darker colour intensity indicate where many commuters alight.
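As a rough illustration of the input such a heatmap needs, the sketch below counts alightings per stop so that the counts can serve as point weights in QGIS. The record layout and the `alight_stop` field name are assumptions made for this example; they are not the actual ez-link column names.

```python
from collections import Counter

def alighting_counts(trips):
    """Count how many trips end at each stop.

    `trips` is an iterable of dicts; the key "alight_stop" is a
    hypothetical field name, not the real ez-link column name.
    """
    return Counter(trip["alight_stop"] for trip in trips)

# Hypothetical sample records for one demographic group.
sample_trips = [
    {"alight_stop": "Tampines Int"},
    {"alight_stop": "Simei Stn"},
    {"alight_stop": "Tampines Int"},
]
counts = alighting_counts(sample_trips)
```

The resulting counts could be joined to a stop point layer and used as the weight field of a heatmap, one layer per demographic group.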

2. Identifying travel patterns

Travel patterns are categorised into four segments: island wide, inter town, intra town and most frequently travelled trips. Commuters may travel within the Tampines planning area only, within the East region, or island wide. We will use QGIS to map out these patterns.


Analysis of commuter patterns is split into 4 segments:

{| class="wikitable"
|-
! Segments !! Description
|-
| Island wide || Overall commuting activity for each demographic group as a whole, regardless of place of origin. This provides an overview of commuters' travelling patterns in Singapore.
|-
| Inter town || Travelling patterns of commuters whose trips originate in the Tampines planning area and end elsewhere in the East region, i.e. Bedok, Paya Lebar, Changi and Pasir Ris.
|-
| Intra town || Travelling patterns of commuters whose trips originate and end in the Tampines planning area, i.e. Tampines and Simei.
|-
| Most frequently travelled trips || Commuters who made the same trip at least four times in a week fall into this segment. The data for each demographic group are analysed on weekdays, when most of the activity takes place.
|}
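The segment definitions above can be sketched as a small classifier. This is only an illustration: the field names (`card`, `origin`, `destination`, `weekday`) are hypothetical, and it assumes each trip has already been tagged with the area of its origin and destination.

```python
from collections import Counter

# Areas named in the segment descriptions.
TAMPINES_AREA = {"Tampines", "Simei"}
OTHER_EAST = {"Bedok", "Paya Lebar", "Changi", "Pasir Ris"}

def segment(origin, destination):
    """Return the most specific segment for a trip.

    Every trip also belongs to the island-wide view, which ignores origin.
    """
    if origin in TAMPINES_AREA and destination in TAMPINES_AREA:
        return "intra town"
    if origin in TAMPINES_AREA and destination in OTHER_EAST:
        return "inter town"
    return "island wide"

def frequent_trips(trips, min_count=4):
    """Find (card, origin, destination) combinations made at least
    `min_count` times on weekdays within the week of data."""
    counts = Counter(
        (t["card"], t["origin"], t["destination"])
        for t in trips
        if t["weekday"]  # hypothetical boolean flag: True for Mon-Fri
    )
    return {key for key, n in counts.items() if n >= min_count}
```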

Analyse Multimodal Transportation Patterns

1. Distribution Analysis on Multimode Commuters

To analyse multimode commuters, we will join the MRT and bus datasets using the card number, date and time attributes.
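A minimal sketch of such a join is shown below, assuming each dataset is a list of records with hypothetical `card`, `date` and `entry_time` fields (times as zero-padded `HH:MM:SS` strings). It groups rides by card and date, orders them by time, and keeps only the card-days that used both modes.

```python
from collections import defaultdict

def multimode_card_days(mrt_rows, bus_rows):
    """Return {(card, date): [mode, mode, ...]} for card-days with both modes.

    Field names are assumptions for illustration, not the real ez-link schema.
    """
    legs_by_card_day = defaultdict(list)
    for mode, rows in (("MRT", mrt_rows), ("Bus", bus_rows)):
        for row in rows:
            key = (row["card"], row["date"])
            legs_by_card_day[key].append((row["entry_time"], mode))

    result = {}
    for key, legs in legs_by_card_day.items():
        legs.sort()  # zero-padded HH:MM:SS strings sort chronologically
        modes = [mode for _, mode in legs]
        if set(modes) == {"MRT", "Bus"}:
            result[key] = modes
    return result
```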

1.1 Analyse Transfer Interval

According to Transit Link, a transfer can be from:

  • the MRT/LRT to a bus service,
  • a bus service to another bus service, or
  • a bus service to the MRT/LRT

Transfer interval refers to the time taken for the students to transfer from one mode of transport to another. It is calculated as the difference between the bus entry time and the MRT exit time (for MRT→Bus), or between the MRT entry time and the bus exit time (for Bus→MRT).
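The calculation can be sketched as follows, assuming the exit time of the first leg and the entry time of the second leg fall on the same day and are given as `HH:MM:SS` strings. The function name and time format are assumptions for illustration.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"  # assumed format of the timestamps

def transfer_interval_seconds(first_leg_exit, second_leg_entry):
    """Seconds between leaving the first mode and entering the second.

    For MRT->Bus: first_leg_exit is the MRT exit time and
    second_leg_entry is the bus entry time; swap the roles for Bus->MRT.
    """
    exit_time = datetime.strptime(first_leg_exit, TIME_FORMAT)
    entry_time = datetime.strptime(second_leg_entry, TIME_FORMAT)
    return (entry_time - exit_time).total_seconds()

# MRT exit at 08:10:00, bus entry at 08:14:30.
interval = transfer_interval_seconds("08:10:00", "08:14:30")
```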

2. Analyse Relationship Between Walking and Bus Commuting

2.1 Least Cost Walk Path Analysis

Due to time constraints, our group will use the student group as a proxy. To analyse the relationship between walking and bus commuting, we will compare the time taken to walk with the bus travelling time. Unlike the bus travelling time, the time taken to walk is not provided in the dataset. It will be calculated from the walking distance, which will be derived from least cost walk path analysis, and the average walking speed of students derived from prominent research papers.

Our group has devised two methods to construct the least cost walk path: the Traditional method and the Euclidean Distance method. The Traditional method uses QGIS extension plugins such as GRASS and SAGA, whereas the Euclidean Distance method uses Hub Lines in the MMQGIS plugin.

Traditional Method

Firstly, the “landtype”, “road” and “tampines planning area” shapefiles are assigned impedance values to denote the amount of effort that the pedestrian has to make. A higher impedance denotes a greater amount of effort. For the “landtype” shapefile, land that can be crossed on foot has a value of 1, whereas land that cannot be crossed has a value of 100. For the “road” shapefile, expressways have a value of 100 and all other roads a value of 1. The “tampines planning area” shapefile has an impedance value of 0 to indicate flat land.

Secondly, all the above-mentioned shapefiles are rasterised with a pixel size of 50 m x 50 m. The “landtype” and “road” rasters are then merged using GRASS r.patch. Next, the cumulative cost of moving away from the origin of a particular route is calculated using GRASS r.walk.

With the cumulative cost raster layer generated by GRASS r.walk, we can construct the least cost walk path using the SAGA least cost paths function. As the resulting line layer does not have distance information, the distance will be calculated using the $length formula in the field calculator.
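To illustrate the idea behind a least cost path on an impedance raster, the sketch below runs a plain Dijkstra search over a small grid of impedance values. This is a simplified stand-in written for this page, not the algorithm implemented by GRASS r.walk or SAGA: it only allows 4-directional moves, charges the impedance of the cell being entered, and measures length in whole 50 m cells.

```python
import heapq

CELL_SIZE_M = 50  # pixel size used when rasterising the shapefiles

def least_cost_path(cost, start, goal):
    """Return the list of (row, col) cells on the cheapest path."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0}   # cheapest accumulated cost found so far per cell
    previous = {}       # back-pointers used to rebuild the path
    heap = [(0, start)]
    while heap:
        acc, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if acc > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = acc + cost[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    previous[(nr, nc)] = cell
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path

# 1 = land that can be crossed, 100 = land that cannot (or an expressway).
grid = [
    [1, 100, 1],
    [1, 100, 1],
    [1,   1, 1],
]
path = least_cost_path(grid, (0, 0), (0, 2))
path_length_m = (len(path) - 1) * CELL_SIZE_M
```

In this toy grid the path goes around the high-impedance column instead of through it, which is the behaviour the impedance values are meant to produce.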

Euclidean Distance Method

The walking distance derived from the Euclidean Distance method is the straight-line distance from an origin to a destination. The walk paths are constructed using the Hub Lines function in MMQGIS.
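For reference, the straight-line distance itself is simple to compute outside QGIS. The sketch below assumes both points are given as (x, y) coordinates in a projected coordinate system measured in metres; the coordinate values are made up for the example.

```python
import math

def euclidean_distance_m(origin, destination):
    """Straight-line distance between two (x, y) points in metres.

    Assumes projected coordinates in metres, not latitude/longitude.
    """
    return math.hypot(destination[0] - origin[0], destination[1] - origin[1])

# Hypothetical origin and destination, 300 m east and 400 m north apart.
distance = euclidean_distance_m((1000.0, 2000.0), (1300.0, 2400.0))
```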

2.2 Comparing Time Taken to Walk and Bus Transportation

For our analysis, we will use the Euclidean Distance method, as it is more straightforward and less tedious than the Traditional method. To quantify the error this introduces, we will construct 10 walk paths using both the Traditional and the Euclidean Distance methods and, for each path, subtract the distance derived from the Euclidean Distance method from the distance derived from the Traditional method. The mean and standard deviation of these differences will be used to calculate the error bound, i.e. Mean of Differences + 2 x (Standard Deviation of Differences).
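The error bound can be sketched as below. The text does not say whether the sample or the population standard deviation is used; this example assumes the sample standard deviation (`statistics.stdev`), and the distances are made-up values, shortened to three paths instead of ten.

```python
import statistics

def upper_error_bound(traditional_m, euclidean_m):
    """Mean of differences + 2 x standard deviation of differences.

    Differences are Traditional minus Euclidean, path by path.
    Uses the sample standard deviation, which is an assumption.
    """
    differences = [t - e for t, e in zip(traditional_m, euclidean_m)]
    return statistics.mean(differences) + 2 * statistics.stdev(differences)

# Hypothetical distances in metres for three paths.
traditional = [510.0, 620.0, 730.0]
euclidean = [500.0, 600.0, 700.0]
bound = upper_error_bound(traditional, euclidean)
```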

As the Traditional method is more representative of the actual path taken by a pedestrian than the Euclidean Distance method, only the upper error bound will be added to the distance derived from the Euclidean Distance method, rather than taking both the lower and upper error bounds into account. We will then compare the bus travelling time with the time taken to walk. If the bus travelling time is shorter than the time taken to walk, we deduce that bus commuting and walking have a negative correlation.
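A minimal sketch of this comparison follows. The walking speed of 1.2 m/s is only a placeholder chosen for the example; the actual value is meant to come from the research literature on students' walking speed, and all distances and times here are made up.

```python
def walking_time_seconds(euclidean_m, upper_error_bound_m, speed_m_per_s):
    """Walking time from the Euclidean distance adjusted by the upper error bound."""
    return (euclidean_m + upper_error_bound_m) / speed_m_per_s

def bus_is_faster(bus_time_s, walk_time_s):
    """True when the bus travelling time is shorter than the walking time."""
    return bus_time_s < walk_time_s

PLACEHOLDER_SPEED = 1.2  # m/s, hypothetical; not taken from the source

walk_time = walking_time_seconds(500.0, 100.0, PLACEHOLDER_SPEED)
faster_by_bus = bus_is_faster(300.0, walk_time)
```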