Difference between revisions of "AY1516 T2 Team WalkThere Project Interim Progress"

From Analytics Practicum
Revision as of 00:28, 29 February 2016

Data Cleaning

Extraction of Data

In collaboration with the Living Analytics Research Centre (LARC), a week’s worth of ez-link data (9-15 January 2012) was provided for this study. To deal with the large data size of over 40 million records, the data was extracted in sets:

  1. By demographic group: student, adult, elderly;
  2. By individual day;
  3. With bus as the only transit mode.
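
The three extraction criteria can be sketched in Python as a simple filter. This is an illustration only, not the procedure the team ran: the field names (`group`, `ride_date`, `mode`) and their values are hypothetical and are not taken from the ez-link schema.

```python
from collections import defaultdict

# Hypothetical field names and values, used for illustration only.
GROUPS = ("student", "adult", "elderly")


def extract_sets(records):
    """Split bus-only records into sets keyed by (group, day)."""
    sets = defaultdict(list)
    for rec in records:
        if rec["mode"] != "bus":        # keep bus as the only transit mode
            continue
        if rec["group"] not in GROUPS:  # keep the three demographic groups
            continue
        sets[(rec["group"], rec["ride_date"])].append(rec)
    return dict(sets)


sample = [
    {"group": "student", "ride_date": "2012-01-09", "mode": "bus"},
    {"group": "adult", "ride_date": "2012-01-09", "mode": "rail"},
    {"group": "adult", "ride_date": "2012-01-10", "mode": "bus"},
]
subsets = extract_sets(sample)
```

Working on one (group, day) set at a time keeps each piece small enough to handle, which is the motivation given above for extracting in sets.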

As the data was provided in raw format, tables had to be joined to give a more complete view of the data. The following diagram shows how the tables were joined.

TeamWalkthere ED 1.png

Cleaning of Data

Removing of Duplicates

The “Location id coordinates” data contains duplicate “location_id” values because the coordinates of some bus stops were updated, as shown by the later “entry_date” recorded for them. The “Remove Duplicates” tool in Excel was used to remove the repeated “location_id” values, leaving 4903 updated records.
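
A minimal sketch of the same de-duplication in Python, assuming the intent is to keep the most recently entered coordinates for each “location_id”. The `location_id` and `entry_date` fields come from the data described above; the coordinate fields `x` and `y` and the sample values are hypothetical.

```python
from datetime import date


def latest_per_location(rows):
    """Keep one row per location_id: the one with the newest entry_date."""
    latest = {}
    for row in rows:
        key = row["location_id"]
        if key not in latest or row["entry_date"] > latest[key]["entry_date"]:
            latest[key] = row
    return list(latest.values())


# Made-up rows: location 101 has an older and a newer coordinate entry.
rows = [
    {"location_id": 101, "entry_date": date(2011, 3, 1), "x": 1.0, "y": 2.0},
    {"location_id": 101, "entry_date": date(2011, 9, 1), "x": 1.5, "y": 2.5},
    {"location_id": 202, "entry_date": date(2011, 5, 1), "x": 3.0, "y": 4.0},
]
deduped = latest_per_location(rows)
```

Comparing dates explicitly means the result does not depend on how the rows happen to be sorted.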

Removing Undefined Records

There are instances where commuters did not tap out when alighting at their destination. These were recorded as “-99” and “?” in the various fields that hold destination data. These records were removed because little analysis could be drawn from them.
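
The removal of undefined records can be sketched as follows. The markers “-99” and “?” are the ones described above; the destination field names (`alight_stop`, `alight_time`) are hypothetical placeholders for whichever fields hold destination data.

```python
UNDEFINED = {"-99", "?"}  # markers for a missing tap-out, as described above


def has_defined_destination(record, destination_fields):
    """True when none of the destination fields holds an undefined marker."""
    return all(
        str(record[field]).strip() not in UNDEFINED
        for field in destination_fields
    )


def drop_undefined(records, destination_fields):
    """Return only the records whose destination fields are all defined."""
    return [r for r in records if has_defined_destination(r, destination_fields)]


# Hypothetical field names and made-up values.
fields = ["alight_stop", "alight_time"]
trips = [
    {"card": "A", "alight_stop": "75009", "alight_time": "08:15"},
    {"card": "B", "alight_stop": "-99", "alight_time": "?"},
    {"card": "C", "alight_stop": "76059", "alight_time": "?"},
]
clean = drop_undefined(trips, fields)
```

Values are converted to strings before comparison so that a numeric -99 is caught as well as the text “-99”.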

Bus stops in Tampines planning area

As this study analyses commuters’ behaviour only for trips that begin in the Tampines planning area, GIS tools are needed to extract the relevant bus stops. Because commuters may travel to Bedok, Pasir Ris or Changi, which lie just beside the Tampines planning area, bus stops in these areas were also extracted for the analysis. Bus stops in the Tampines planning area are where commuters begin their trips, while bus stops in the east region reflect the commuters’ destinations.

These were the steps taken to extract relevant bus stops:

  1. Join the “Location id coordinates” data with the “Location mapping11” data, using “location_id” as the common field.
  2. Using QGIS, load the bus stop data and clip it with URA’s subzone shapefile. Two separate layers are created: bus stops in the Tampines planning area and bus stops in the east region.
  3. Save the bus stop layers with the coordinate reference system set to SVY21.
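
The join and clip steps above were done in Excel and QGIS. As a rough illustration of what they compute, the sketch below joins two tables on “location_id” and keeps the points that fall inside a polygon, using a standard ray-casting test. The fields `x`, `y` and `description`, and the square polygon, are hypothetical stand-ins and not the URA subzone boundary.

```python
def join_on_location_id(coords, mapping):
    """Inner join of two lists of dicts on the shared location_id field."""
    by_id = {m["location_id"]: m for m in mapping}
    return [
        {**c, **by_id[c["location_id"]]}
        for c in coords
        if c["location_id"] in by_id
    ]


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip(points, polygon):
    """Keep only the points that lie inside the polygon."""
    return [p for p in points if point_in_polygon(p["x"], p["y"], polygon)]


# Made-up data: a toy square stands in for a planning-area boundary.
coords = [
    {"location_id": 1, "x": 5.0, "y": 5.0},
    {"location_id": 2, "x": 15.0, "y": 5.0},
    {"location_id": 3, "x": 2.0, "y": 8.0},
]
mapping = [
    {"location_id": 1, "description": "Stop A"},
    {"location_id": 2, "description": "Stop B"},
]
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
stops = clip(join_on_location_id(coords, mapping), square)
```

Real subzone polygons can have multiple parts and holes, which this toy test does not handle; a GIS tool such as QGIS takes care of those cases.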

Cleaned Data

Standardised Format

As JMP is one of the analytics tools used for this project, the following was done to ensure that the data type and format of each column are correct:

TeamWalkThere Data.png

Grouping Records

TeamWalkThere GroupingRecords.png

Keeping Relevant Data

Given that the aim of this project is to assess walkability in the Tampines planning area, including long routes in the analysis would not be helpful, as commuters can be expected to take the bus rather than walk long distances. As such, a travelling distance threshold of 1 km is set, and routes longer than 1 km are removed.
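
A minimal sketch of the 1 km filter. It assumes that the travelling distance is the straight-line distance between the origin and destination bus stops, computed from SVY21 coordinates (which are in metres); the text does not state how the distance was measured, so this is an assumption. The `origin` and `destination` fields and the sample values are hypothetical.

```python
import math

MAX_DISTANCE_M = 1000.0  # the 1 km threshold stated in the text


def route_distance_m(origin, destination):
    """Straight-line distance in metres between two (x, y) points.

    Assumes projected coordinates in metres, such as SVY21.
    """
    return math.hypot(destination[0] - origin[0], destination[1] - origin[1])


def keep_short_routes(routes):
    """Drop routes whose distance is more than the threshold."""
    return [
        r for r in routes
        if route_distance_m(r["origin"], r["destination"]) <= MAX_DISTANCE_M
    ]


# Made-up routes: one of 500 m and one of 5000 m.
routes = [
    {"route": "short", "origin": (0.0, 0.0), "destination": (300.0, 400.0)},
    {"route": "long", "origin": (0.0, 0.0), "destination": (3000.0, 4000.0)},
]
short_routes = keep_short_routes(routes)
```

If the distance is instead measured along the bus route, only `route_distance_m` would need to change.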

Besides setting a threshold on the travelling distance, to determine the commonly travelled routes, only the top 97.5% of the distribution of the data is kept (99.5% for the adults’ ez-link data due to its large size).

Findings

Text here