AY1516 T2 Team WalkThere Project Interim Progress


Revision as of 00:24, 29 February 2016


Data Cleaning

Extraction of Data

In collaboration with the Living Analytics Research Centre (LARC), one week (9-15 January 2012) of EZ-Link data was provided for this study. To handle the large data size of over 40 million records, the data was extracted in subsets according to the following criteria:

  1. By demographic group: student, adult, elderly;
  2. By individual day;
  3. With bus as the only transit mode.
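The extraction criteria above can be sketched in a few lines of Python. This is only an illustration of filtering by group, day and mode, not the team's actual extraction procedure. The column names (`passenger_type`, `ride_date`, `travel_mode`), their values and the sample rows are all assumptions made for the example; the real EZ-Link schema may differ.

```python
import csv
import io
from typing import Iterable, Iterator

def extract_subset(rows: Iterable[dict], group: str, day: str,
                   mode: str = "Bus") -> Iterator[dict]:
    """Yield rows for one demographic group, one day and one transit mode.

    Works on any iterable of dicts, so a large CSV can be streamed
    row by row instead of being loaded into memory at once.
    """
    for row in rows:
        if (row["passenger_type"] == group      # hypothetical column name
                and row["ride_date"] == day     # hypothetical column name
                and row["travel_mode"] == mode):  # hypothetical column name
            yield row

# Made-up sample rows, for illustration only.
SAMPLE = """card_id,passenger_type,ride_date,travel_mode
A1,Adult,2012-01-09,Bus
A2,Student,2012-01-09,Bus
A3,Adult,2012-01-09,RTS
A4,Adult,2012-01-10,Bus
"""

def demo() -> list:
    reader = csv.DictReader(io.StringIO(SAMPLE))
    return list(extract_subset(reader, "Adult", "2012-01-09"))
```

Calling `extract_subset` once per group and day would produce one small subset per combination, which keeps each extract manageable in size.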

Because the data was provided in raw format, the tables had to be joined to give a more complete view of the data. The following diagram shows how the tables were joined.

TeamWalkthere ED 1.png

Cleaning of Data

Removing of Duplicates

The “Location id coordinates” data contains duplicate “location_id” values because the coordinates of some bus stops were updated, as reflected by the later “entry_date” recorded for those rows. The “Remove Duplicates” tool in Excel was used to remove the repeated “location_id” values, leaving 4903 updated records.
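As a rough scripted equivalent of this step, the sketch below keeps one row per “location_id”. It assumes the intent is to keep the row with the latest “entry_date”, and that dates are stored as ISO strings (YYYY-MM-DD). Both are assumptions, as is the `x` column in the usage example; the original work used Excel rather than code.

```python
def latest_per_location(rows):
    """Return one row per location_id, keeping the latest entry_date.

    Assumes entry_date is an ISO-formatted string (YYYY-MM-DD), which
    sorts chronologically when compared as plain text.
    """
    latest = {}
    for row in rows:
        key = row["location_id"]
        if key not in latest or row["entry_date"] > latest[key]["entry_date"]:
            latest[key] = row
    return list(latest.values())
```

For example, given two rows for the same stop, only the more recent one survives:

```python
rows = [
    {"location_id": "10", "entry_date": "2011-05-01", "x": "1"},
    {"location_id": "10", "entry_date": "2011-11-20", "x": "2"},
    {"location_id": "11", "entry_date": "2011-03-02", "x": "3"},
]
cleaned = latest_per_location(rows)  # two rows remain
```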

Removing Undefined Records

There are instances where commuters did not tap out when alighting at their destination. These trips were recorded with “-99” and “?” in the various fields that hold destination data. The records were removed because little analysis could be drawn from trips without a known destination.
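A minimal sketch of this filter is shown below. The markers “-99” and “?” come from the data description above. The field name `alight_stop` used in the usage example is hypothetical, since the destination column names are not listed here.

```python
UNDEFINED_MARKERS = {"-99", "?"}

def drop_undefined(rows, destination_fields):
    """Keep only rows whose destination fields are all defined."""
    return [
        row for row in rows
        if not any(str(row[field]).strip() in UNDEFINED_MARKERS
                   for field in destination_fields)
    ]
```

A call such as `drop_undefined(trips, ["alight_stop"])` would return only the trips with a recorded alighting stop.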

Bus stops in Tampines planning area

As this study focuses on commuters whose trips begin in the Tampines planning area, GIS tools were required to extract the relevant bus stops. To account for commuters travelling to Bedok, Pasir Ris or Changi, which are adjacent to the Tampines planning area, bus stops in these areas were also extracted for the analysis. Bus stops in the Tampines planning area are where commuters begin their trips, while bus stops in the east region reflect the commuters’ destinations.

These were the steps taken to extract relevant bus stops:

  1. Join the “Location id coordinates” data with the “Location mapping11” data, using “location_id” as the common field.
  2. Using QGIS, load the bus stop data and clip it with URA’s subzone shapefile. Two separate layers are created: bus stops in the Tampines planning area and bus stops in the east region.
  3. Save the bus stop layers with the coordinate reference system set to SVY21.
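To illustrate what the join and clip steps do conceptually, here is a small plain-Python sketch. It is not how QGIS implements clipping. It uses a simple ray-casting point-in-polygon test, and the `x`, `y` and `stop_name` fields and the polygon passed in are hypothetical stand-ins for the real coordinates and subzone boundaries.

```python
def join_on_location_id(coords, mapping):
    """Inner-join two lists of dicts on their shared location_id field."""
    by_id = {row["location_id"]: row for row in coords}
    return [
        {**m, **by_id[m["location_id"]]}
        for m in mapping
        if m["location_id"] in by_id
    ]

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def clip_to_polygon(stops, polygon):
    """Keep only the stops whose coordinates fall inside the polygon."""
    return [s for s in stops if point_in_polygon(s["x"], s["y"], polygon)]
```

Running `clip_to_polygon` once with a Tampines boundary and once with an east-region boundary would mirror the two layers described in step 2. Real subzone shapes may include multiple parts or holes, which this simple test does not handle.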

Cleaned Data

As JMP is one of the analytics tools used for this project, the data type and format of each column were set as follows to ensure they are correct:

TeamWalkThere Data.png

Findings

Text here