Time-series Analysis on Singapore Public Transportation Train Network Data Source

From Analytics Practicum
Revision as of 22:11, 21 April 2015 by Trecia.koh.2012

Source

The Land Transport Authority (LTA) provides the datasets through the Living Analytics Research Centre (LARC) research labs. LARC supplies the data as a MySQL database consisting of the following tables:

  • Bus_service_mapping
  • Location_gis_mapping
  • Location_mapping
  • Lta_ride

The dataset covers roughly a week's worth (1 November 2011 to 6 November 2011) of smart card (EZ-Link) transactions on Singapore's public transport and includes both bus and MRT transactions. As we are only interested in MRT transactions, we focus on two tables: Location_mapping and Lta_ride. We extracted the data by taking a database dump with a conditional statement that keeps only rows whose transport_type is "RTS" (Rapid Transit System), so that only train transactions are included.
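
A minimal sketch of the filter condition, using Python's built-in sqlite3 module as a stand-in for the MySQL database. Only the table name Lta_ride and the column transport_type come from the source; the card_id column, the "BUS" value and the sample rows are hypothetical, for illustration only.

```python
import sqlite3

# Hypothetical stand-in: an in-memory SQLite table named after the source's Lta_ride.
# Only transport_type comes from the source; card_id and the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Lta_ride (card_id TEXT, transport_type TEXT)")
conn.executemany(
    "INSERT INTO Lta_ride VALUES (?, ?)",
    [("c1", "RTS"), ("c2", "BUS"), ("c3", "RTS"), ("c4", "BUS")],
)


def train_rides(conn):
    """Return only train (RTS) transactions, mirroring the dump's filter condition."""
    cur = conn.execute(
        "SELECT card_id, transport_type FROM Lta_ride "
        "WHERE transport_type = ? ORDER BY card_id",
        ("RTS",),
    )
    return cur.fetchall()


rows = train_rides(conn)
print(rows)  # [('c1', 'RTS'), ('c3', 'RTS')]
```

The parameterised `?` placeholder keeps the filter value out of the SQL string. ORDER BY is added only so the sample output is deterministic.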

The screenshot below shows the raw dataset of both bus and MRT transactions, approximately 33 million rows of data.
[Screenshot: CTS Pic1.png]

After filtering to include only train transactions, the dataset shrank to approximately 10 million rows.
[Screenshot: CTS Pic2.png]
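
To check a size reduction like this without loading all ~33 million rows into memory, one option is a single streaming pass over a CSV export. This is a sketch only. It assumes the rows were exported to CSV with a transport_type header. The export format, the count_rows helper and the sample data are hypothetical.

```python
import csv
import io


def count_rows(lines, type_column="transport_type", keep="RTS"):
    """Stream a CSV export once, counting all rows and rows whose type_column equals keep.

    Hypothetical helper: assumes a CSV export with a header row containing type_column.
    """
    total = kept = 0
    for record in csv.DictReader(lines):
        total += 1
        if record[type_column] == keep:
            kept += 1
    return total, kept


# Invented sample data standing in for the real export.
sample = io.StringIO(
    "card_id,transport_type\n"
    "c1,RTS\n"
    "c2,BUS\n"
    "c3,RTS\n"
    "c4,BUS\n"
    "c5,BUS\n"
)
total, kept = count_rows(sample)
print(total, kept, f"{kept / total:.0%}")  # 5 2 40%
```

With the figures reported above (about 10 million of about 33 million rows), the retained fraction is roughly 10/33, or about 30%. For a real export, pass an open file handle such as open("lta_ride.csv", newline="") in place of the StringIO sample. That filename is hypothetical.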