Time-series Analysis on Singapore Public Transportation Train Network Data Source

Revision as of 22:19, 21 April 2015

Source

The Land Transport Authority (LTA) provides the data sets through the Learning Analytics Research Centre (LARC) research labs. The data set provided by LARC is stored in a MySQL database that consists of the following tables:

  • Bus_service_mapping
  • Location_gis_mapping
  • Location_mapping
  • Lta_ride

The data set contains a week's worth (1 November 2011 to 6 November 2011) of smart card (EZ-Link) transactions from Singapore's public transport system, covering both bus and MRT transactions. As we are only interested in the MRT transactions, we use only two tables: Location_mapping and Lta_ride. We extracted the data by taking a database dump with a conditional statement that filters transport_type by "RTS", so that only the train data is included.
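The filter described above can be sketched in Python. This is a minimal illustration and not the dump command used in the project: it uses an in-memory SQLite table as a stand-in for the MySQL Lta_ride table, and every column name and value other than transport_type and "RTS" (for example card_id, entry_location_id and "BUS") is an assumption made for the example.

```python
import sqlite3

# Hypothetical stand-in for the Lta_ride table. Only transport_type and the
# "RTS" value come from the text; the other columns and values are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lta_ride ("
    "card_id TEXT, transport_type TEXT, entry_location_id INTEGER)"
)
conn.executemany(
    "INSERT INTO lta_ride VALUES (?, ?, ?)",
    [("A1", "RTS", 10), ("A2", "BUS", 99), ("A3", "RTS", 12)],
)


def train_rides(connection):
    """Return only the rows whose transport_type is 'RTS' (train rides)."""
    cursor = connection.execute(
        "SELECT card_id, transport_type, entry_location_id "
        "FROM lta_ride WHERE transport_type = ? ORDER BY card_id",
        ("RTS",),
    )
    return cursor.fetchall()


rts_rows = train_rides(conn)
```

The same WHERE condition could in principle be applied when dumping the MySQL table, so that the bus rows never leave the database.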

The screenshot below shows the raw data set for both bus and MRT transactions, which contains approximately 33 million rows.
CTS Pic1.png

After filtering to include only the train transactions, the data set is reduced to approximately 10 million rows.
CTS Pic2.png

Exploratory Data Analysis Data Preparation

CTS Pic3.png
The figure above shows the EDA data preparation process. Several steps need to be completed before performing descriptive analysis or computing summary statistics:

  • Extract hour of entry_time and exit_time
  • Extract the minutes from entry_time and exit_time
  • Recode the midnight hour of entry_time and exit_time to 24 instead of 00
  • Extract the day of the week from the entry_date
  • Map location_id to retrieve location_name from location_mapping table
  • Combine all the recoded columns into a single data file
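
The preparation steps listed above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the project's actual workflow: the time format (HH:MM:SS), the date format (YYYY-MM-DD), the field names entry_location_id and exit_location_id, and the contents of the LOCATION_MAPPING lookup are all hypothetical. Only entry_time, exit_time, entry_date, location_id and location_name come from the text.

```python
from datetime import datetime

# Hypothetical lookup standing in for the location_mapping table.
LOCATION_MAPPING = {1: "Station A", 2: "Station B"}

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def recode_hour(hour):
    """Recode the midnight hour from 0 to 24; leave other hours unchanged."""
    return 24 if hour == 0 else hour


def prepare_ride(ride):
    """Derive the recoded columns for one ride and combine them in one record."""
    entry_time = datetime.strptime(ride["entry_time"], "%H:%M:%S")
    exit_time = datetime.strptime(ride["exit_time"], "%H:%M:%S")
    entry_date = datetime.strptime(ride["entry_date"], "%Y-%m-%d")
    return {
        **ride,
        "entry_hour": recode_hour(entry_time.hour),
        "entry_minute": entry_time.minute,
        "exit_hour": recode_hour(exit_time.hour),
        "exit_minute": exit_time.minute,
        "day_of_week": DAY_NAMES[entry_date.weekday()],
        # .get returns None when a location_id has no entry in the mapping.
        "entry_location_name": LOCATION_MAPPING.get(ride["entry_location_id"]),
        "exit_location_name": LOCATION_MAPPING.get(ride["exit_location_id"]),
    }


# Made-up sample ride that exits shortly after midnight.
sample_ride = {
    "entry_date": "2011-11-01",
    "entry_time": "23:55:10",
    "exit_time": "00:20:05",
    "entry_location_id": 1,
    "exit_location_id": 2,
}
prepared = prepare_ride(sample_ride)
```

For the sample ride, the exit hour is recoded from 0 to 24, while the entry hour of 23 is left unchanged.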