Time-series Analysis on Singapore Public Transportation Train Network Data Source
Source
The Land Transport Authority (LTA) provides the data sets through the Living Analytics Research Centre (LARC) research labs. The data provided by LARC comes from a MySQL database that consists of the following tables:
- Bus_service_mapping
- Location_gis_mapping
- Location_mapping
- Lta_ride
The dataset contains a week's worth (1 November 2011 to 6 November 2011) of smart card (EZ-Link) transactions on Singapore's public transport, covering both bus and MRT rides. As we are only interested in the MRT transactions, we use just two tables: Location_mapping and Lta_ride. We extracted the data by taking a database dump with a conditional statement that filters transport_type by "RTS", so that only the train records are included.
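The filtering step can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the MySQL source, a hypothetical `ride_id` column, a hypothetical `"BUS"` value for non-train rows, and a few made-up sample rows; only the table name `Lta_ride`, the column `transport_type`, and the value `"RTS"` come from the description above. It is not the actual extraction script.

```python
import sqlite3


def filter_train_rides(conn):
    """Return the rows of Lta_ride whose transport_type is 'RTS' (train)."""
    cur = conn.execute(
        "SELECT ride_id, transport_type FROM Lta_ride "
        "WHERE transport_type = ? ORDER BY ride_id",
        ("RTS",),
    )
    return cur.fetchall()


# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Lta_ride (ride_id INTEGER, transport_type TEXT)")

# Made-up sample rows; ride_id and the 'BUS' label are hypothetical.
conn.executemany(
    "INSERT INTO Lta_ride VALUES (?, ?)",
    [(1, "BUS"), (2, "RTS"), (3, "RTS"), (4, "BUS")],
)

train_rows = filter_train_rides(conn)
print(len(train_rows), "of 4 sample rows are train rides")
```

Applying the condition at extraction time, rather than after loading, means the bus records never have to be carried into the analysis environment.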
The screenshot below shows the raw data set for both bus and MRT transactions, which contains approximately 33 million rows.
After filtering to include only the train transactions, the data set shrinks to approximately 10 million rows.