Difference between revisions of "AY1516 T2 Team CommuteThere Project Data Preparation"
Line 93: | Line 93: | ||
SQL statement used for retrieving the data: | SQL statement used for retrieving the data: | ||
''SELECT * FROM lta_ride_data_anly482.bus_service_mapping;'' | ''SELECT * FROM lta_ride_data_anly482.bus_service_mapping;'' | ||
+ | |||
+ | After cleaning the data, an additional step to sieve out relevant bus stops is required. As our analysis aims to identify the commuters’ behaviour when travelling inter-town, intra-town and island wide, we needed to sieve out bus stops that are within the former three mentioned. Intra town refers to commuters travelling within Tampines planning area; inter town refers to commuters travelling to neighbouring planning areas such as Paya Lebar, Pasir Ris, Bedok and Changi; island wide refers to commuters travelling outside of the east region. With that, using QGIS, we uploaded the bus stop data and clipped it with URA’s subzone shapefile. The bus stop data is then filtered accordingly as shown below. | ||
+ | |||
+ | <center> | ||
+ | {| style="background-color:#ffffff ; margin: 3px 10px 3px 10px; font-size:15px" width="80%" | ||
+ | |- style="background:#f2f4f4; font-size:17px" | ||
+ | |||
+ | |- | ||
+ | | style="font-family:Open Sans, Arial, sans-serif; font-size:16px; text-align: center; padding:5px; border-bottom:solid #0091b3" | | ||
+ | | style="font-family:Open Sans, Arial, sans-serif; font-size:16px; text-align: center; padding:5px; border-bottom:solid #0091b3" | <font color="#3c3c3c"><strong>Planning Area(s):</strong></font> | ||
+ | |||
+ | |- | ||
+ | | style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | [[File:TeamCommute_TAMP.png|250px]] <br> <i>Intra town bus stops</i> | ||
+ | | style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Tampines | ||
+ | |||
+ | |- | ||
+ | | style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | [[File:TeamCommute_East.png|250px]] <br> <i>Inter town bus stops </i> | ||
+ | | style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Paya Lebar, Bedok, Pasir Ris, Changi, Changi Bay | ||
+ | |||
+ | |- | ||
+ | | style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | [[File:TeamCommute_ISLANDW.png|250px]] <br> <i>Island wide bus stops </i> | ||
+ | | style="font-family:Open Sans, Arial, sans-serif; text-align: center; padding:3px 10px; border-bottom:solid 1px #d8d8d8" | Ang Mo Kio, Bishan, Boon Lay, Bukit Batok, Bukit Merah, Bukit Panjang, Bukit Timah, Central Water Catchment, Choa Chu Kang, Clementi, Downtown Core, Geylang, Hougang, Jurong East, Jurong West, Kallang, Lim Chu Kang, Mandai, Marina East, Marina South, Marine Parade, Museum, Newton, Novena, Orchard, Outram, Pioneer, Punggol, Queenstown, River Valley, Rochor, Seletar, Sengkang, Serangoon, Singapore River, Simpang, Sungei Kadut, Straits View, Tanglin, Tengah, Toa Payoh, Tuas, Western Water Catchment, Woodlands, Yishun | ||
+ | |||
+ | |} | ||
+ | </center> | ||
=== Location_gis_mapping === | === Location_gis_mapping === |
Revision as of 02:03, 17 April 2016
Contents
Main Data Sets
Name of Data | Nature of Data | Number of Records (after cleaning) |
bus_service_mapping | The actual bus service numbers are provided alongside with the bus service ids together with the date of entry of data into the database. The more recent the data depicts a more updated record. | 335 |
location_gis_mapping | Coordinates of bus stops in WGS84 coordinate system are provided with the location_id of each bus stops and the date of entry when the data is entered into the database. The more recent the data depicts a more updated record. | 4903 |
location _mapping | Names of each MRT stations and bus stops are provided together with the location_id. Date of entry of data when entered into the database is provided. The more recent the data depicts a more updated record. | 5070 |
ride_data_20120109_20120115 | Ez-link transactions of commuters of all types of commuters category from 9 Jan 2012 to 15 Jan 2012. Each transactions displays the tap-in and top-out of ez-link card. | Millions |
Anomalies and Data Cleaning
Bus_service _mapping
An example of a record with duplicated entries |
Final set of data where duplicates are removed |
Upon retrieving the data from the database, there was a total of 29522 records. However, the large number of records were attributed to duplicates of records where new entry of bus service mapping is entered with the date where data is stored in the database. To remove duplicates, dates are sorted according to the most recent dates first followed by the later dates. There are 335 unique records after removing the duplicates.
SQL statement used for retrieving the data: SELECT * FROM lta_ride_data_anly482.bus_service_mapping;
After cleaning the data, an additional step to sieve out relevant bus stops is required. As our analysis aims to identify the commuters’ behaviour when travelling inter-town, intra-town and island wide, we needed to sieve out bus stops that are within the former three mentioned. Intra town refers to commuters travelling within Tampines planning area; inter town refers to commuters travelling to neighbouring planning areas such as Paya Lebar, Pasir Ris, Bedok and Changi; island wide refers to commuters travelling outside of the east region. With that, using QGIS, we uploaded the bus stop data and clipped it with URA’s subzone shapefile. The bus stop data is then filtered accordingly as shown below.
Location_gis_mapping
Just like bus_service_mapping, location_gis_mapping data also has duplicated entries of the same location_id but with different enrty_date. There are 442458 records upon retrieving this data. There is a total of records upon retrieving this data set from the database. After removing duplicates that has the older entry_dates, there is a total of 4903 records.
There are 27 records that have (0,0) as the coordinates. This appears to be an anomaly because these coordinates do not fit into the coordinates of Singapore island. With that, these records are also removed from the data for this project. As such, ez-link records in ride_data_20120109_20120115 with these location_ids are also removed since this will not help in the analysis of commuters’ patterns.
SQL statement used for retrieving the data: SELECT * FROM lta_ride_data_anly482.location_gis_mapping;