Difference between revisions of "AY1516 T2 Team CommuteThere Project Data Preparation"
Line 62: | Line 62: | ||
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Arial, sans-serif; border-radius: 7px; text-align:left"> | <div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Arial, sans-serif; border-radius: 7px; text-align:left"> | ||
=== Bus_service _mapping === | === Bus_service _mapping === | ||
+ | {| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;" | ||
+ | |- | ||
+ | | style="text-align: center;" | [[File:TeamCommute-Bus_service_mapping1.png|250px]] | ||
+ | ''An example of a record with duplicated entries''<br> | ||
+ | ''of different entry_date'' | ||
+ | | style="text-align: center;" | [[File:TeamCommute-Bus_service_mapping2.png|250px]] | ||
+ | ''Final set of data where duplicates are removed''<br> | ||
+ | ''with the most recent entry_date'' | ||
+ | |} | ||
+ | Upon retrieval from the database, the table contained 29522 records. Most of these are duplicates: each time a bus service mapping is re-entered, a new record is stored with the date on which it was entered into the database. To remove the duplicates, the records are sorted by entry_date with the most recent first, and only the most recent entry for each bus service is kept. After removing the duplicates, 335 unique records remain. | ||
+ | |||
+ | SQL statement used for retrieving the data: | ||
+ | ''SELECT * FROM lta_ride_data_anly482.bus_service_mapping;'' | ||
=== Location_gis_mapping === | === Location_gis_mapping === |
Revision as of 01:02, 17 April 2016
Main Data Sets
Name of Data | Nature of Data | Number of Records (after cleaning) |
---|---|---|
bus_service_mapping | Maps each bus service id to its actual bus service number, together with the date the entry was added to the database. A more recent entry date indicates a more up-to-date record. | 335 |
location_gis_mapping | Coordinates of bus stops in the WGS84 coordinate system, together with the location_id of each bus stop and the date the entry was added to the database. A more recent entry date indicates a more up-to-date record. | 4903 |
location_mapping | Names of MRT stations and bus stops, together with the location_id and the date the entry was added to the database. A more recent entry date indicates a more up-to-date record. | 5070 |
ride_data_20120109_20120115 | Ez-link transactions of commuters in all commuter categories from 9 Jan 2012 to 15 Jan 2012. Each transaction records the tap-in and tap-out of an ez-link card. | Millions |
Anomalies and Data Cleaning
Bus_service_mapping
An example of a record with duplicated entries of different entry_date |
Final set of data after duplicates are removed, keeping the most recent entry_date |
Upon retrieval from the database, the table contained 29522 records. Most of these are duplicates: each time a bus service mapping is re-entered, a new record is stored with the date on which it was entered into the database. To remove the duplicates, the records are sorted by entry_date with the most recent first, and only the most recent entry for each bus service is kept. After removing the duplicates, 335 unique records remain.
SQL statement used for retrieving the data: SELECT * FROM lta_ride_data_anly482.bus_service_mapping;
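The de-duplication described above could be sketched as follows. This is only an illustration, not the team's actual cleaning procedure: it uses an in-memory SQLite database with made-up sample rows, and the column names bus_service_id and bus_service_no are assumptions (only entry_date appears in the original data description). Dates are assumed to be stored as ISO-formatted text so that MAX() picks the most recent one.

```python
import sqlite3

# Made-up sample rows: (bus_service_id, bus_service_no, entry_date).
# Column names other than entry_date are hypothetical.
SAMPLE_ROWS = [
    (101, "12", "2011-03-01"),
    (101, "12", "2011-09-15"),
    (102, "14", "2011-03-01"),
    (102, "14A", "2011-12-20"),
    (103, "960", "2011-06-30"),
]


def latest_mappings(rows):
    """Return one row per bus_service_id, keeping the most recent entry_date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bus_service_mapping ("
        "bus_service_id INTEGER, bus_service_no TEXT, entry_date TEXT)"
    )
    conn.executemany("INSERT INTO bus_service_mapping VALUES (?, ?, ?)", rows)

    # Find the latest entry_date per service id, then join back to the
    # table to recover the full record for that date.
    query = """
        SELECT m.bus_service_id, m.bus_service_no, m.entry_date
        FROM bus_service_mapping AS m
        JOIN (
            SELECT bus_service_id, MAX(entry_date) AS latest
            FROM bus_service_mapping
            GROUP BY bus_service_id
        ) AS r
          ON r.bus_service_id = m.bus_service_id
         AND r.latest = m.entry_date
        ORDER BY m.bus_service_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


deduplicated = latest_mappings(SAMPLE_ROWS)
for row in deduplicated:
    print(row)
```

If two records for the same service share the latest entry_date, this query returns both, so a further tie-breaking rule would be needed in that case.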