Difference between revisions of "ANLY482 AY2017-18T2 Group08 : Project Findings"


Revision as of 00:23, 26 February 2018



1.0 Project Recap

oBike, Singapore’s first home-grown stationless bicycle-sharing company, began operations in January 2017. In recent months, however, Singapore’s Land Transport Authority (LTA) has issued new rules and regulations that require bicycles to be parked in designated yellow boxes around the island. LTA enforcers, together with authorities from the Town Council and NParks, survey the island and issue tickets to bike-sharing companies when bicycles are found outside these yellow boxes. From the time a ticket is issued, oBike has just four hours to move its illegally parked bicycles. Failure to do so incurs hefty fines.

As such, this practicum seeks to achieve the following objectives:-

(i) Identify hotspots for illegal parking cases
(ii) Project illegal parking patterns by analysing historical data
(iii) Determine suitable areas for yellow boxes to be painted

To achieve these objectives, however, we first had to clean the data given and perform exploratory data analysis (EDA). This interim report therefore documents the data cleaning process and the EDA performed thus far, and shares the key insights derived to date.

2.0 About the Data

The csv file titled ‘Group08_oBike_InterimData’ contains four sheets, described as follows:-

(i) 1.0 Cleaned Data
This sheet contains the data after it has been through our data cleaning process, which is described further in Section 4. The format of ‘1.0 Cleaned Data’ is similar to that of the original data given by oBike, except that there are five newly inserted columns: ‘Original ID’, ‘New ID’, ‘Day’, ‘Updated Addresses’ and ‘Time Period’. This sheet will be used for analysis purposes. Please refer to Figure 1 below for the revised metadata.

(ii) 2.0 Original Data
This sheet contains the original, raw data given, with the exception of the column ‘Original ID’, which was inserted for tracking purposes. There are 14 columns in total in this sheet, inclusive of ‘Original ID’. Please refer to Figure 1 below for the revised metadata.

(iii) 3.0 Appendix & Notes
The purpose of this sheet is to highlight to readers the changes made to the original data set, to allow for better comprehension of the data cleaning process. It contains notes relating to data points that were duplicated or removed.

(iv) 3.1 Cross Checking
This sheet is used internally to cross-check ‘1.0 Cleaned Data’ against ‘2.0 Original Data’, to ensure that no error occurred when duplicating the data. Using the ‘LOCATION’ column, which contains all unique address entries, we checked that every entry in ‘1.0 Cleaned Data’ is found in ‘2.0 Original Data’ and vice versa.
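The two-way check described for the ‘3.1 Cross Checking’ sheet can be sketched in Python as a pair of set differences. This is only an illustration of the idea, not the procedure the team actually used (the report describes the check as being done within the sheet itself). The function name `cross_check`, the whitespace trimming, and the sample addresses are all hypothetical and do not come from the oBike data.

```python
def cross_check(cleaned_locations, original_locations):
    """Compare two collections of LOCATION values in both directions.

    Returns the entries that appear in only one of the two collections.
    Trimming surrounding whitespace is an assumption made for this sketch.
    """
    cleaned = {loc.strip() for loc in cleaned_locations}
    original = {loc.strip() for loc in original_locations}
    return {
        # In the cleaned sheet but not in the original sheet
        "missing_from_original": sorted(cleaned - original),
        # In the original sheet but not in the cleaned sheet
        "missing_from_cleaned": sorted(original - cleaned),
    }


# Hypothetical sample values, for illustration only
cleaned = ["1 Example Road", "2 Sample Street"]
original = ["1 Example Road", "2 Sample Street ", "3 Demo Avenue"]

result = cross_check(cleaned, original)
for direction, entries in result.items():
    print(direction, entries)
```

Both lists being empty would correspond to the outcome the cross check is looking for: every entry in one sheet is found in the other.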


2.1 Revised Metadata


2.2 Summary Statistics for '1.0 Original Data'

3.0 Data Quality Issues & Consequences

3.1 Original Address / Location


3.2 Number of Bikes


3.3 Authority


3.4 Status


3.5 Codes

4.0 Data Cleaning & Preparation



5.0 Exploratory Data Analysis and Interim Findings



6.0 Going Forward



7.0 Conclusion