ISSS608 2016-17 T3 Assign ZHOU YUHUI Data Preparation
Data Preparation
0. Observe The Data
Before starting any data preparation or analysis, I find it necessary to observe the dataset itself closely. Here are some points I would like to discuss after observing the data:
1. Car ID is a one-time ID only
The Car ID in this dataset identifies not a car but a single trip by a car. In other words, if the same car enters and exits the park twice, it has two Car IDs. I infer this because the last six digits of each Car ID are exactly the same as the time portion of the timestamp.
This makes it easier to analyze the patterns of cars, since no extra effort is needed to separate multiple trips by the same car. It also makes it easier to spot the suspicious ones.
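The observation about the Car ID can be checked with a few lines of Python. This is only a minimal sketch: the sample rows are made up for illustration, and it assumes the Car ID is a string ending in six digits and that timestamps look like `YYYY-MM-DD HH:MM:SS`. If the real IDs carry any extra suffix, the slicing would need adjusting.

```python
from datetime import datetime

# Hypothetical sample rows (timestamp, car_id); the values are invented for illustration.
rows = [
    ("2015-05-01 08:15:30", "20150501081530"),
    ("2015-05-01 08:20:10", "20150501081530"),
    ("2015-05-01 17:42:05", "20150501174205"),
]

def id_matches_time(car_id, timestamp):
    """Return True if the last six digits of car_id equal HHMMSS of timestamp."""
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return car_id[-6:] == ts.strftime("%H%M%S")

# Keep the earliest timestamp seen for each Car ID (the start of the trip).
# ISO-style timestamp strings sort chronologically, so plain string comparison works.
first_seen = {}
for timestamp, car_id in rows:
    if car_id not in first_seen or timestamp < first_seen[car_id]:
        first_seen[car_id] = timestamp

# For each Car ID, does its suffix match the time of its first record?
matches = {car_id: id_matches_time(car_id, ts) for car_id, ts in first_seen.items()}
print(matches)
```

If every value in `matches` is true on the real data, that supports reading the Car ID as a per-trip identifier rather than a per-car one.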
2. Observation Period:
The observation period runs from May 2015 to May 2016, 13 months in total, with May counted twice. Therefore, when using “month” as a time dimension, we should also look at the year; otherwise the result would be misleading.
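To illustrate why the year matters, here is a small sketch that groups records by month alone and by year plus month. The timestamps are invented and the `YYYY-MM-DD HH:MM:SS` format is an assumption about how the data is stored.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps; the format is an assumption, the values are made up.
timestamps = [
    "2015-05-01 08:15:30",
    "2015-12-24 10:00:00",
    "2016-05-01 09:00:00",
]

def parse(timestamp):
    """Parse a timestamp string in the assumed format."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")

# Grouping by month alone merges May 2015 and May 2016 into one bucket.
by_month = Counter(parse(t).month for t in timestamps)

# Grouping by year and month keeps the two Mays apart.
by_year_month = Counter(parse(t).strftime("%Y-%m") for t in timestamps)

print(by_month)
print(by_year_month)
```

With month alone, the two May records fall into the same bucket, while the year-month key separates them.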
3. Rangerstop0 and Rangerstop2 are not restricted gates
As mentioned in the Data Description, “Ranger-stops. These sensors represent working areas for the Rangers, so you will often see a Ranger-stop sensor at the end of a road managed by a Gate. Some Ranger-stops are in other locations however, so these sensors record all traffic passing by.”