ISSS608 2016-17 T3 Assign ZHOU YUHUI Data Preparation

From Visual Analytics and Applications
Revision as of 23:27, 16 July 2017



VAST Challenge 2017: Mystery at the Wildlife Preserve



Data Preparation

0. Observe the Data

Before starting any data preparation or analysis, I find it necessary to observe the dataset itself closely. Here are some points worth discussing from that first look:

1. Car ID is a one-time ID only

The Car ID in this dataset does not identify a car but a single trip made by a car. In other words, if the same car enters and exits the park twice, it will have two different Car IDs. I concluded this because the last six digits of each Car ID exactly match the time in the corresponding timestamp.

[Figure: Observecarid.PNG, Car IDs compared with their timestamps]
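The check behind this observation can be sketched in Python. This is a minimal sketch, not the author's method, and it rests on several assumptions: timestamps are strings of the form YYYY-MM-DD HH:MM:SS, each Car ID is compared with the timestamp of its first reading (the text does not say which reading was used), and the sample IDs are made up. The names TIME_FORMAT, id_matches_timestamp and share_matching are hypothetical.

```python
from datetime import datetime

# Assumed timestamp format; adjust to match the actual sensor file.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def id_matches_timestamp(car_id: str, timestamp: str) -> bool:
    """Return True if the last six digits of car_id equal the HHMMSS part of timestamp."""
    digits = "".join(ch for ch in car_id if ch.isdigit())
    if len(digits) < 6:
        return False
    time_part = datetime.strptime(timestamp, TIME_FORMAT).strftime("%H%M%S")
    return digits[-6:] == time_part


def share_matching(first_readings: dict[str, str]) -> float:
    """Fraction of Car IDs whose first reading's time matches the ID's last six digits.

    first_readings maps each Car ID to the timestamp of its earliest sensor reading.
    """
    if not first_readings:
        return 0.0
    hits = sum(id_matches_timestamp(cid, ts) for cid, ts in first_readings.items())
    return hits / len(first_readings)


# Hypothetical IDs and timestamps, invented for illustration only.
sample = {
    "20150501004328": "2015-05-01 00:43:28",
    "20150502131500": "2015-05-02 13:15:00",
    "20150503090000": "2015-05-03 09:01:00",
}
print(share_matching(sample))
```

A share close to 1.0 on the real data would support reading each Car ID as a one-time, per-trip identifier.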

Having one ID per trip should make it easier to analyze the movement patterns of cars, since we do not need extra effort to separate multiple trips of the same car. It also makes suspicious trips easier to spot.
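Because each Car ID already corresponds to one trip, building trips reduces to grouping readings by ID and sorting them by time. The Python sketch below assumes readings are (timestamp, car_id, sensor) tuples with timestamps in the form YYYY-MM-DD HH:MM:SS; the function name build_trips and the sensor names in the sample are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def build_trips(readings):
    """Group (timestamp, car_id, sensor) readings into one trip per Car ID.

    Since a Car ID identifies a single trip, no extra splitting is needed:
    sorting each ID's readings by time yields that trip's path.
    """
    by_id = defaultdict(list)
    for ts, car_id, sensor in readings:
        by_id[car_id].append((datetime.strptime(ts, TIME_FORMAT), sensor))

    trips = {}
    for car_id, rows in by_id.items():
        rows.sort()  # chronological order
        path = [sensor for _, sensor in rows]
        duration = rows[-1][0] - rows[0][0]
        trips[car_id] = {"path": path, "duration_hours": duration.total_seconds() / 3600}
    return trips


# Hypothetical readings, deliberately out of order.
readings = [
    ("2015-05-01 09:30:00", "A1", "camping3"),
    ("2015-05-02 10:00:00", "B7", "entrance0"),
    ("2015-05-01 08:00:00", "A1", "entrance1"),
    ("2015-05-02 10:45:00", "B7", "entrance0"),
    ("2015-05-01 11:00:00", "A1", "entrance2"),
]
trips = build_trips(readings)
for car_id, trip in trips.items():
    print(car_id, trip["path"], trip["duration_hours"])
```

Trips with unusually long durations or unusual paths could then be singled out for a closer look.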

2. Observation Period

The observation period runs from May 2015 to May 2016: 13 months in total, with May appearing twice. When using "month" as a time dimension, we should therefore also take the year into account; otherwise readings from May 2015 and May 2016 would be merged, which would be misleading.
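A small Python sketch of why a year-month key is safer than the month alone. It assumes timestamps in the form YYYY-MM-DD HH:MM:SS; the helper names and sample timestamps are hypothetical. months_in_period also confirms the count of 13 months from May 2015 to May 2016 inclusive.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def months_in_period(start=(2015, 5), end=(2016, 5)):
    """List every year-month from start to end inclusive, e.g. '2015-05'."""
    y, m = start
    out = []
    while (y, m) <= end:
        out.append(f"{y}-{m:02d}")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out


def month_only(ts: str) -> int:
    """Month number alone: May 2015 and May 2016 both become 5."""
    return datetime.strptime(ts, TIME_FORMAT).month


def year_month(ts: str) -> str:
    """Key such as '2015-05' that keeps May 2015 and May 2016 apart."""
    return datetime.strptime(ts, TIME_FORMAT).strftime("%Y-%m")


# Hypothetical timestamps from the first and last month of the period.
timestamps = ["2015-05-03 10:00:00", "2015-05-20 14:30:00", "2016-05-11 09:15:00"]

print(len(months_in_period()))
print(Counter(month_only(t) for t in timestamps))  # both Mays merged into one bucket
print(Counter(year_month(t) for t in timestamps))  # both Mays kept separate
```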

3. Rangerstop0 and Rangerstop2 are not restricted gates

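As mentioned in the Data Description, "Ranger-stops. These sensors represent working areas for the Rangers, so you will often see a Ranger-stop sensor at the end of a road managed by a Gate. Some Ranger-stops are in other locations however, so these sensors record all traffic passing by."

According to the map, Rangerstop0 and Rangerstop2 are among these other locations, so they record all passing traffic rather than only traffic that has passed through a gate.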

[Figure: Observemap.PNG, map of the preserve]
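One way this observation might be applied: when looking for visits to restricted ranger-stop areas, readings at Rangerstop0 and Rangerstop2 can be excluded, since ordinary traffic is expected there. The Python sketch below is an illustration under assumptions, not the author's method. It assumes readings are (timestamp, car_id, sensor) tuples, uses the sensor spelling from this page (the labels in the data file may differ), and treats every other ranger stop as restricted. The function names and the sample data, including Rangerstop1, are hypothetical.

```python
# Sensor names follow the spelling used on this page; the labels in the
# actual data file may be written differently (assumption).
UNRESTRICTED_RANGER_STOPS = {"Rangerstop0", "Rangerstop2"}


def is_restricted_ranger_stop(sensor: str) -> bool:
    """True for ranger-stop sensors other than the two unrestricted ones (assumed gated)."""
    return sensor.startswith("Rangerstop") and sensor not in UNRESTRICTED_RANGER_STOPS


def restricted_visits(readings):
    """Keep only (timestamp, car_id, sensor) readings taken at restricted ranger stops."""
    return [r for r in readings if is_restricted_ranger_stop(r[2])]


# Hypothetical readings for a single trip.
readings = [
    ("2015-06-01 10:00:00", "C3", "Rangerstop0"),
    ("2015-06-01 10:20:00", "C3", "Rangerstop1"),
    ("2015-06-01 10:40:00", "C3", "Rangerstop2"),
    ("2015-06-01 11:00:00", "C3", "Entrance1"),
]
print(restricted_visits(readings))
```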