Difference between revisions of "ISSS608 2016-17 T3 Assign ZHOU YUHUI Data Preparation"

From Visual Analytics and Applications

Revision as of 23:34, 16 July 2017

Qbird.jpg


VAST Challenge 2017: Mystery at the Wildlife Preserve


 


Data Preparation

0. Observe the Data

Before starting any data preparation or analysis, I find it necessary to look closely at the dataset itself. Here are some points I would like to discuss after observing the data:

1) Car ID is a one-time ID only

The Car ID in this dataset does not identify a car; it identifies a single trip made by a car. In other words, if the same car enters and exits the park twice, it will have 2 Car IDs. I conclude this because the last 6 digits of the Car ID are exactly the same as the time in the timestamp.

Observecarid.PNG

This makes it easier to analyze the movement patterns of cars, since no extra effort is needed to separate multiple trips made by the same car. It also makes suspicious trips easier to spot.

2) Observation Period:

The observation period runs from May 2015 to May 2016. That is 13 months in total, with May counted twice. Therefore, when using “month” as a time dimension, we should also take the year into account; otherwise May 2015 and May 2016 would be merged and the result would be misleading.
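To illustrate the point about May, here is a minimal pandas sketch. The column name `Timestamp` and the three sample rows are assumptions made up for illustration; they are not taken from the actual dataset.

```python
import pandas as pd

# Toy rows for illustration only; the column name "Timestamp" is an assumption.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2015-05-01 00:43:28",
        "2015-12-15 10:00:00",
        "2016-05-20 08:30:00",
    ])
})

# Month alone puts May 2015 and May 2016 into the same bucket.
df["month"] = df["Timestamp"].dt.month

# A year-month period keeps the two Mays apart.
df["year_month"] = df["Timestamp"].dt.to_period("M").astype(str)

print(df[["month", "year_month"]])
```

Grouping by `year_month` instead of `month` gives 13 distinct buckets over the observation period rather than 12.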


3) Rangerstop0 and Rangerstop2 are not restricted gates


As mentioned in the Data Description, “Ranger-stops. These sensors represent working areas for the Rangers, so you will often see a Ranger-stop sensor at the end of a road managed by a Gate. Some Ranger-stops are in other locations however, so these sensors record all traffic passing by.”

Observemap.PNG

According to the map, ranger-stops 0 and 2 are the ones that record all traffic passing by, so they do not represent restricted areas.

1. Data Preparation

1) Per Timestamp Level data preparation:

Sequence Number: Within each Car ID, number the gate readings in the order in which the car visited them.

Dpsequence.PNG
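The sequence number can be sketched in pandas as follows. This is only an illustration, not necessarily the tool used for the screenshots here; the column names (`car-id`, `Timestamp`, `gate-name`), the new column `seq` and the sample rows are all assumptions.

```python
import pandas as pd

# Toy readings for two trips, deliberately out of order.
# Column names and values are assumptions for illustration.
df = pd.DataFrame({
    "car-id": ["A", "B", "A", "A", "B"],
    "Timestamp": pd.to_datetime([
        "2015-05-01 10:05:00",
        "2015-05-01 11:00:00",
        "2015-05-01 10:00:00",
        "2015-05-01 10:20:00",
        "2015-05-01 11:30:00",
    ]),
    "gate-name": ["general-gate1", "entrance3", "entrance1", "entrance2", "camping0"],
})

# Order the readings chronologically within each Car ID.
df = df.sort_values(["car-id", "Timestamp"]).reset_index(drop=True)

# cumcount() numbers rows within each group starting from 0, so add 1.
df["seq"] = df.groupby("car-id").cumcount() + 1

print(df)
```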

Duration: Calculate the time it takes for the car to move from one gate to the next.

Dpduration.PNG
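A possible pandas sketch of the duration step is shown below. It assumes the duration is stored on the row of the gate the car is leaving, so the last reading of each Car ID has no duration; storing it on the arrival row instead would be an equally valid reading of the definition. Column names and sample rows are assumptions.

```python
import pandas as pd

# Toy readings for two trips; names and values are assumptions.
df = pd.DataFrame({
    "car-id": ["A", "A", "A", "B", "B"],
    "Timestamp": pd.to_datetime([
        "2015-05-01 10:00:00",
        "2015-05-01 10:05:00",
        "2015-05-01 10:20:00",
        "2015-05-01 11:00:00",
        "2015-05-01 11:30:00",
    ]),
})
df = df.sort_values(["car-id", "Timestamp"]).reset_index(drop=True)

# Timestamp of the next reading within the same Car ID (NaT on the last row).
next_ts = df.groupby("car-id")["Timestamp"].shift(-1)

# Minutes spent travelling from this gate to the next one.
df["duration_min"] = (next_ts - df["Timestamp"]).dt.total_seconds() / 60

print(df)
```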

Enter/Exit/In: Label every row as “Enter” if it is the first timestamp of a Car ID, “Exit” if it is the last timestamp of a Car ID, and “In” otherwise.
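The labelling rule can be sketched in pandas like this. Column names, the new column `status` and the sample rows are assumptions. The sketch also assumes that a Car ID with only one reading is labelled “Enter”, a case the rule above does not cover.

```python
import pandas as pd

# Toy readings: trip A has three rows, B has two, C has a single row.
# Names and values are assumptions for illustration.
df = pd.DataFrame({
    "car-id": ["A", "A", "A", "B", "B", "C"],
    "Timestamp": pd.to_datetime([
        "2015-05-01 10:00:00",
        "2015-05-01 10:05:00",
        "2015-05-01 10:20:00",
        "2015-05-01 11:00:00",
        "2015-05-01 11:30:00",
        "2015-05-01 12:00:00",
    ]),
})

# First and last timestamp of each Car ID, broadcast back to every row.
first_ts = df.groupby("car-id")["Timestamp"].transform("min")
last_ts = df.groupby("car-id")["Timestamp"].transform("max")

# Default to "In", then overwrite the last and first rows of each trip.
# "Enter" is applied last, so a single-row trip ends up as "Enter" (assumption).
df["status"] = "In"
df.loc[df["Timestamp"] == last_ts, "status"] = "Exit"
df.loc[df["Timestamp"] == first_ts, "status"] = "Enter"

print(df)
```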

Camping: Flag each row according to whether the reading was taken at a campsite.

Dpcamping.PNG
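A minimal pandas sketch of the camping flag follows. It assumes that campsite sensors can be recognised by a gate name starting with `camping`; the column name `gate-name`, the new column `is_camping` and the sample values are likewise assumptions and should be checked against the actual data.

```python
import pandas as pd

# Toy gate names; the naming pattern is an assumption.
df = pd.DataFrame({
    "gate-name": ["entrance1", "camping0", "ranger-stop2", "camping8"],
})

# True when the reading comes from a sensor whose name starts with "camping".
df["is_camping"] = df["gate-name"].str.startswith("camping")

print(df)
```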