ISS608 2017-18 T3 G2 Assign ChenYanchong Data Prep
Data overview
The following lists the data used for the analysis. The data is obtained from VAST Challenge 2018: Mini-Challenge 2.
Data description
Tools
R, Excel, Tableau
Data understanding
Integrated the two tables together, a record can be read as below format:
On the same day, a chemical could be examined multiple times, at most three times per day (as founded). One possible reason is that the sensor was triggered three times one day (since the ids are different) but at different date time, since timestamp information is not provided, this assumption can not be verified. However, to be fair, the when grouping data, the mean value is taken instead of the summation of the values.
Data preparation
1. Filtered data from year 2010 to year 2016 with below reasons:
In tableau, created a new calculated field, named "measure count" with formula COUNTD([Measures])
this value of this field reveals the number of measures appear in each year
Drag the calculated field (measure count) on rows shelf, sample date on column shelf and location on color card as well as on detail card From graph above, noted that the sensors in Achara, Decha and Tansanee started to work on year 2009. One year 2009, all the locations have abnormal high starting point, in other words, the number of chemical types in each location are higher than normal value. To investigate further, such anomaly could be caused by storm water [1]or urban flooding.[2]