IS428 AY2019-20T1 Assign Victor Lin Data
Data Provided
The data provided contains readings (from 12 am on April 6, 2020 to 11:59 pm on April 10, 2020), the locations of the sensors, and the map of the neighbourhood:
Dataset | Data Attributes |
---|---|
StaticSensorLocations.csv | |
StaticSensorReadings.csv | |
MobileSensorReadings.csv | |
StHimarkNeighbourhoodShapefile | |
Data Preparation
Combining the Static Sensor Readings with the Static Sensor Locations
The static sensor readings data contains no information on where the sensors are located. Using the Union Tool, each sensor reading is assigned a specific latitude and longitude value based on its sensor-id. The resulting output is then saved as a Tableau extract (.tde file).
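Outside of Tableau, the same step can be sketched in a few lines of Python. This is only an illustration of the idea, not the workflow used in this assignment: in code terms, attaching coordinates by sensor-id is a key-based lookup. The column names (`sensor_id`, `lat`, `long`, `value`), the function name `attach_locations` and the sample rows are all assumptions made for the example.

```python
# Hypothetical sample rows; column names and values are assumptions for illustration.
locations = [
    {"sensor_id": 1, "lat": 0.16, "long": -119.96},
    {"sensor_id": 4, "lat": 0.12, "long": -119.90},
]
readings = [
    {"timestamp": "2020-04-06 00:00:05", "sensor_id": 1, "value": 14.2},
    {"timestamp": "2020-04-06 00:00:05", "sensor_id": 4, "value": 15.7},
    {"timestamp": "2020-04-06 00:00:10", "sensor_id": 9, "value": 13.1},  # no known location
]


def attach_locations(readings, locations):
    """Add lat/long to each reading by looking up its sensor_id."""
    by_id = {loc["sensor_id"]: loc for loc in locations}
    joined = []
    for row in readings:
        loc = by_id.get(row["sensor_id"])
        if loc is None:
            continue  # drop readings whose sensor has no location entry
        joined.append({**row, "lat": loc["lat"], "long": loc["long"]})
    return joined


static_with_locations = attach_locations(readings, locations)
print(len(static_with_locations))
```

Readings from a sensor that is missing in the locations list are dropped here; whether that is the right behaviour depends on the data, so treat it as a design choice rather than a rule.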
Combining Static Sensor Data with Mobile Sensor Data
To analyse the data in its entirety, the two datasets have to be combined before being loaded into Tableau. The data is prepared and combined in the following steps:
First, a new calculated field is created to avoid confusion between the sensor IDs, since both the static and mobile sensors use numeric IDs. An "S" prefix is added to the sensor ID for static sensors. The column containing the old sensor IDs is then removed to prepare the dataset for merging.
Next, the same is done for the mobile sensor data. A new calculated field is created and an "M" prefix is added to the sensor IDs to differentiate them from their static counterparts. The old column containing the numeric ID is then removed.
Finally, the two datasets are combined into one large dataset using the union tool. The "Units" column is also removed from the dataset as it does not add value to the analysis. The resulting output is then saved as a Tableau extract (.tde file).
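The three steps above can be sketched in plain Python as follows. This is a minimal illustration under assumed column names (`sensor_id`, `units`, `value`); the helper names `prefix_ids` and `union`, the new field name `sensor_key` and the sample rows are hypothetical and do not come from the datasets.

```python
# Hypothetical sample rows; column names and values are assumptions for illustration.
static_rows = [
    {"timestamp": "2020-04-06 00:00:05", "sensor_id": 1, "value": 14.2, "units": "cpm"},
]
mobile_rows = [
    {"timestamp": "2020-04-06 00:00:05", "sensor_id": 1, "value": 20.1, "units": "cpm"},
]


def prefix_ids(rows, prefix):
    """Replace the numeric sensor_id with a prefixed string key (e.g. 'S1')."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != "sensor_id"}
        new_row["sensor_key"] = f"{prefix}{row['sensor_id']}"
        out.append(new_row)
    return out


def union(static_rows, mobile_rows, drop=("units",)):
    """Stack both datasets and remove columns that are not needed."""
    combined = prefix_ids(static_rows, "S") + prefix_ids(mobile_rows, "M")
    return [{k: v for k, v in row.items() if k not in drop} for row in combined]


combined = union(static_rows, mobile_rows)
print([row["sensor_key"] for row in combined])
```

After prefixing, static sensor 1 and mobile sensor 1 end up with different keys, so they can no longer be mistaken for each other in the combined data.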
Aggregation of Data Values
With over 4 million rows of data in the combined dataset, Tableau takes a very long time to process each query.
To reduce the total number of rows, I aggregated the data by taking the average of the readings within each 1-minute interval for each sensor ID. The assumption made here is that each mobile sensor remains at the same location within a given 1-minute interval.
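As a rough sketch of this aggregation, the readings can be grouped by sensor and by the minute they fall in, and then averaged. The code below is an illustration only: the field names (`sensor_key`, `timestamp`, `value`), the timestamp format, the function name `aggregate_per_minute` and the sample rows are assumptions, not details taken from the dataset.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows; field names, timestamp format and values are assumptions.
rows = [
    {"sensor_key": "S1", "timestamp": "2020-04-06 00:00:05", "value": 10.0},
    {"sensor_key": "S1", "timestamp": "2020-04-06 00:00:35", "value": 20.0},
    {"sensor_key": "S1", "timestamp": "2020-04-06 00:01:10", "value": 30.0},
    {"sensor_key": "M1", "timestamp": "2020-04-06 00:00:20", "value": 5.0},
]


def aggregate_per_minute(rows):
    """Average the readings of each sensor within each 1-minute interval."""
    buckets = defaultdict(list)
    for row in rows:
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        minute = ts.replace(second=0)  # truncate to the start of the minute
        buckets[(row["sensor_key"], minute)].append(row["value"])
    return [
        {"sensor_key": key, "minute": minute, "avg_value": mean(values)}
        for (key, minute), values in sorted(buckets.items())
    ]


aggregated = aggregate_per_minute(rows)
for row in aggregated:
    print(row["sensor_key"], row["minute"], row["avg_value"])
```

In this sample, the two S1 readings in the first minute collapse into a single row, which is how the row count shrinks. Under the stated assumption that a mobile sensor does not move within a minute, its location for the aggregated row could be taken from any reading in the same interval.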