IS428 AY2019-20T1 Assign Ho Jue Hong Data Cleaning
MINI-CHALLENGE 2: CITIZEN SCIENCE TO THE RESCUE
Data Flow
Merging of Static Sensor Location and Readings
First, to make the datasets easier to work with, I cleaned them by removing redundant columns. Looking through the datasets provided, StaticSensorLocation.csv and StaticSensorReading.csv share a common column, Sensor-id. This lets us perform an inner join between the two datasets on Sensor-id and then remove the duplicated columns.
The join clause matches the Sensor-id column of each dataset, as shown above. Now that the two datasets are merged, the result contains a Sensor-id column from each input, so the extra one needs to be removed.
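The inner join described above can be sketched in plain Python. This is a minimal illustration of the join logic, not the author's actual flow. It assumes each sensor appears once in the location file, and that the columns are named Sensor-id, Lat, Long, Timestamp, Value and Units. The sample rows are made up purely for illustration.

```python
from typing import Dict, List

# One CSV record as a mapping of column name -> value (column names are assumed).
Row = Dict[str, str]


def inner_join(locations: List[Row], readings: List[Row], key: str = "Sensor-id") -> List[Row]:
    """Keep only readings whose sensor ID also appears in the location table."""
    # Index locations by sensor ID for constant-time lookups
    # (assumes one location row per sensor).
    loc_by_id = {loc[key]: loc for loc in locations}
    joined: List[Row] = []
    for reading in readings:
        loc = loc_by_id.get(reading[key])
        if loc is None:
            continue  # no matching location, so the inner join drops this reading
        merged = dict(reading)
        # Copy the location fields except the key, so only one Sensor-id column remains.
        merged.update({k: v for k, v in loc.items() if k != key})
        joined.append(merged)
    return joined


# Hypothetical sample rows; values are placeholders, not real challenge data.
locations = [
    {"Sensor-id": "1", "Lat": "0.18", "Long": "-119.96"},
    {"Sensor-id": "4", "Lat": "0.16", "Long": "-119.88"},
]
readings = [
    {"Timestamp": "T1", "Sensor-id": "1", "Value": "15.3", "Units": "cpm"},
    {"Timestamp": "T1", "Sensor-id": "9", "Value": "14.1", "Units": "cpm"},
]

joined = inner_join(locations, readings)
print(joined)
```

Copying only the non-key location fields is one way to avoid the duplicate Sensor-id column that a visual join tool keeps from both inputs.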
Here we can see that we made four changes. The first is a calculated field, [new_id], defined by the formula "Static"+str([Sensor-id]), which prefixes each sensor ID with "Static" so that it stays unique. As a result, when I later combine this data with the mobile readings, there will be no ID clashes.
The second change removes [Sensor-id], since the new [new_id] column replaces it.
The third change removes the other, duplicated [Sensor-id] field left over from the join.
Lastly, the [Units] column is removed because it is not needed for the analysis.
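The four changes above can be mirrored in a short Python sketch. It is an illustration of the logic, assuming the joined rows use the column names Sensor-id and Units. A Python dict cannot hold two columns with the same name, so the duplicated Sensor-id from the join has no separate counterpart here; the hypothetical `drop` parameter can list any extra column name your join output produces. The sample row is made up.

```python
from typing import Dict, Iterable, List

# One record from the joined static dataset (column names are assumed).
Row = Dict[str, object]


def clean_static(rows: List[Row], drop: Iterable[str] = ("Sensor-id", "Units")) -> List[Row]:
    """Add the [new_id] calculated field, then drop the listed columns."""
    drop_set = set(drop)
    cleaned: List[Row] = []
    for row in rows:
        # Mirrors the calculated field "Static" + str([Sensor-id]).
        new_row: Row = {"new_id": "Static" + str(row["Sensor-id"])}
        # Keep every other column except the redundant ones.
        new_row.update({k: v for k, v in row.items() if k not in drop_set})
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample row shaped like the join output; values are placeholders.
joined = [
    {"Timestamp": "T1", "Sensor-id": 1, "Value": 15.3, "Units": "cpm", "Lat": 0.18, "Long": -119.96},
]
static_clean = clean_static(joined)
print(static_clean)
```

Using `str()` on the ID matters when the sensor ID is stored as a number, as in the sample row.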
Cleaning Mobile Sensor Reading
To union the new output, Static-sensors.csv, with mobile.csv, we have to make sure the sensor IDs are labelled distinctly, so that a static sensor and a mobile sensor with the same Sensor-id cannot be confused. Hence, the first step here was to create a calculated field, [New_id] = "Mobile"+str([Sensor-id]), which creates the ID for the mobile sensors.
The second step was, once again, to remove the [Units] column, as it is no longer needed.
Lastly, because the newly created [New_id] field replaces it, [Sensor-id] is redundant and is removed too.
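The mobile cleaning steps and the final union can be sketched as follows. This is a plain-Python approximation, not the author's flow. It uses `new_id` as the ID column name in both datasets so that the union lines the column up; the column names (including User-id for the mobile data) and the `union` helper are assumptions for illustration. The sample rows are made up, and both deliberately use Sensor-id 1 to show that the prefixes keep the IDs apart.

```python
from typing import Dict, List

# One record from a sensor dataset (column names are assumed).
Row = Dict[str, object]


def clean_mobile(rows: List[Row]) -> List[Row]:
    """Add the mobile ID field and drop [Units] and the redundant [Sensor-id]."""
    cleaned: List[Row] = []
    for row in rows:
        # Mirrors the calculated field "Mobile" + str([Sensor-id]).
        new_row: Row = {"new_id": "Mobile" + str(row["Sensor-id"])}
        new_row.update({k: v for k, v in row.items() if k not in ("Sensor-id", "Units")})
        cleaned.append(new_row)
    return cleaned


def union(*tables: List[Row]) -> List[Row]:
    """Stack tables vertically, aligning columns by name; missing fields become None."""
    columns: List[str] = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns} for table in tables for row in table]


# Hypothetical sample rows; values are placeholders, not real challenge data.
static_clean = [
    {"new_id": "Static1", "Timestamp": "T1", "Value": 15.3, "Lat": 0.18, "Long": -119.96},
]
mobile_raw = [
    {"Timestamp": "T1", "Sensor-id": 1, "Long": -119.90, "Lat": 0.20,
     "Value": 30.1, "Units": "cpm", "User-id": "User1"},
]

combined = union(static_clean, clean_mobile(mobile_raw))
for row in combined:
    print(row)
```

Static rows have no User-id, so the union fills that field with `None`, much as a visual union leaves unmatched fields null.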