IS428 AY2019-20T1 Assign Ho Jue Hong Data Cleaning

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search

Jh powerplant.jpg MINI-CHALLENGE 2: CITIZEN SCIENCE TO THE RESCUE

PROBLEM & MOTIVATION

 

DATA CLEANING

 

DASHBOARD

 

QUESTION AND ANSWERS

Data Flow

Data flow.png

Merging of Static Sensor Location and Readings

Firstly to make the dataset easier to use, I would clean the dataset first, removing the redundant columns. After looking through the datasets given, the two datasets StaticSensorLocation.csv and StaticSensorReading.csv have a similar column, Sensor-id. Hence we are able to do an inner join between both datasets and remove the extra columns.

1st cleaning.png



Join clause.png



The applied join clause would be the Sensor-id columns as shown above. Now that we have merged both columns together, we need to remove the extra sensor-id column.

Change1.png



Here we can see that we made 4 changes, the first change is to add a calculated field where we use the formula "Static"+str([Sensor-id]) to create a [new_id] to make this unique. Hence when I combine with the mobile reading dataset later, there won't be any clashes.

The second thing I did was to remove [sensor-id] since we are using the new column [new_id]

The third thing I did was to remove the other [sensor-id] field.

The fourth thing, the removal of [Units] column because it serves no other purpose.

Lastly, adding the new field [User-ID] so values won't be null when you append it to the the cleaned mobile dataset.

Cleaning Mobile Sensor Reading

Mobile1.png



Mobile changes.png



To be able to union both the new output of Static-sensors.csv and mobile.csv, we have the make sure the sensor-keys are well labelled. Hence the first thing we did here was to create a calculated field where [New_id] = "Mobile"+str([Sensor-id]) this creates the the ID for mobile sensors.

The second thing we did was once again to remove the [Units] column as it serves no purpose anymore.

Lastly, because we have the new field [New_id] that we created [Sensor-id] is redudant and hence it will be removed too.

Union of both MobileSensorReading Dataset and combine dataset of SensorLocation and SensorReading Dataset

Union 1.png



To union means to append, given that both datasets are "Cleaned" and have the same columns the could be appended without any problems. Once the flow is done we export it into a CSV file and the file is ready to be used in Tableau

Inner Joint of output.csv file and Geometry of St Himark File

Tableau inner.png



Here we are given the StHimark.shp file, essentially it tells us about the area code and name, by using the formula MAKEPOINT, we are able to map the [lat] and [long] to the Geometry shapefile.