IS428 AY2019-20T1 Assign Ho Jue Hong Data Cleaning

From Visual Analytics for Business Intelligence
Revision as of 01:22, 13 October 2019 by Juehong.ho.2017 (talk | contribs)
Jump to navigation Jump to search

MINI-CHALLENGE 2: CITIZEN SCIENCE TO THE RESCUE

PROBLEM & MOTIVATION

 

DATA CLEANING

 

DASHBOARD

 

PROBLEMS

 

REFERENCE

Data Flow

Data flow.png

Merging of Static Sensor Location and Readings

Firstly to make the dataset easier to use, I would clean the dataset first, removing the redundant columns. After looking through the datasets given, the two datasets StaticSensorLocation.csv and StaticSensorReading.csv have a similar column, Sensor-id. Hence we are able to do an inner join between both datasets and remove the extra columns.

1st cleaning.png



Join clause.png



The applied join clause would be the Sensor-id columns as shown above. Now that we have merged both columns together, we need to remove the extra sensor-id column.

Change1.png



Here we can see that we made 4 changes, the first change is to add a calculated field where we use the formula "Static"+str([Sensor-id]) to create a [new_id] to make this unique. Hence when I combine with the mobile reading dataset later, there won't be any clashes.

The second thing I did was to remove [sensor-id] since we are using the new column [new_id]

The third thing I did was to remove the other [sensor-id] field.

Lastly, the removal of [Units] column because it serves no other purpose

Cleaning Mobile Sensor Reading

Mobile1.png



Mobile changes.png



To be able to union both the new output of Static-sensors.csv and mobile.csv, we have the make sure the sensor-keys are well labelled. Hence the first thing we did here was to create a calculated field where [New_id] = "Mobile"+str([Sensor-id]) this creates the the ID for mobile sensors.

The second thing we did was once again to remove the [Units] column as it serves no purpose anymore.

Lastly, because we have the new field [New_id] that we created [Sensor-id] is redudant and hence it will be removed too.

Union of both MobileSensorReading Dataset and combine dataset of SensorLocation and SensorReading Dataset

Union 1.png



To union means to append, given that both datasets are "Cleaned" and have the same columns the could be appended without any problems. Once the flow is done we export it into a CSV file and the file is ready to be used in Tableau