IS428 AY2019-20T1 Assign Christine Data Analysis Transformation

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search

Christine.2016 NuclearIcon.png VISUALIZATION OF ALWAYS SAFE NUCLEAR POWER PLANT

PROBLEM & MOTIVATION

 

DATA ANALYSIS & TRANSFORMATION

 

INTERACTIVE VISUALIZATION

 

ANOMALIES OBSERVATION

 

REFERENCE


The very first step of analysis is located at the data cleaning and transformation so that it can bring value to the analysis conducted. Raw dataset in zip file is downloaded from VAST Challenge – Mini Challenge 2 are as follow:

  • StHimarkNeighborhoodShapefile Folder (consist of shape file that allow user to use the map file in geometry format along with the ID and name of neighbourhood)
  • StaticSensorReadings.csv (contains multiple Static Sensor ID respectively with the sensor readings over a period of time)
  • StaticSensorLocations.csv (contains multiple Static Sensor ID respectively with the its location)
  • MobileSensorReadings.csv (contains multiple Mobile Sensor ID respectively with the sensor readings and its location over a period of time)

Before using the raw dataset downloaded, this section will elaborate on the dataset analysis and transformation process in order to prepare the data for import and analysis can be conducted.

COMBINE STATIC READINGS AND STATIC LOCATIONS

File used: StaticSensorReadings.csv and StaticSensorLocations.csv

  • Figure b.1 - Combine Static Readings and Locations

Both files will be joined through the Applied Join Clauses by using Sensor-id in both files (Figure b.2). Through this Join clause, tidier data (Tall & Skinny) can be generated.

  • Figure b.2 - Clause in Join and Calculated Field

Because of on the next few steps, this static sensor readings will be combined with mobile sensor readings; hence, to avoid confusion on the Sensor-id, I decided to add identifier (e.g. Static) to the initial Sensor-id. Initially, Sensor-id was in Integer (numeric) format; in order to add String to Integer, Sensor-id need to be changed to String type. On next step needs to create the Calculated field, by adding a calculated field named “SensorID_Static” which consists of Sensor-id along with the word ‘Static’. Final look of table will be as Figure b.3.

  • Figure b.3 - Final look of table


DATA CLEANING ON MOBILE SENSOR READINGS

File used: MobileSensorReadings.csv

  • Figure b.4 - Cleaning of Mobile Sensor Readings
  • Figure b.5 - Cleaning of Mobile Sensor ID and Calculated Field

Similar process will be done as Static readings. In order to avoid confusion in Sensor-id differentiation, additional identifier will be added into the Sensor-id (Sensor-id will be transformed into String type) by using this calculated field. New column named SensorID_Mobile will be created. Final look of table will be as Figure b.6.

  • Figure b.6 - Final look of table


COMBINE STATIC SENSORS AND MOBILE SENSORS' READINGS

File used: StaticSensorReadings.csv; StaticSensorLocations.csv; and MobileSensorReadings.csv

  • Figure b.7 - Cleaning Process on Union of dataset

In this step, both static sensor readings and mobile sensor readings will be combined (by using Union) into one working file (Figure b.7). In this case, some extra values will be removed and some columns will be rearranged into tidier data.

  • Figure b.8 - Actions Performed in Cleaning Union dataset

Actions performed on cleaning data as follow:

  • the First action is combining the duplicated information. Initially, there were two values of ‘CPM’ with additional spaces. We need to do Group and Replace the word ‘CPM’ into one standardized value.
  • The next action is Sensor-Id. Initially, there will be two columns consist of Sensor-id of Static and Mobile sensor respectively. After combine and new column of SensorID_Mobile/Static will be created then delete the extra one column of Sensor-id.

After initial exploratory of the dataset by using map, I found that under static readings, each Static Sensor_id represents one region/area, hence, in order to make the whole data set more self-explanatory, I rename the static data into each respective region/area along with the Sensor_id (e.g. Static Sensor_1 is Palace Hill, Static Sensor_11 is BroadView become 1 – Palace Hill and 11 - BroadView) while Mobile Sensor-id naming remain the same.

  • Figure b.9 - Static_User Display

After combining the Sensor-id of both sensors, I realized that Static sensor have no User-id where it will auto-generate null value which is not value adding. Firstly, the null value should be renamed to Static_User. Continue with combining User-id and Sensor-id (e.g. Static_User(1 – Palace Hill)) by using the calculated field.

Calculation of calculated field:
[User_id] + “ (“ + [SensorID_Mobile/Static] + “)”


By completing all actions mentioned above, whole dataset will look tidier, ready to be imported to Tableau and use for analysis. Follow is the final look of the table Figure b.10.

  • Figure b.10 - Final look of table


FINAL WORKFLOW AND DATASET

  • Figure b.11 - Final Workflow
  • Figure b.12 - Final Table