Difference between revisions of "IS428 AY2019-20T1 Assign Christine Data Analysis Transformation"

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search
 
(2 intermediate revisions by the same user not shown)
Line 5: Line 5:
 
{|style="background-color:#B0E0E6; color:#fff; padding: 10 0 10 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
{|style="background-color:#B0E0E6; color:#fff; padding: 10 0 10 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
| style="padding:0.2em; font-size:100%; background-color:#B0E0E6; text-align:center; color:#fff" width="10%" |  
 
| style="padding:0.2em; font-size:100%; background-color:#B0E0E6; text-align:center; color:#fff" width="10%" |  
[[IS428_AY2019-20T1_Assign_Christine |<font color="#100c08" size=3 face="Helvetica"><b>PROBLEM & MOTIVATION</b></font>]]
+
[[IS428_AY2019-20T1_Assign_Christine |<font color="#100c08" size=2 face="Helvetica"><b>PROBLEM & MOTIVATION</b></font>]]
  
 
| style="background:none;" width="1%" | &nbsp;
 
| style="background:none;" width="1%" | &nbsp;
 
| style="padding:0.2em; font-size:100%; background-color:#008B8B;  border-bottom:0px solid #3D9DD7; text-align:center; color:#100c08" width="10%" |  
 
| style="padding:0.2em; font-size:100%; background-color:#008B8B;  border-bottom:0px solid #3D9DD7; text-align:center; color:#100c08" width="10%" |  
[[IS428_AY2019-20T1_Assign_Christine_Data_Analysis_Transformation|<font color="#fff" size=3 face="Helvetica"><b>DATA ANALYSIS & TRANSFORMATION</b></font>]]
+
[[IS428_AY2019-20T1_Assign_Christine_Data_Analysis_Transformation|<font color="#fff" size=2 face="Helvetica"><b>DATA ANALYSIS & TRANSFORMATION</b></font>]]
  
 
| style="background:none;" width="1%" | &nbsp;
 
| style="background:none;" width="1%" | &nbsp;
 
| style="padding:0.2em; font-size:100%; background-color:#B0E0E6;  border-bottom:0px solid #3D9DD7; text-align:center; color:#fff" width="10%" |  
 
| style="padding:0.2em; font-size:100%; background-color:#B0E0E6;  border-bottom:0px solid #3D9DD7; text-align:center; color:#fff" width="10%" |  
[[IS428_AY2019-20T1_Assign_Christine_Interactive_Visualization|<font color="#100c08" size=3 face="Helvetica"><b>INTERACTIVE VISUALIZATION</b></font>]]
+
[[IS428_AY2019-20T1_Assign_Christine_Interactive_Visualization|<font color="#100c08" size=2 face="Helvetica"><b>INTERACTIVE VISUALIZATION</b></font>]]
  
 
| style="background:none;" width="1%" | &nbsp;
 
| style="background:none;" width="1%" | &nbsp;
 
| style="padding:0.2em; font-size:100%; background-color:#B0E0E6;  border-bottom:0px solid #3D9DD7; text-align:center; color:#fff" width="10%" |  
 
| style="padding:0.2em; font-size:100%; background-color:#B0E0E6;  border-bottom:0px solid #3D9DD7; text-align:center; color:#fff" width="10%" |  
[[IS428_AY2019-20T1_Assign_Christine_Anomalies_Observation|<font color="#100c08" size=3 face="Helvetica"><b>ANOMALIES OBSERVATION</b></font>]]
+
[[IS428_AY2019-20T1_Assign_Christine_Anomalies_Observation|<font color="#100c08" size=2 face="Helvetica"><b>ANOMALIES OBSERVATION</b></font>]]
  
 
| style="background:none;" width="1%" | &nbsp;
 
| style="background:none;" width="1%" | &nbsp;
 
| style="padding:0.2em; font-size:100%; background-color:#B0E0E6;  border-bottom:0px solid #3D9DD7; text-align:center; color:#fff" width="10%" |  
 
| style="padding:0.2em; font-size:100%; background-color:#B0E0E6;  border-bottom:0px solid #3D9DD7; text-align:center; color:#fff" width="10%" |  
[[IS428_AY2019-20T1_Assign_Christine_Reference|<font color="#100c08" size=3 face="Helvetica"><b>REFERENCE</b></font>]]
+
[[IS428_AY2019-20T1_Assign_Christine_Reference|<font color="#100c08" size=2 face="Helvetica"><b>REFERENCE</b></font>]]
 
|}  
 
|}  
 
</div>
 
</div>
Line 35: Line 35:
 
==<div><font color=#100c08 face="Century Gothic"><b>COMBINE STATIC READINGS AND STATIC LOCATIONS</b></font></div>==
 
==<div><font color=#100c08 face="Century Gothic"><b>COMBINE STATIC READINGS AND STATIC LOCATIONS</b></font></div>==
 
File used: '''StaticSensorReadings.csv''' and '''StaticSensorLocations.csv'''
 
File used: '''StaticSensorReadings.csv''' and '''StaticSensorLocations.csv'''
<div id="Fb.1">[[File:christine.2016_figure_b.1_staticReadingLocations.png|300px|frameless|center]]
+
<div><center><ul>
<center>''Figure b.1 - Combine Static Readings and Locations''</center></div>
+
<li style="display: inline-block;" id="Fb.1"> [[File:christine.2016_figure_b.1_staticReadingLocations.png|thumb|none|450px|''Figure b.1 - Combine Static Readings and Locations'']] </li>
 +
</ul></center></div>
 
Both files will be joined through the Applied Join Clauses by using Sensor-id in both files ([[#Fb.2|Figure b.2]]). Through this Join clause, tidier data (Tall & Skinny) can be generated.
 
Both files will be joined through the Applied Join Clauses by using Sensor-id in both files ([[#Fb.2|Figure b.2]]). Through this Join clause, tidier data (Tall & Skinny) can be generated.
<div id="Fb.2">[[File:christine.2016_figure_b.2_staticIDCleaning.png|300px|frameless|center]]
+
<div><center><ul>
<center>''Figure b.2 - Clause in Join and Calculated Field''</center></div>
+
<li style="display: inline-block;" id="Fb.2"> [[File:christine.2016_figure_b.2_staticIDCleaning.png|thumb|none|300px|''Figure b.2 - Clause in Join and Calculated Field'']] </li>
 +
</ul></center></div>
 
Because of on the next few steps, this static sensor readings will be combined with mobile sensor readings; hence, to avoid confusion on the Sensor-id, I decided to add identifier (e.g. Static) to the initial Sensor-id. Initially, Sensor-id was in Integer (numeric) format; in order to add String to Integer, Sensor-id need to be changed to String type.
 
Because of on the next few steps, this static sensor readings will be combined with mobile sensor readings; hence, to avoid confusion on the Sensor-id, I decided to add identifier (e.g. Static) to the initial Sensor-id. Initially, Sensor-id was in Integer (numeric) format; in order to add String to Integer, Sensor-id need to be changed to String type.
 
On next step needs to create the Calculated field, by adding a calculated field named “SensorID_Static” which consists of Sensor-id along with the word '''‘Static’'''. Final look of table will be as [[#Fb.3|Figure b.3]].
 
On next step needs to create the Calculated field, by adding a calculated field named “SensorID_Static” which consists of Sensor-id along with the word '''‘Static’'''. Final look of table will be as [[#Fb.3|Figure b.3]].
<div id="Fb.3">[[File:christine.2016_figure_b.3_afterProcess.png|500px|frameless|center]]
+
<div><center><ul>
<center>''Figure b.3 - Final look of table''</center></div>
+
<li style="display: inline-block;" id="Fb.3"> [[File:christine.2016_figure_b.3_afterProcess.png|thumb|none|500px|''Figure b.3 - Final look of table'']] </li>
 +
</ul></center></div>
 
<br/>
 
<br/>
  
 
==<div><font color=#100c08 face="Century Gothic"><b>DATA CLEANING ON MOBILE SENSOR READINGS</b></font></div>==
 
==<div><font color=#100c08 face="Century Gothic"><b>DATA CLEANING ON MOBILE SENSOR READINGS</b></font></div>==
 
File used: '''MobileSensorReadings.csv'''
 
File used: '''MobileSensorReadings.csv'''
<div id="Fb.4">[[File:christine.2016_figure_b.4_MobileCleaning.png|300px|frameless|center]]
+
<div><center><ul>
<center>''Figure b.4 - Cleaning of Mobile Sensor Readings''</center></div>
+
<li style="display: inline-block;" id="Fb.4"> [[File:christine.2016_figure_b.4_MobileCleaning.png|thumb|none|400px|''Figure b.4 - Cleaning of Mobile Sensor Readings'']] </li>
<div id="Fb.5">[[File:christine.2016_figure_b.5_mobileIDCleaning.png|300px|frameless|center]]
+
<li style="display: inline-block;" id="Fb.5"> [[File:christine.2016_figure_b.5_mobileIDCleaning.png|thumb|none|300px|''Figure b.5 - Cleaning of Mobile Sensor ID and Calculated Field'']] </li>
<center>''Figure b.5 - Cleaning of Mobile Sensor ID and Calculated Field''</center></div>
+
</ul></center></div>
 
Similar process will be done as Static readings. In order to avoid confusion in Sensor-id differentiation, additional identifier will be added into the Sensor-id (Sensor-id will be transformed into String type) by using this calculated field. New column named SensorID_Mobile will be created. Final look of table will be as [[#Fb.6|Figure b.6]].
 
Similar process will be done as Static readings. In order to avoid confusion in Sensor-id differentiation, additional identifier will be added into the Sensor-id (Sensor-id will be transformed into String type) by using this calculated field. New column named SensorID_Mobile will be created. Final look of table will be as [[#Fb.6|Figure b.6]].
<div id="Fb.6">[[File:christine.2016_figure_b.6_afterProcess.png|500px|frameless|center]]
+
<div><center><ul>
<center>''Figure b.6 - Final look of table''</center></div>
+
<li style="display: inline-block;" id="Fb.6"> [[File:christine.2016_figure_b.6_afterProcess.png|thumb|none|500px|''Figure b.6 - Final look of table'']] </li>
 +
</ul></center></div>
 
<br/>
 
<br/>
  
 
==<div><font color=#100c08 face="Century Gothic"><b>COMBINE STATIC SENSORS AND MOBILE SENSORS' READINGS</b></font></div>==
 
==<div><font color=#100c08 face="Century Gothic"><b>COMBINE STATIC SENSORS AND MOBILE SENSORS' READINGS</b></font></div>==
 
File used: '''StaticSensorReadings.csv; StaticSensorLocations.csv; and MobileSensorReadings.csv'''
 
File used: '''StaticSensorReadings.csv; StaticSensorLocations.csv; and MobileSensorReadings.csv'''
<div id="Fb.7">[[File:christine.2016_figure_b.7_unionCleaning.png|300px|frameless|center]]
+
<div><center><ul>
<center>''Figure b.7 - Cleaning Process on Union of dataset''</center></div>
+
<li style="display: inline-block;" id="Fb.7"> [[File:christine.2016_figure_b.7_unionCleaning.png|thumb|none|400px|''Figure b.7 - Cleaning Process on Union of dataset'']] </li>
 +
</ul></center></div>
 
In this step, both static sensor readings and mobile sensor readings will be combined (by using Union) into one working file ([[#Fb.7|Figure b.7]]). In this case, some extra values will be removed and some columns will be rearranged into tidier data.
 
In this step, both static sensor readings and mobile sensor readings will be combined (by using Union) into one working file ([[#Fb.7|Figure b.7]]). In this case, some extra values will be removed and some columns will be rearranged into tidier data.
<div id="Fb.8">[[File:christine.2016_figure_b.8_unionActionsCleaning.png|300px|frameless|center]]
+
<div><center><ul>
<center>''Figure b.8 - Actions Performed in Cleaning Union dataset''</center></div>
+
<li style="display: inline-block;" id="Fb.8"> [[File:christine.2016_figure_b.8_unionActionsCleaning.png|thumb|none|400px|''Figure b.8 - Actions Performed in Cleaning Union dataset'']] </li>
 +
</ul></center></div>
 
Actions performed on cleaning data as follow:
 
Actions performed on cleaning data as follow:
* First action is combining the duplicated information. Initially there was two values of ‘cpm’ with additional spaces. We need to do Group and Replace the word ‘cpm’ into one standardized value.
+
* the First action is combining the duplicated information. Initially, there were two values of ‘CPM’ with additional spaces. We need to do Group and Replace the word ‘CPM’ into one standardized value.
 
* The next action is Sensor-Id. Initially, there will be two columns consist of Sensor-id of Static and Mobile sensor respectively. After combine and new column of SensorID_Mobile/Static will be created then delete the extra one column of Sensor-id.
 
* The next action is Sensor-Id. Initially, there will be two columns consist of Sensor-id of Static and Mobile sensor respectively. After combine and new column of SensorID_Mobile/Static will be created then delete the extra one column of Sensor-id.
 
After initial exploratory of the dataset by using map, I found that under static readings, '''each Static Sensor_id represents one region/area''', hence, in order to make the whole data set more self-explanatory, I rename the static data into each respective region/area along with the Sensor_id (e.g. Static Sensor_1 is Palace Hill, Static Sensor_11 is BroadView become 1 – Palace Hill and 11 - BroadView) while Mobile Sensor-id naming remain the same.
 
After initial exploratory of the dataset by using map, I found that under static readings, '''each Static Sensor_id represents one region/area''', hence, in order to make the whole data set more self-explanatory, I rename the static data into each respective region/area along with the Sensor_id (e.g. Static Sensor_1 is Palace Hill, Static Sensor_11 is BroadView become 1 – Palace Hill and 11 - BroadView) while Mobile Sensor-id naming remain the same.
<div id="Fb.9">[[File:christine.2016_figure_b.9_cleanedStaticUser.png|300px|frameless|center]]
+
<div><center><ul>
<center>''Figure b.9 - Static_User Display''</center></div>
+
<li style="display: inline-block;" id="Fb.9"> [[File:christine.2016_figure_b.9_cleanedStaticUser.png|thumb|none|300px|''Figure b.9 - Static_User Display'']] </li>
 +
</ul></center></div>
 
After combining the Sensor-id of both sensors, I realized that Static sensor have no '''User-id''' where it will auto-generate '''null''' value which is not value adding. Firstly, the null value should be renamed to '''Static_User'''. Continue with combining '''User-id and Sensor-id''' (e.g. Static_User(1 – Palace Hill)) by using the calculated field.
 
After combining the Sensor-id of both sensors, I realized that Static sensor have no '''User-id''' where it will auto-generate '''null''' value which is not value adding. Firstly, the null value should be renamed to '''Static_User'''. Continue with combining '''User-id and Sensor-id''' (e.g. Static_User(1 – Palace Hill)) by using the calculated field.
 
<center>'''Calculation of calculated field:'''<br/>
 
<center>'''Calculation of calculated field:'''<br/>
Line 75: Line 82:
 
<br/>
 
<br/>
 
By completing all actions mentioned above, whole dataset will look tidier, ready to be imported to Tableau and use for analysis. Follow is the final look of the table [[#Fb.10|Figure b.10]].
 
By completing all actions mentioned above, whole dataset will look tidier, ready to be imported to Tableau and use for analysis. Follow is the final look of the table [[#Fb.10|Figure b.10]].
<div id="Fb.10">[[File:christine.2016_figure_b.10_afterProcess.png|500px|frameless|center]]
+
<div><center><ul>
<center>''Figure b.10 - Final look of table''</center></div>
+
<li style="display: inline-block;" id="Fb.10"> [[File:christine.2016_figure_b.10_afterProcess.png|thumb|none|500px|''Figure b.10 - Final look of table'']] </li>
 +
</ul></center></div>
 
<br/>
 
<br/>
  
 
==<div><font color=#100c08 face="Century Gothic"><b>FINAL WORKFLOW AND DATASET</b></font></div>==
 
==<div><font color=#100c08 face="Century Gothic"><b>FINAL WORKFLOW AND DATASET</b></font></div>==
 +
<div><center><ul>
 +
<li style="display: inline-block;" id="Fb.11"> [[File:christine.2016_figure_b.11_finalWorkflow.png|thumb|none|500px|''Figure b.11 - Final Workflow'']] </li>
 +
</ul></center></div>
 +
<div><center><ul>
 +
<li style="display: inline-block;" id="Fb.12"> [[File:christine.2016_figure_b.12_finalTable.png|thumb|none|500px|''Figure b.12 - Final Table'']] </li>
 +
</ul></center></div>
 
<br/>
 
<br/>
[[#Fb.1|Figure b.1]]
 

Latest revision as of 23:51, 11 October 2019

Christine.2016 NuclearIcon.png VISUALIZATION OF ALWAYS SAFE NUCLEAR POWER PLANT

PROBLEM & MOTIVATION

 

DATA ANALYSIS & TRANSFORMATION

 

INTERACTIVE VISUALIZATION

 

ANOMALIES OBSERVATION

 

REFERENCE


The very first step of analysis is located at the data cleaning and transformation so that it can bring value to the analysis conducted. Raw dataset in zip file is downloaded from VAST Challenge – Mini Challenge 2 are as follow:

  • StHimarkNeighborhoodShapefile Folder (consist of shape file that allow user to use the map file in geometry format along with the ID and name of neighbourhood)
  • StaticSensorReadings.csv (contains multiple Static Sensor ID respectively with the sensor readings over a period of time)
  • StaticSensorLocations.csv (contains multiple Static Sensor ID respectively with the its location)
  • MobileSensorReadings.csv (contains multiple Mobile Sensor ID respectively with the sensor readings and its location over a period of time)

Before using the raw dataset downloaded, this section will elaborate on the dataset analysis and transformation process in order to prepare the data for import and analysis can be conducted.

COMBINE STATIC READINGS AND STATIC LOCATIONS

File used: StaticSensorReadings.csv and StaticSensorLocations.csv

  • Figure b.1 - Combine Static Readings and Locations

Both files will be joined through the Applied Join Clauses by using Sensor-id in both files (Figure b.2). Through this Join clause, tidier data (Tall & Skinny) can be generated.

  • Figure b.2 - Clause in Join and Calculated Field

Because of on the next few steps, this static sensor readings will be combined with mobile sensor readings; hence, to avoid confusion on the Sensor-id, I decided to add identifier (e.g. Static) to the initial Sensor-id. Initially, Sensor-id was in Integer (numeric) format; in order to add String to Integer, Sensor-id need to be changed to String type. On next step needs to create the Calculated field, by adding a calculated field named “SensorID_Static” which consists of Sensor-id along with the word ‘Static’. Final look of table will be as Figure b.3.

  • Figure b.3 - Final look of table


DATA CLEANING ON MOBILE SENSOR READINGS

File used: MobileSensorReadings.csv

  • Figure b.4 - Cleaning of Mobile Sensor Readings
  • Figure b.5 - Cleaning of Mobile Sensor ID and Calculated Field

Similar process will be done as Static readings. In order to avoid confusion in Sensor-id differentiation, additional identifier will be added into the Sensor-id (Sensor-id will be transformed into String type) by using this calculated field. New column named SensorID_Mobile will be created. Final look of table will be as Figure b.6.

  • Figure b.6 - Final look of table


COMBINE STATIC SENSORS AND MOBILE SENSORS' READINGS

File used: StaticSensorReadings.csv; StaticSensorLocations.csv; and MobileSensorReadings.csv

  • Figure b.7 - Cleaning Process on Union of dataset

In this step, both static sensor readings and mobile sensor readings will be combined (by using Union) into one working file (Figure b.7). In this case, some extra values will be removed and some columns will be rearranged into tidier data.

  • Figure b.8 - Actions Performed in Cleaning Union dataset

Actions performed on cleaning data as follow:

  • the First action is combining the duplicated information. Initially, there were two values of ‘CPM’ with additional spaces. We need to do Group and Replace the word ‘CPM’ into one standardized value.
  • The next action is Sensor-Id. Initially, there will be two columns consist of Sensor-id of Static and Mobile sensor respectively. After combine and new column of SensorID_Mobile/Static will be created then delete the extra one column of Sensor-id.

After initial exploratory of the dataset by using map, I found that under static readings, each Static Sensor_id represents one region/area, hence, in order to make the whole data set more self-explanatory, I rename the static data into each respective region/area along with the Sensor_id (e.g. Static Sensor_1 is Palace Hill, Static Sensor_11 is BroadView become 1 – Palace Hill and 11 - BroadView) while Mobile Sensor-id naming remain the same.

  • Figure b.9 - Static_User Display

After combining the Sensor-id of both sensors, I realized that Static sensor have no User-id where it will auto-generate null value which is not value adding. Firstly, the null value should be renamed to Static_User. Continue with combining User-id and Sensor-id (e.g. Static_User(1 – Palace Hill)) by using the calculated field.

Calculation of calculated field:
[User_id] + “ (“ + [SensorID_Mobile/Static] + “)”


By completing all actions mentioned above, whole dataset will look tidier, ready to be imported to Tableau and use for analysis. Follow is the final look of the table Figure b.10.

  • Figure b.10 - Final look of table


FINAL WORKFLOW AND DATASET

  • Figure b.11 - Final Workflow
  • Figure b.12 - Final Table