Difference between revisions of "ISSS608 2016-17 T3 Assign ANGAD SRIVASTAVA DataPrep"

The visualization exercises covered in the next 2 tabs make use of 4 distinct datasets. The data preparation and univariate analysis for these datasets are summarised in the sections that follow. The datasets and the corresponding analyses were prepared using both JMP Pro 13 and Tableau.

A brief description of the datasets and the preparatory steps is given below:

=Geolocation Data=
This dataset was created from the X and Y location coordinates provided as part of the challenge description. The VAST challenge documentation provides geographical coordinates for all 4 factories and 9 sensors. In addition, a “Type” column was added to indicate whether each pair of coordinates represents a Factory or a Sensor.

Based on this information, the 4-column table below was created. When plotted on a 200x200 coordinate grid, it shows the geographical layout of all factories and sensors.

[[Image:ISSS608_AngadSrivastava_DataPrep_1.jpg|900px|center]]
<div style="float:center;text-align:center;font-size:13px;"> <u>''Figure 1''</u> </div>
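As a rough illustration of the table structure described above, the following Python sketch combines factory and sensor coordinates into one table with a “Type” column and checks that every point fits within the 200x200 grid. The column names, function names and coordinate values are placeholders chosen for illustration; they are not the values from the challenge documentation.

```python
def build_locations(factories, sensors):
    """Combine factory and sensor coordinates into one table with a Type column."""
    rows = []
    for name, x, y in factories:
        rows.append({"Name": name, "X": x, "Y": y, "Type": "Factory"})
    for name, x, y in sensors:
        rows.append({"Name": name, "X": x, "Y": y, "Type": "Sensor"})
    return rows


def within_grid(rows, size=200):
    """Return True if every point lies inside the size x size coordinate grid."""
    return all(0 <= r["X"] <= size and 0 <= r["Y"] <= size for r in rows)


# Placeholder coordinates, NOT the real challenge values.
factories = [("Factory A", 90, 30), ("Factory B", 110, 25)]
sensors = [("Sensor 1", 60, 20), ("Sensor 2", 65, 35)]

locations = build_locations(factories, sensors)
```

Keeping factories and sensors in a single table with a “Type” column means one plot can draw both, using the column to control colour or shape.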
=Sensor Data=
<div>
<div style="float:right; width:20%;">
[[Image:ISSS608_AngadSrivastava_DataPrep_2.jpg|300px|right|border]]
<div style="float:center;text-align:center;font-size:13px;"> <u>''Figure 3''</u> </div>
</div>
<div style="float:left;width:70%;">
This dataset, provided by the VAST challenge documentation in its original format, contains hourly readings for each of the 4 chemicals captured by every Sensor. The records span 3 months: April, August and December 2016. The adjacent image shows a sample of the dataset provided.
A missing-value check in JMP Pro 13 confirmed that none of the 79,243 rows in this dataset contains missing values.

[[Image:ISSS608_AngadSrivastava_DataPrep_3.jpg|500px|center|border]]
<div style="float:center;text-align:center;font-size:13px;"> <u>''Figure 2''</u> </div>
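The check above was carried out in JMP Pro 13. For readers without JMP, a comparable check can be sketched in Python using only the standard library. The column names and the two sample rows below are assumptions made for illustration, and the second row is deliberately left incomplete to show what the check reports.

```python
import csv
import io


def count_missing(rows, columns):
    """Count empty, whitespace-only or absent cells per column."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or value.strip() == "":
                missing[col] += 1
    return missing


# Tiny illustrative sample; column names and values are assumptions.
sample = io.StringIO(
    "Chemical,Monitor,DateTime,Reading\n"
    "Chemical A,1,2016-04-01 00:00,0.39\n"
    "Chemical B,2,2016-04-01 00:00,\n"
)
reader = csv.DictReader(sample)
columns = reader.fieldnames
missing_counts = count_missing(reader, columns)
```

A dataset with no missing values, as reported for the sensor data, would produce a count of zero for every column.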

Univariate analysis of the readings also shows that the chemical readings are highly right-skewed: the median reading is 0.39 and the 99.5th percentile is 6.46, while the maximum value is 101.1. These insights were used in preparing selected visualizations, as covered in the next tab.
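The gap between the median, an upper percentile and the maximum is a quick way to see this kind of skew. The sketch below computes the three statistics in Python on a small synthetic sample (not the challenge data). It uses a nearest-rank percentile, which is an assumption and may differ slightly from the quantile definition JMP applies.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Synthetic right-skewed readings, NOT the challenge data.
readings = [0.1, 0.2, 0.3, 0.4, 0.4, 0.5, 0.6, 0.8, 1.5, 50.0]

summary = {
    "median": statistics.median(readings),
    "p99_5": percentile(readings, 99.5),
    "max": max(readings),
    "mean": statistics.mean(readings),
}
```

In a right-skewed sample the mean sits well above the median, and the maximum is far from the bulk of the values, which is why axis limits or colour scales are often capped at an upper percentile rather than at the maximum.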

The distribution of the Monitor field (a numeric field treated as nominal) was also analysed to determine how many readings each Sensor recorded. As the figures below show, all 9 Sensors captured a similar number of readings over the 3 months of data provided, with only minor differences in the frequency counts:

<div style="float:left;width:50%;">
[[Image:ISSS608_AngadSrivastava_DataPrep_4.jpg|500px|center|border]]<br/>
<center style="font-size:13px;"> <u>''Figure 4''</u> </center>
</div>
<div style="float:left;width:50%;">
[[Image:ISSS608_AngadSrivastava_DataPrep_5.jpg|450px|center|border]]
<center style="font-size:13px;"> <u>''Figure 5''</u> </center>
</div>
</div>
</div>
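A frequency count of this kind can be sketched in a few lines of Python. The monitor IDs below are made-up sample values, and the function name is hypothetical; the point is only to show how per-Sensor counts and their spread can be compared.

```python
from collections import Counter


def readings_per_monitor(monitor_ids):
    """Return the number of readings per monitor, ordered by monitor ID."""
    counts = Counter(monitor_ids)
    return dict(sorted(counts.items()))


# Made-up Monitor column values, NOT the challenge data.
monitor_column = [1, 2, 3, 1, 2, 3, 1, 2]

counts = readings_per_monitor(monitor_column)
# Difference between the most and least frequent monitor.
spread = max(counts.values()) - min(counts.values())
```

A small spread relative to the counts themselves indicates that no Sensor is substantially under-represented.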
=Meteorological Data=
<div>
<div style="float:right; width:40%;">
[[Image:ISSS608_AngadSrivastava_DataPrep_6.jpg|530px|right|border]]
<div style="float:center;text-align:center;font-size:13px;"> <u>''Figure 7''</u> </div>
</div>
<div style="float:left;width:60%;">
This dataset, as provided by the VAST challenge documentation in its original format, contains wind-related atmospheric information. Meteorological readings are recorded once every 3 hours, capturing the general wind direction and wind speed for that time frame. A snippet of this dataset is shown in the adjacent image.

The relevance of the “Elevation” field in this dataset is unclear without additional information. For the purposes of this investigation, the field has been ignored until further information is available.

Possible missing values were checked, with the following results:

[[Image:ISSS608_AngadSrivastava_DataPrep_7.jpg|500px|center|border]]
<div style="float:center;text-align:center;font-size:13px;"> <u>''Figure 6''</u> </div>
</div>

As shown above, there are 2 anomalous records. Further investigation showed that these are an empty row and a record with missing values on 30 August 2016 at 3 AM. Both records were removed as part of the data cleaning process.
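The cleaning step described above can be sketched in Python as a filter that drops any record with a blank required field. The field names and sample values are assumptions for illustration. “Elevation” is left out of the required fields because it is ignored in this investigation.

```python
def is_complete(record, required):
    """True when every required field is present and non-blank."""
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


def drop_incomplete(records, required):
    """Return the complete records and the number of records removed."""
    kept = [r for r in records if is_complete(r, required)]
    return kept, len(records) - len(kept)


# Assumed field names; values are illustrative, NOT the challenge data.
REQUIRED = ("Date", "Wind Direction", "Wind Speed (m/s)")
records = [
    {"Date": "2016-08-30 00:00", "Wind Direction": "190", "Wind Speed (m/s)": "3.1"},
    {"Date": "2016-08-30 03:00", "Wind Direction": "", "Wind Speed (m/s)": ""},
    {"Date": "", "Wind Direction": "", "Wind Speed (m/s)": ""},
]

cleaned, removed = drop_incomplete(records, REQUIRED)
```

Returning the number of removed records alongside the cleaned data makes it easy to confirm that exactly the expected rows were dropped.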

Univariate analysis of Wind Speed (m/s) shows that the readings are well distributed between 0.1 m/s and 6.8 m/s.

[[Image:ISSS608_AngadSrivastava_DataPrep_8.jpg|700px|center|border]]
<div style="float:center;text-align:center;font-size:13px;"> <u>''Figure 8''</u> </div>

Frequency analysis of Wind Direction shows that the readings are concentrated between 150 and 360 degrees. Since the wind direction is given relative to north, this distribution indicates that the wind blows mostly from west to east.

[[Image:ISSS608_AngadSrivastava_DataPrep_9.jpg|700px|center|border]]
<div style="float:center;text-align:center;font-size:13px;"> <u>''Figure 9''</u> </div>
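A frequency analysis like the one above can be approximated in Python by binning the directions and measuring the share that falls within a range. The sample directions, the 30-degree bin width and the function names are assumptions for illustration, and the bin width is assumed to divide 360 evenly.

```python
def direction_histogram(directions, bin_width=30):
    """Count readings per bin of bin_width degrees; 360 is folded onto 0 degrees."""
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for d in directions:
        counts[int(d % 360) // bin_width] += 1
    return counts


def share_between(directions, low, high):
    """Fraction of readings with low <= direction <= high (in degrees)."""
    inside = sum(1 for d in directions if low <= d <= high)
    return inside / len(directions)


# Made-up wind directions in degrees, NOT the challenge data.
directions = [170, 200, 225, 250, 270, 300, 330, 360, 45, 90]

hist = direction_histogram(directions)
share = share_between(directions, 150, 360)
```

Because 0 and 360 degrees both denote north, the histogram folds them into the same bin, whereas the range check works on the raw values as recorded.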

Revision as of 16:29, 15 July 2017
