IS428 2017-18 T1 Assign Duong Ngo Bao Tran
Overview
Mistford is a mid-size city located to the southwest of a large nature preserve. The city has a small industrial area with four light-manufacturing endeavors. Mitch Vogel is a post-doc student studying ornithology at Mistford College and has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird due to its attractive plumage and pleasant songs, is decreasing! The decrease is sufficiently significant that the Pangera Ornithology Conservation Society is sponsoring Mitch to undertake additional studies to identify the possible reasons. Mitch is gaining access to several datasets that may help him in his work, and he has asked you (and your colleagues) as experts in visual analytics to help him analyze these datasets.
Mitch Vogel was immediately suspicious of the noxious gases pouring out of the smokestacks of the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies were contributing to the downfall of the poor Rose-Crested Blue Pipit. But when he talked to company representatives and workers, they all seemed to be nice people and actually pretty respectful of the environment.
In fact, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, even though it raised their cost of production. Mitch discovered that the state government has been monitoring the gaseous effluents from the factories through a set of sensors, distributed around the factories, and set between the smokestacks, the city of Mistford and the nature preserve. The state has given Mitch access to their air sampler data, meteorological data, and locations map.
The task
General task
The four factories in the industrial area are subjected to higher-than-usual environmental assessment, due to their proximity to both the city and the preserve. Gaseous effluent data from several sampling stations has been collected over several months, along with meteorological data (wind speed and direction), that could help Mitch understand what impact these factories may be having on the Rose-Crested Blue Pipit. These factories are supposed to be quite compliant with recent years’ environmental regulations, but Mitch has his doubts that the actual data has been closely reviewed. Could visual analytics help him understand the real situation?
The primary job for Mitch is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others.
Specific tasks
- Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?
- Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?
- Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.
Background information
Companies
There are 4 companies in Mistford involved in the investigation.
- Roadrunner Fitness Electronics – Roadrunner produces personal fitness trackers, heart rate monitors, headlamps, GPS watches, and other sport-related consumer electronics.
- Kasios Office Furniture – Kasios Office Furniture manufactures metal and composite-wood office furniture including desks, tables, and chairs.
- Radiance ColourTek – Radiance produces solvent based optically variable metallic flake paints.
- Indigo Sol Boards – Indigo Sol produces skateboards and snowboards.
Chemicals
The sensors collect information on several substances of potential concern, including:
- Appluimonia – An airborne odor is caused by a substance in the air that you can smell.
- Chlorodinine – Corrosives are materials that can attack and chemically destroy exposed body tissues.
- Methylosmolene – This is a trade name for a family of volatile organic solvents.
- AGOC-3A – New environmental regulations, and consumer demand, have led to the development of low-VOC and zero-VOC solvents.
Dataset analysis & transformation process
A total of 7 documents were provided for the assignment: 3 workable Excel files, 1 static map in jpg format, and 3 detailed documents with relevant descriptions for the project. This section elaborates on how the datasets were examined and transformed to prepare for the data visualization.
Sensor location
As both factories and sensors are located by X and Y coordinates on the map, the factory locations are added to the sensor location sheet so that both can be plotted together in Tableau.
The sheet and file name are then renamed to “Coordinates”.
Tableau
The current Excel files can be divided into 2 data sources to be worked with in Tableau:
- Sensors: Linking Coordinates and Sensor Data via a left outer join (Item = Monitor). Sensor Data’s Monitor column should be changed to String to match Item’s data format or else they will be unable to join;
- Meteorological Data: Consisting of only the relevant data sheet.
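The type mismatch mentioned for the Sensors join can be illustrated outside Tableau. The following is only a minimal sketch in plain Python, not the Tableau implementation: the sample rows, the coordinates and the `left_join` helper are hypothetical and stand in for the Coordinates and Sensor Data sheets. It shows why a numeric Monitor never matches a text Item until it is cast to a string.

```python
# Hypothetical sample rows; the real values live in the Excel sheets.
coordinates = [
    {"Item": "1", "X": 10, "Y": 20},
    {"Item": "2", "X": 30, "Y": 40},
    {"Item": "Factory A", "X": 50, "Y": 60},  # factories have no sensor readings
]
sensor_data = [
    {"Monitor": 1, "Chemical": "AGOC-3A", "Reading": 0.5},
    {"Monitor": 1, "Chemical": "Methylosmolene", "Reading": 1.0},
    {"Monitor": 2, "Chemical": "AGOC-3A", "Reading": 0.25},
]


def left_join(left, right, left_key, right_key):
    """Left outer join: keep every left row, even when nothing matches."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left:
        # A left row without a match is kept once, with no right-hand columns.
        for match in index.get(row[left_key], [None]):
            joined.append({**row, **(match or {})})
    return joined


# Without a cast, "1" (text) never equals 1 (number), so no readings attach.
unmatched = left_join(coordinates, sensor_data, "Item", "Monitor")

# Cast Monitor to a string first, mirroring the type change made in Tableau.
sensor_as_text = [{**r, "Monitor": str(r["Monitor"])} for r in sensor_data]
matched = left_join(coordinates, sensor_as_text, "Item", "Monitor")

print(len(unmatched), len(matched))
```

Because the join is a left outer join, the factory row survives in both results; it simply carries no reading.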
To prepare for task 3, a calculated field is created to convert the wind direction in degrees into a geographic (compass) direction, using the conversion guide and the following formula:
if [Wind Direction] <= 22.5 then "N"
elseif [Wind Direction] <= 67.5 then "NE"
elseif [Wind Direction] <= 112.5 then "E"
elseif [Wind Direction] <= 157.5 then "SE"
elseif [Wind Direction] <= 202.5 then "S"
elseif [Wind Direction] <= 247.5 then "SW"
elseif [Wind Direction] <= 292.5 then "W"
elseif [Wind Direction] <= 337.5 then "NW"
else "N" end
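For checking the calculated field against a few known angles, the same binning can be sketched in Python. This is an illustration only, assuming eight evenly spaced 45-degree sectors centred on the compass points, with each upper bound treated as inclusive; the function name `compass_sector` is ours and not part of the workbook.

```python
import bisect

# Inclusive upper bound of each sector, in degrees (assumed 45-degree sectors).
UPPER_BOUNDS = [22.5, 67.5, 112.5, 157.5, 202.5, 247.5, 292.5, 337.5]
# One more label than bounds: anything above 337.5 wraps back to north.
LABELS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW", "N"]


def compass_sector(degrees):
    """Map a wind direction in degrees to one of eight compass points."""
    d = degrees % 360  # treat 360 the same as 0
    # bisect_left returns the first bound that is >= d, so bounds are inclusive.
    return LABELS[bisect.bisect_left(UPPER_BOUNDS, d)]


for angle in (0, 22.5, 45, 135, 270, 350):
    print(angle, compass_sector(angle))
```

Keeping the thresholds in a single sorted list makes it easy to see that the sectors are evenly spaced and that no range of angles falls through to the wrong label.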
The link to the Tableau data visualization can be found here.
Task 1
The link to the Tableau data visualization can be found here.
Task description
Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?
Data analysis
For task 1, we are required to characterize the sensors’ performance and operation by analysing the sensor readings. As the sensor readings dataset is already prepared, it can be imported to Tableau immediately. To get an overview of the patterns that could emerge from the dataset, a heatmap is a good start to visualize the existing data. A heatmap can be created by adding the following items into the respective fields:
Month (Date time) | Column |
Hour (Date time) | Column |
Day (Date time) | Row |
Monitor | Filter |
Sum(Reading) | Color |
Once the heatmap was created, it showed that sensor readings were missing at exactly 00:00 on 5 days: April 2nd, April 6th, August 4th, August 7th, and December 2nd. The empty cells for the 31st in April are expected, as April has only 30 days.
There are three possibilities that could explain the missing data:
- The sensors were not working properly on those days at 00:00
- Ad-hoc maintenance was being carried out at 00:00
- The data specialist missed out those data when filling in the data sheet.
Assuming the first possibility is the case, further investigations should be carried out to find the real reason behind the sensor malfunctions. This can be done by monitoring the sensors more closely at 00:00 every day.
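The gaps spotted on the heatmap can also be confirmed programmatically. The sketch below is a hedged illustration rather than part of the Tableau workflow: it assumes readings are expected once per hour, and the helper `missing_hours`, the placeholder year and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def missing_hours(timestamps, start, end):
    """Return every hourly slot in [start, end] that has no reading."""
    seen = set(timestamps)
    gaps = []
    slot = start
    while slot <= end:
        if slot not in seen:
            gaps.append(slot)
        slot += timedelta(hours=1)
    return gaps


# Hypothetical sample: two days of hourly timestamps (placeholder year),
# with the 00:00 slot of the second day removed to imitate a gap.
start = datetime(2000, 4, 1, 0)
end = datetime(2000, 4, 2, 23)
observed = [start + timedelta(hours=h) for h in range(48) if h != 24]

gaps = missing_hours(observed, start, end)
print(gaps)
```

On the real dataset the same check would be run separately for each monitor, which would show whether all nine sensors drop out at the same timestamps or only some of them.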
Task 2
Task description
Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?
Data analysis
For task 2, we are required to pay attention to the chemicals to see which ones are detected by the sensor group, and whether any patterns emerge from the chemical releases recorded in the provided datasets. One possible approach to observing such patterns is to use a pie chart for the chemicals read by each sensor. To make the data visualization more geographically comprehensive and to prepare for task 3, we can place the pie charts on an interactive map built from the given static jpg map of the area and the sensor coordinates. The coordinates sheet was given from the beginning and modified to include the factories; for this task, however, the factories will not be considered in the visualization. After putting X and Y in columns and rows respectively, the given static map can be added via the Background Images for Map tool. Although the given coordinates were based on a (200,200) grid, the static map lines up more accurately when it is set to a (199,190) field.
After importing the static map, changing the color of the sensor data points to light pink to suit the map, and adjusting their sizes, the following result will be displayed:
Pie charts showing the chemicals released can be created by adding the following items into the respective fields:
X | Column |
Y | Row |
Chemical | Color |
Sum(reading) | Reading |
Monitor | Label |
From this graph, it can be seen that sensors 3 and 4 detect the highest total amount of chemicals compared to the other sensors, and sensors 1 and 2 the lowest. An interesting observation is that sensor 6’s readings are not as alarming despite its location in the middle of the 4 factories. Further investigation is carried out to look for patterns in the chemical releases, if any. A second chart is created to observe these patterns using the following data:
Month (Date Time) | Column |
Reading | Size |
Chemical | Color |
Chemical | Label |
Monitor | Label |
From the graph, it can be observed that AGOC-3A and Methylosmolene have a greater number of extremely high readings than Appluimonia and Chlorodinine.
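The per-sensor pie charts used earlier in this task boil down to summing the readings by monitor and chemical and then taking each chemical's share of the monitor total. The sketch below shows that aggregation in plain Python; the rows are hypothetical and the helper `chemical_shares` is our own illustration, not something Tableau exposes.

```python
from collections import defaultdict

# Hypothetical rows in the shape (monitor, chemical, reading).
rows = [
    ("3", "AGOC-3A", 2.0),
    ("3", "Methylosmolene", 6.0),
    ("3", "AGOC-3A", 2.0),
    ("1", "Appluimonia", 0.5),
    ("1", "Chlorodinine", 1.5),
]


def chemical_shares(rows):
    """Return, per monitor, each chemical's fraction of the summed readings."""
    totals = defaultdict(lambda: defaultdict(float))
    for monitor, chemical, reading in rows:
        totals[monitor][chemical] += reading
    shares = {}
    for monitor, by_chemical in totals.items():
        monitor_total = sum(by_chemical.values())
        if monitor_total == 0:
            continue  # no signal at this monitor, so no pie to draw
        shares[monitor] = {c: v / monitor_total for c, v in by_chemical.items()}
    return shares


shares = chemical_shares(rows)
print(shares)
```

The slice angles correspond to these fractions, while the overall pie size reflects the monitor total, which is what makes some sensors stand out on the map.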
Task 3
The link to the Tableau data visualization can be found here.
Task description
Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.
Data analysis
For task 3, we are required to determine which factory is responsible for each chemical release. To observe how the wind and the chemical release patterns correlate, a chart is created with the following items:
Day (Date Time) | Column |
Hour (Date Time) | Column |
Wind Direction (Geo) | Column |
Reading | Row |
Month (Date Time) | Filter |
Chemical | Filter |
Monitor | Color |
From this graph, anomalies in each sensor's readings can be spotted, together with the wind direction recorded in the dataset at that time and the monitor that picked up each extremely high reading. These anomalies are then scrutinized to determine which factory is responsible for each of the high chemical readings. For example, the following are the readings for AGOC-3A during April:
Assuming above 50% is considered a high reading for chemical composition, a filter can be applied to pinpoint the anomalies more efficiently:
After comparing the wind direction with the given map, it appears that Kasios and Roadrunner are both responsible for all the high chemical readings.
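The comparison between wind direction and the map was done by eye, but the reasoning can be sketched as a small calculation. The example below is an illustration under stated assumptions, not the method used in the workbook: it assumes the wind direction gives the direction the wind blows from (0 = north, 90 = east), that X increases eastward and Y northward, and it uses placeholder factory names and coordinates. The helpers `bearing` and `upwind_candidates` and the 22.5-degree tolerance are hypothetical.

```python
import math


def bearing(from_xy, to_xy):
    """Compass bearing in degrees (0 = north, 90 = east) between two points."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    # atan2(dx, dy) measures the angle clockwise from the +Y (north) axis.
    return math.degrees(math.atan2(dx, dy)) % 360


def upwind_candidates(sensor_xy, factories, wind_from_deg, tolerance_deg=22.5):
    """Factories lying roughly in the direction the wind is coming from."""
    hits = []
    for name, xy in factories.items():
        diff = abs(bearing(sensor_xy, xy) - wind_from_deg) % 360
        diff = min(diff, 360 - diff)  # shortest way around the circle
        if diff <= tolerance_deg:
            hits.append(name)
    return hits


# Placeholder layout: one factory due north of the sensor, one due east.
sensor = (50, 50)
factories = {"Factory A": (50, 80), "Factory B": (80, 50)}

print(upwind_candidates(sensor, factories, wind_from_deg=350))  # wind from the north
print(upwind_candidates(sensor, factories, wind_from_deg=100))  # wind from the east
```

Applied to each high reading, a check like this would list which factories sit upwind of the monitor at that hour, giving a repeatable basis for attributing a release to a factory.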