IS428 2017-18 T1 Assign DLau Peng Liang Bryan
Contents
IS428 Main Page: Brings you to the IS428 main page
Assignment Overview: Overview of the Assignment & Details
Assignment Dropbox: Dropbox with links to peers' assignments
Problem & Motivation
Mistford is a mid-size city located to the southwest of a large nature preserve. The city has a small industrial area with four light-manufacturing endeavors. Mitch Vogel is a post-doc student studying ornithology at Mistford College and has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird due to its attractive plumage and pleasant songs, is decreasing! The decrease is sufficiently significant that the Pangera Ornithology Conservation Society is sponsoring Mitch to undertake additional studies to identify the possible reasons.
Mitch Vogel was immediately suspicious of the noxious gases pouring out of the smokestacks of the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies were contributing to the downfall of the poor Rose-Crested Blue Pipit. But when he talked to company representatives and workers, they all seemed to be nice people and actually pretty respectful of the environment.
In fact, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, even though it raised their cost of production.
Mitch is gaining access to several datasets that may help him in his work, and he has asked you (and your colleagues) as experts in visual analytics to help him analyse these datasets. These datasets include air sampler data, meteorological data, and location maps provided by the state government, which has been monitoring the gaseous effluents from the factories through a set of sensors distributed around the factories.
Task
General Task
The four factories in the industrial area are subjected to higher-than-usual environmental assessment, due to their proximity to both the city and the preserve. Gaseous effluent data from several sampling stations has been collected over several months, along with meteorological data (wind speed and direction), that could help Mitch understand what impact these factories may be having on the Rose-Crested Blue Pipit. These factories are supposed to be quite compliant with recent years’ environmental regulations, but Mitch has his doubts that the actual data has been closely reviewed. Could visual analytics help him understand the real situation?
The primary job for Mitch is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others. Your task, as supported by visual analytics that you apply, is to detangle the data to help Mitch determine where problems may be. Use visual analytics to analyze the available data and develop responses to the questions below.
Specific Task
- Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.
- Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.
- Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.
Dataset Analysis & Transformation Process
Datasets Provided (Sensor Data, Sensor Location, Meteorological Data)
Additional Information Provided
Interactive Visualisation
Results
Task #1
Figure 1 - Sensor Activity Rate
I began the analysis by plotting a calendar heatmap of the sum of all readings to identify any sensors that were not functioning as intended. I defined a faulty sensor as one that was either not collecting any chemical readings, or collecting them erroneously (recording readings under the wrong chemical).
Figure 1 shows the sensors' activity rate (a sensor counts as active while it is collecting data). No readings were collected at 00:00 on 2nd April, 6th April, 4th August, 7th August and 2nd December. This is highly unusual and indicates that all the sensors were inactive during that hour. Readings resumed at 01:00, which suggests either an intermittent disruption (e.g. a thunderstorm causing a blackout) or scheduled maintenance during that hour.
As the inactivity occurs only within the first week of a month, scheduled maintenance seems the more likely explanation: the sensors are taken offline briefly so that they remain operational until the next maintenance session.
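The gap check behind Figure 1 can also be sketched in code. The following is a minimal Python sketch, not the workflow used to build the figure: the function name `missing_hours` and the toy timestamps are assumptions for illustration, and it assumes readings are expected once per hour.

```python
from datetime import datetime, timedelta

def missing_hours(timestamps, start, end):
    """Return every hourly slot in [start, end] that has no reading at all."""
    seen = set(timestamps)
    gaps = []
    slot = start
    while slot <= end:
        if slot not in seen:
            gaps.append(slot)
        slot += timedelta(hours=1)
    return gaps

# Toy data (made up): readings at 23:00 and 01:00, nothing at 00:00.
toy_timestamps = [datetime(2016, 4, 1, 23), datetime(2016, 4, 2, 1)]
gaps = missing_hours(toy_timestamps, datetime(2016, 4, 1, 23), datetime(2016, 4, 2, 1))
print(gaps)
```

Run on the full set of reading timestamps, a check like this would list the hours in which no sensor reported anything.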
Figure 2 - Duplicate & Missing Readings
After creating the calendar heatmap, I wanted a better understanding of each sensor's readings at specific points in time, so I built a second chart showing the readings per chemical per monitor at each exact date and time. At any given point in time (e.g. 2200 on 12/6/16), one monitor should have only ONE reading per chemical. Following that logic, I colored the points by COUNT(Number of Records) and expected every point to be the same color.
However, Figure 2 shows clear gaps, as well as points of different colors. The differently colored points mean that a sensor recorded more than one reading of a chemical at a single point in time, which should not happen and suggests that the sensor may be faulty.
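The one-reading-per-chemical rule can be expressed as a simple count per (monitor, chemical, timestamp) key. This is a minimal sketch under assumptions: the row layout `(monitor, chemical, timestamp, reading)`, the helper names, and the toy values are hypothetical and not taken from the dataset.

```python
from collections import Counter

def reading_counts(rows):
    """Count readings per (monitor, chemical, timestamp) key."""
    return Counter((monitor, chemical, ts) for monitor, chemical, ts, _ in rows)

def duplicated_keys(rows):
    """Keys with more than one reading, which the rule above forbids."""
    return sorted(key for key, n in reading_counts(rows).items() if n > 1)

# Toy rows (made up): monitor 3 reports AGOC-3A twice in the same hour.
toy_rows = [
    (3, "AGOC-3A", "2016-12-06 22:00", 0.5),
    (3, "AGOC-3A", "2016-12-06 22:00", 1.2),
    (3, "Chlorodinine", "2016-12-06 22:00", 0.3),
]
dups = duplicated_keys(toy_rows)
print(dups)
```

Any key returned here corresponds to a point that would appear in a different color in a chart colored by record count.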
Figure 3 - Duplicate Readings == Missing Readings?
Adding a couple of filters gives a clearer view: whenever a sensor records two readings of one chemical, it is also missing a reading for another chemical at the same point in time. This reinforces the idea that the sensors may be faulty, as they appear to be erroneously recording Methylosmolene readings as AGOC-3A. As Methylosmolene is an extremely toxic chemical, the sensors' failure to identify it correctly is cause for concern and should prompt the technicians to re-evaluate all of the sensors' configurations. AGOC-3A is considered the least harmful of the chemicals, both to humans and to the environment, so if the government plans to use the emission readings to tout a "cleaner and greener" state, it should first ensure that no harmful chemicals are being disguised as AGOC-3A.
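The pairing of duplicated and missing readings can be checked per monitor and timestamp. The sketch below is illustrative only: the row layout `(monitor, chemical, timestamp)`, the function name `duplicates_and_gaps`, and the toy rows are assumptions, while the four chemical names are those used in this report.

```python
from collections import Counter, defaultdict

CHEMICALS = ("AGOC-3A", "Appluimonia", "Chlorodinine", "Methylosmolene")

def duplicates_and_gaps(rows):
    """Map (monitor, timestamp) -> (duplicated chemicals, missing chemicals)."""
    per_slot = defaultdict(Counter)
    for monitor, chemical, ts in rows:
        per_slot[(monitor, ts)][chemical] += 1
    result = {}
    for slot, counts in per_slot.items():
        duplicated = [c for c in CHEMICALS if counts[c] > 1]
        missing = [c for c in CHEMICALS if counts[c] == 0]
        # Keep only slots that break the one-reading-per-chemical rule.
        if duplicated or missing:
            result[slot] = (duplicated, missing)
    return result

# Toy rows (made up): AGOC-3A appears twice, Methylosmolene not at all.
toy_slot_rows = [
    (3, "AGOC-3A", "2016-12-06 22:00"),
    (3, "AGOC-3A", "2016-12-06 22:00"),
    (3, "Appluimonia", "2016-12-06 22:00"),
    (3, "Chlorodinine", "2016-12-06 22:00"),
]
report = duplicates_and_gaps(toy_slot_rows)
print(report)
```

If every flagged slot pairs a duplicated chemical with a missing one, that supports the reading that one chemical is being recorded under another's name.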
Task #2
Figure 4 - Chemical Emissions Per Month
Using the 3 months of data provided, we plotted the chemical emissions per month for each monitor as a heatmap, shown in Figure 4. Every sensor in the group detects all 4 of the released chemicals, although the readings differ across sensors because of their proximity to the factories and the direction and strength of the wind, which we will analyze further in Task #3.
Figure 5 - Chemical Readings Trend (Across 3 Months)
The line chart of average readings per chemical across the 3-month period shows a clear upward trend from April to December.
For AGOC-3A, the emissions seem to have slowed down from August to December. Methylosmolene's emissions were minimal from April to August but rose sharply from August to December. This suggests that some of the factories may have abandoned the state's initiatives to preserve air quality in a bid for higher profits, as adhering to regulations on harmful chemical emissions may have raised their operating costs and reduced their operating profit. Meanwhile, Appluimonia and Chlorodinine readings also increased sharply from April to December.
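The monthly averages behind a trend line like Figure 5 amount to a group-by on chemical and month. This is a minimal sketch, assuming rows of the form `(chemical, timestamp, reading)`; the function name `monthly_mean` and the toy values are hypothetical and do not reproduce the actual readings.

```python
from collections import defaultdict
from datetime import datetime

def monthly_mean(rows):
    """Average reading per (chemical, month number)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for chemical, ts, reading in rows:
        acc = totals[(chemical, ts.month)]
        acc[0] += reading
        acc[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}

# Toy rows (made up values, not the real readings).
toy_monthly_rows = [
    ("Methylosmolene", datetime(2016, 8, 1, 1), 1.0),
    ("Methylosmolene", datetime(2016, 8, 1, 2), 3.0),
    ("Methylosmolene", datetime(2016, 12, 1, 1), 6.0),
]
means = monthly_mean(toy_monthly_rows)
print(means)
```

Comparing the values for months 4, 8 and 12 per chemical gives the same month-to-month comparison that the line chart shows visually.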
Figure 6 - Average Chemical Readings Per Hr
Using the average readings across all periods, shown in Figure 6, I observed that although most of the factories are said to have reduced their Methylosmolene emissions, the emission rate for the chemical is still relatively high. In fact, it is the 2nd most emitted chemical. Combined with the earlier finding that the sensors may not have recorded all chemicals accurately (some harmful chemicals like Methylosmolene may be disguised as AGOC-3A), this compounds the risk that citizens' health is being harmed without their knowledge.
Furthermore, the emission rate of Methylosmolene spikes during very unusual hours (2200 - 0500). During this period its rate is twice that of the other chemicals, which is anomalous given that most factories should be inactive, with little to no activity that would produce chemical emissions.
The same chart also suggests that the state government's efforts to reduce harmful chemical emissions have been relatively effective. Based on the average chemical release rate per hour, AGOC-3A (the least harmful of the chemicals) is released the most for most of the day, the exception being the errant behavior of Methylosmolene mentioned earlier.
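The night-time pattern can be quantified by comparing a chemical's mean reading inside and outside the 2200 - 0500 window. The sketch below is illustrative only: the row layout `(chemical, timestamp, reading)`, the function name `night_day_means`, the toy values, and the choice to treat the window as hours 22 through 4 inclusive are all assumptions.

```python
from datetime import datetime
from statistics import mean

def night_day_means(rows, chemical):
    """Mean reading for one chemical at night (22:00-04:59) and during the day."""
    night, day = [], []
    for chem, ts, reading in rows:
        if chem != chemical:
            continue
        is_night = ts.hour >= 22 or ts.hour < 5  # assumed window boundaries
        (night if is_night else day).append(reading)
    # statistics.mean raises StatisticsError if either list is empty.
    return mean(night), mean(day)

# Toy rows (made up values, not the real readings).
toy_hourly_rows = [
    ("Methylosmolene", datetime(2016, 8, 1, 23), 4.0),
    ("Methylosmolene", datetime(2016, 8, 2, 3), 2.0),
    ("Methylosmolene", datetime(2016, 8, 2, 12), 1.0),
    ("Methylosmolene", datetime(2016, 8, 2, 15), 2.0),
    ("AGOC-3A", datetime(2016, 8, 2, 12), 9.0),
]
night_mean, day_mean = night_day_means(toy_hourly_rows, "Methylosmolene")
print(night_mean / day_mean)
```

A ratio well above 1 for Methylosmolene, and close to or below 1 for the other chemicals, would match the pattern described for Figure 6.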