IS428 2017-18 T1 Assign Benedict Wee Yu Rui

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search

Links

The Task

General Task

The four factories in the industrial area are subjected to higher-than-usual environmental assessment, due to their proximity to both the city and the preserve. Gaseous effluent data from several sampling stations has been collected over several months, along with meteorological data (wind speed and direction), that could help Mitch understand what impact these factories may be having on the Rose-Crested Blue Pipit. These factories are supposed to be quite compliant with recent years’ environmental regulations, but Mitch has his doubts that the actual data has been closely reviewed. Could visual analytics help him understand the real situation?

The primary job for Mitch is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others. Your task, as supported by visual analytics that you apply, is to detangle the data to help Mitch determine where problems may be. Use visual analytics to analyze the available data and develop responses to the questions below.

The Specific Tasks

  • Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?Limit your response to no more than 9 images and 1000 words.
  • Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.
  • Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.

Background Information

The four manufacturing factories south of the nature preserve which Mitch is suspicious of the chemical releases and these are their coordinates:

Factory X Y
Roadrunner Fitness Electronics 89 27
Kasios Office FurnituExample 90 21
Radiance ColourTeExample 109 26
Indigo SoExample ardExample 120 22

In addition, these are the chemicals detected by the sensors and their following descriptions are below:

  • AGOC-3A – New environmental regulations, and consumer demand, have led to the development of low-VOC and zero-VOC solvents. Most manufacturers now use one or more low-VOC substances and Mistford’s plants have wholeheartedly signed on. These new solvents, including AGOC-3A, are less harmful to human and environmental health.
  • Appluimonia – An airborne odor is caused by a substance in the air that you can smell. Odors, or smells, can be either pleasant or unpleasant. In general, most substances that cause odors in the outdoor air are not at levels that can cause serious injury, long-term health effects, or death to humans or animals. However, odors may affect your quality of life and sense of well-being. Several odor-producing substances, including Appluimonia, are monitored under this program.
  • Chlorodinine – Corrosives are materials that can attack and chemically destroy exposed body tissues. Corrosives can also damage or even destroy metal. They begin to cause damage as soon as they touch the skin, eyes, respiratory tract, digestive tract, or the metal. They might be hazardous in other ways too, depending on the particular corrosive material. An example is the chemical Chlorodinine. It has been used as a disinfectant and sterilizing agent as well as other uses. It is harmful if inhaled or swallowed.
  • Methylosmolene – This is a trade name for a family of volatile organic solvents. After the publication of several studies documenting the toxic side effects of Methylosmolene in vertebrates, the chemical was strictly regulated in the manufacturing sector. Liquid forms of Methylosmolene are required by law to be chemically neutralized before disposal.

The Data

The data available consists of sensor readings from a set of air-sampling sensors and meteorological data from a weather station in proximity to the factories and sensors. The factories and sensors locations are provided in terms of x,y coordinates on a 200x200 grid, with (0,0) at the lower left hand corner (southwest). The sensors map shows the locations of the sensors and factories by number for the sensors and by name for the factories. The datasets given will go through some data cleaning in order to be visualized using Tableau:

  • Sensor Data.xlsx
  • Sensor Location.xlsx
  • Meteorological Data.xlsx

The Specific Tasks

Question 1: Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?

Link to Workbook During the 3 month-long sample periods, 9 monitors are continuously measuring 4 chemical concentrations respectively. Sensor readings are logged at hourly interval 24 hours per day. Figure 1 shows an overview of the different sensors and its readings(9 sensors; 4 chemical detectors each; 3 month-long periods).

Figure 1: The 9 sensors' readings over the three month-long periods showing missing readings in red and multiple readings in orange


Missing Midnight Values

We can observe 2 patterns to the missing values represented by the white lines. The first pattern shows us that for certain timestamps there were missing data. Figure 2 shows that there are 7 points where there are records missing – Midnight on 2nd April, 6th April, 2nd August, 4th August, 7th August, 2nd December and 7th December. The readings show some exceptions cases (circled in red in Figure 2) were on the 2st August for Sensor 3 (AGOC-3A and Methylosmolene) and 7th December for Sensor 6 (AGOC-3A), Sensor 7 (AGOC-3A and Appluimonia), Sensor 8 (AGOC-3A)

Figure 2 Missing midnight readings (in white) over the 3 month periods


Exception Cases

There are exception cases of data missing as some sensors data show rare spikes of chemical concentration readings. An example is On 2nd August missing data timestamp, Sensor 3 show a range of high readings of Methylosmolene. We can observe that there be probably show some affiliation between the consistent missing data and the possible unusual spike of chemical readings

Figure 3 Sensor 3 chart shows exception case of Methylosmolene spike on 2nd August


On the midnight of 7 December of cases of missing data exception , it shows signs of spikes of Appluimonia (sensor 6, Figure 4), Methylosmolene and AGOC-3A (sensors 7 and 8, Figures 5 and 6).

Figure 4 Sensor 6 chart shows exception data gap occured with with spike in AGOC-3A and Appluimonia on 7th December


Figure 5 Sensor 7 chart shows AGOC-3A exception occured with with spike in Methylosmolene on 7th December


Figure 6 Sensor 8 chart shows Methylosmolene & AGOC-3A exception occured with with huge spike in Methylosmolene and normal spike in AGOC-3A on 7th December

Duplicate Entries

There is a matching pattern which shows that the double-entry of AGOC-3A readings (in orange dots) coincides with the missing Methylosmolene readings (in red dots) as every double entry there will be a missing entry. It could probably that the readings were recording wrongly which may led to a huge spike of AGOC-3A readings.

Figure 7 Double entry of data of AGOC-3A coincides with the missing data of Methylosmolene

Question 2: Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?

Link to Workbook

Data Cleaning

NOTE: I am using a cleaned data set which is done using a Python script. The new data is generated by using the calculation of finding the mean and the standard deviation of AGOC-3A emissions for each sensor. After which, z-score is calculated to check for duplicate data entries and the value which shows the lower z-score is assigned to AGOC-3A while the other entry will be assigned to Methylosmolene.

An example of such an entry is shown below:

Chemical Date Time Monitor Readings
AGOC-3A 17/4/2016 6:00:00 AM 3 11.00977
AGOC-3A 17/4/2016 6:00:00 AM 3 3.98816

Sensor Readings by Chemical Type

Figure 8 shows an overall pattern of the reading of different chemicals in 9 monitors in the 3 months. In April, there is a huge spike of standard deviation of 4.0 above the mean for AGOC-3A, a moderate spike of Chlorodinine readings and numerous spikes of Methylosmolene readings. While in August, there was continuous spikes of Methylosmolene readings from 10 to 13 August and the highest spike on 13 August. The other 3 chemical readings seems normal of up and downs along the mean. Lastly, in December, the readings for AGOC-3A, Appluimonia, and Chlorodinine are all above the mean white for Methylosmolene, the readings are mostly above the mean

There is a pattern where the timing of most of the spikes usually occurred at the same period of time. From Figure 8, the spike on AGOC-3A are in the same period as the spike on Methylosmolene from 14 to 16 April, while there are spikes in readings for all 4 chemicals in August from 11 to 13 August and spikes in readings of all 4 chemicals of AGOC-3A, Appluimonia and Methylosmolene on 5, 11 and 18th of December.

Figure 8 Double entry of data of AGOC-3A coincides with the missing data of Methylosmolene


Sensor Readings by Sensors

Figure 9 shows an overall pattern of the reading of different chemicals in 9 monitors in the three months. Sensor 3 always show high readings in the four chemicals in the three months. The readings on monitor 4 are small in April but there is a steep increase to be the highest in all the chemical readings in December. Monitor 6 shows a huge spike in reading of Methylosmolene and AGOC-3A in April while monitor5 and monitor6 has high reading of AGOC-3A in August.

Figure 9 Double entry of data of AGOC-3A coincides with the missing data of Methylosmolene

Question 3: Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.

Link to Workbook

A wind-rose type of concept is utilise to pinpoint which companies are the producers of certain chemicals. By using the sensors as a base, we draw back a cone which aims to visually show which company is the possible culprit. By using wind-speed and the wind direction in the meteorological dataset, plot out a cone and whichever factory is covered by the cone, it is a probable producer of the chemical. We find the days which have a high spikes of Z-Score and find which company produce these chemicals.

AGOC-3A

Figure 11: z-score readings by hour

First, we plot out the readings for each monitor and chemical and filter the results by a upper bound range of the z-score over the hours. This is shown in Figure 11 above. This chart will help us to find the spikes in the readings by the hour. However, we are using 3-hour data for the meteorlogical data so if a chemical readings is 10:00am, we will use the wind readings at 09:00am to match against the spatial chart.

If a overlap occurs in the spatial chart with the right sensor, we will consider it as a hit and the company that it hits is recorded as a suspected producer of the chemical. An example of this is shown in Figure 12. If it is a miss, it is recorded as no-hit so that we know that there was no overlap based on the wind data and there could be other factors that need to be considered in order to find out the actual situation. An example of this is shown on Figure 13.

AGOC-3A (Hit)
AGOC-3A (Miss)


Appluimonia

Appluimonia Reading
Date Time Monitor Reading Suspected Company
29/4/2016 09:00:00 AM 6 26.85 Roadrunner
13/8/2016 11:00:00 AM 5 23.33 Radience, Indigo
5/12/2016 12:00:00 PM 9 23.55 Indigo
7/12/2016 01:00:00 AM 9 23.77 Radience

From the table, we can see that Radience and Indigo are the most probable producers of Appluimonia

App.jpg

Chlorodinine

Chlorodinine Readings
Date Time Monitor Reading Suspected Company
16/4/2016 11:00:00 AM 1 15.27 Roadrunner
20/8/2016 4:00:00 AM 2 22.38 Roadrunner, Radians
16/8/2016 5:00:00 AM 5 35.25 Radians, Kasios
5/12/2016 4:00:00 AM 7 40.2 Radians, Indigo
19/12/2016 10:00:00 PM 9 15.1 Roadrunner

From the table, we can see that every single companies are culprits of producing Chlorodinine.

Chlorodinine Wind Direction

Methylosmolene

Methylosmolene Reading
Date Time Monitor Reading Suspected Company
14/4/2016 04:00:00 AM 7 143.8 Kaios
15/4/2016 10:00:00 AM 8 303.3 Kaios, Radience (If the angle is higher)
12/8/2016 12:00:00 PM 5 253.9 Radience
20/8/2016 12:00:00 PM 5 128.1 Roadrunner
7/12/2016 1:00:00 AM 1 95.4 Roadrunner, Kaios
Meth.jpg

Credits

This assignment was done in collaboration with Russell Yap, Tay Wei Rong, Amos Tan, Ong Sue Cern