IS428 2017-18 T1 Assign Ong Sue Cern

From Visual Analytics for Business Intelligence


Overview

Mistford is a mid-size city located to the southwest of a large nature preserve. The city has a small industrial area with four light-manufacturing endeavors. Mitch Vogel is a post-doc student studying ornithology at Mistford College and has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird due to its attractive plumage and pleasant songs, is decreasing! The decrease is sufficiently significant that the Pangera Ornithology Conservation Society is sponsoring Mitch to undertake additional studies to identify the possible reasons. Mitch is gaining access to several datasets that may help him in his work, and he has asked you (and your colleagues) as experts in visual analytics to help him analyze these datasets.

Mitch Vogel was immediately suspicious of the noxious gases pouring out of the smokestacks of the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies were contributing to the downfall of the poor Rose-Crested Blue Pipit. But when he talked to company representatives and workers, they all seemed to be nice people and actually pretty respectful of the environment.

Companies

Roadrunner Fitness Electronics produces personal fitness trackers, heart rate monitors, headlamps, GPS watches, and other sport-related consumer electronics.

Kasios Office Furniture manufactures metal and composite-wood office furniture including desks, tables, and chairs.

Radiance ColourTek produces solvent-based, optically variable metallic flake paints with the lowest volatile organic compounds in the industry.

Indigo Sol Boards produces skateboards and snowboards and has seen modest growth in recent years.

Chemicals

Appluimonia is an airborne odor caused by a substance in the air that you can smell. While it does not cause serious injury, long-term health effects, or death to humans or animals, it may affect the quality of life and sense of well-being.

Chlorodinine is a corrosive that can attack and chemically destroy exposed body tissues as soon as it touches the skin, eyes, respiratory tract or digestive tract. It is thus harmful if inhaled or swallowed. Chlorodinine is used as a disinfectant and sterilizing agent as well as other uses.

Methylosmolene is a trade name for a family of volatile organic solvents. Several studies have documented the toxic side effects of Methylosmolene in vertebrates, and the use of it in manufacturing is strictly regulated. Liquid forms of Methylosmolene are required by law to be chemically neutralized before disposal.

AGOC-3A has been developed under new environmental regulations and consumer demand for low-VOC and zero-VOC solvents. It is less harmful to human and environmental health.

Question 1

Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.


In general, the sensors function well across all hours. However, at 12am on several days each month, none of the sensors report readings. These days are 2 and 6 Apr; 4 and 7 Aug; and 2 Dec. On several other days, only a few sensors report readings at 12am. For example, at 12am on 2 Aug only Sensor 3 gave readings, and only for the chemicals AGOC-3A and Methylosmolene. As this occurs regularly near the start of the month, we could infer that it reflects a monthly setup required for the sensors.

All monitors no rcds.png

A heatmap plotting the number of records for each day (y-axis, 1-30) and hour (x-axis, 0-23), recorded by all sensors, in April 2016. Missing data at 12am on 2 and 6 Apr appear as white boxes.

For all sensors across all months, duplicate AGOC-3A records occur when Methylosmolene readings are missing. This pattern suggests that the two AGOC-3A records for the same hour could correspond to one reading for AGOC-3A and one for the missing Methylosmolene data. Since this error occurs for all sensors and only affects these two chemicals, it is possible that AGOC-3A and Methylosmolene share similar properties, making it hard for the sensors to differentiate them.

Indiv monitors no rcds.png

A heatmap plotting the number of records for each day and hour, recorded by Sensor 5, in August 2016. Dark blue indicates duplicate records while missing data are white in colour. Teal colour indicates one record for the hour (expected behaviour).
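The record counts behind these heatmaps can also be checked outside Tableau. Below is a minimal sketch, assuming the readings are available as (sensor, chemical, timestamp, reading) tuples. The function name `find_anomalies` and the sample values are hypothetical and only illustrate the counting logic: exactly one record per sensor, chemical and hour is the expected behaviour.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_anomalies(rows, sensors, chemicals, start, end):
    """Return (missing, duplicated) keys for every hour in [start, end).

    rows: iterable of (sensor, chemical, timestamp, reading) tuples.
    A key is a (sensor, chemical, timestamp) triple.
    """
    counts = Counter((s, c, ts) for s, c, ts, _ in rows)
    missing, duplicated = [], []
    ts = start
    while ts < end:
        for s in sensors:
            for c in chemicals:
                n = counts[(s, c, ts)]  # Counter gives 0 for unseen keys
                if n == 0:
                    missing.append((s, c, ts))
                elif n > 1:
                    duplicated.append((s, c, ts))
        ts += timedelta(hours=1)
    return missing, duplicated


# Made-up readings for one sensor over two hours (not taken from the dataset).
rows = [
    (5, "AGOC-3A", datetime(2016, 8, 10, 3), 0.5),
    (5, "AGOC-3A", datetime(2016, 8, 10, 3), 4.1),  # second record, same hour
    (5, "AGOC-3A", datetime(2016, 8, 10, 4), 0.6),
    (5, "Methylosmolene", datetime(2016, 8, 10, 4), 0.7),
]
missing, duplicated = find_anomalies(
    rows, [5], ["AGOC-3A", "Methylosmolene"],
    datetime(2016, 8, 10, 3), datetime(2016, 8, 10, 5),
)
print(missing)
print(duplicated)
```

In this toy input the hour with a duplicated AGOC-3A record is also the hour with no Methylosmolene record, which is the pattern described above.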

After plotting the data on a timeline, broken down by chemical, sensor and month, it becomes visible that there are drastic spikes in the AGOC-3A readings which occur when Methylosmolene data is missing. These readings are about 10 times greater than the preceding readings, so we can infer that the spike is not caused by the duplicate readings alone. Given that Methylosmolene data is missing in such a specific scenario, these errors are not random but are likely due to high levels of AGOC-3A, Methylosmolene or both in the atmosphere. This inference assumes that the readings available at those times are accurate to begin with. To make small variations in the readings more visible on the timeline, I plotted the square root of each reading on the y-axis.

Gaps1.jpg


Large spikes in chemical reading for AGOC-3A tend to correspond to missing data in Methylosmolene, as shown by Sensor 3 readings between 1-14 Aug.

Thus, to correct the duplicate data in AGOC-3A and assign it to the missing data in Methylosmolene, I used a Python script that computes the z-score of each duplicate record, based on the mean and standard deviation of the AGOC-3A and Methylosmolene readings over the 3 months for a particular sensor. For each pair of duplicate records, I then kept the assignment of records to chemicals that gave the lowest sum of z-scores.
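The idea can be sketched as follows. This is an illustrative reconstruction rather than the script used for the assignment: the function names are hypothetical, the sample readings are made up, and it assumes that absolute z-scores are summed and that the sample standard deviation is used.

```python
from statistics import mean, stdev


def abs_z(value, mu, sigma):
    """Absolute z-score of value under a distribution with mean mu, std sigma."""
    return abs(value - mu) / sigma


def assign_pair(pair, agoc_stats, meth_stats):
    """Split two records logged as AGOC-3A between the two chemicals.

    pair: the two duplicate readings for one sensor and hour.
    agoc_stats, meth_stats: (mean, std) of that sensor's clean readings.
    Both possible assignments are scored; the lower z-score sum wins.
    """
    a, b = pair
    keep = abs_z(a, *agoc_stats) + abs_z(b, *meth_stats)
    swap = abs_z(b, *agoc_stats) + abs_z(a, *meth_stats)
    if keep <= swap:
        return {"AGOC-3A": a, "Methylosmolene": b}
    return {"AGOC-3A": b, "Methylosmolene": a}


# Made-up clean readings for one sensor (assumption, not dataset values).
agoc = [0.5, 0.7, 0.6, 0.8]
meth = [2.0, 2.5, 3.0, 2.2]
agoc_stats = (mean(agoc), stdev(agoc))
meth_stats = (mean(meth), stdev(meth))

fixed = assign_pair((2.4, 0.6), agoc_stats, meth_stats)
print(fixed)
```

Here 0.6 sits close to the AGOC-3A mean and 2.4 close to the Methylosmolene mean, so the swapped assignment has the lower z-score sum and is the one returned.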

Cods.png
Gaps2.png

Post-correction, the data showed spikes in both AGOC-3A and Methylosmolene readings, with AGOC-3A showing the lower spikes. Next, to spot changes in the sensors across months, I drew a box-plot of all readings in each month, per sensor and per chemical. Based on the trend line and box-plot, Sensor 4 has readings that increase at a similar rate for all chemicals, which is unlikely to occur in real life. Two other factors convince me that the increase is due to errors in Sensor 4: sensors near Sensor 4 do not experience the same rate of increase, and increases in other sensors' readings across months are never similar in rate across all chemicals. E.g. besides Sensor 4, Sensor 5 and Sensor 9 show increases in readings for all chemicals across the months. However, their increase is more gradual and the gradient differs across chemicals.

Another observation from the box-plot is the high readings for all chemicals at Sensor 3, which is not located particularly close to the factories. The high readings are not captured by Sensors 2 and 4, located next to Sensor 3, so we can assume that Sensor 3 may be overly sensitive. However, because readings increase for some sensors and not for others, it would be hard to correct the readings of Sensor 3 to match those of the other sensors.
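The comparison of gradients across chemicals can be made concrete with a small calculation. The sketch below is only an illustration of that check, not the Tableau trend line itself: it assumes rows of (sensor, chemical, month, reading), uses hypothetical function names, and runs on made-up values.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(rows):
    """Map (sensor, chemical) -> list of (month, mean reading), sorted by month."""
    buckets = defaultdict(list)
    for sensor, chemical, month, reading in rows:
        buckets[(sensor, chemical, month)].append(reading)
    out = defaultdict(list)
    for (sensor, chemical, month), vals in sorted(buckets.items()):
        out[(sensor, chemical)].append((month, mean(vals)))
    return out


def slope(points):
    """Least-squares slope of y on x for a list of (x, y) points."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den


# Made-up readings for months 4, 8 and 12 (assumption, not dataset values).
rows = [
    (4, "AGOC-3A", 4, 1.0), (4, "AGOC-3A", 8, 2.0), (4, "AGOC-3A", 12, 3.0),
    (4, "Chlorodinine", 4, 1.0), (4, "Chlorodinine", 8, 2.0),
    (4, "Chlorodinine", 12, 3.0),
]
slopes = {key: slope(pts) for key, pts in monthly_means(rows).items()}
print(slopes)
```

A sensor whose slopes are nearly identical for every chemical, as in this toy input, would be the suspicious case described above.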

S4 bxplt.png

Sensor 4 shows a clearly increasing trendline for each chemical reading across the months, when compared with the other sensor readings across months. Both Sensor 5 and 9 show an increase in readings, but with varying gradients across chemicals.

S4 incr.png

Closer examination of Sensor 4 shows that mean readings have been increasing across the months, with very similar mean values across the chemicals.

Question 2

Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.

By summing up the hourly readings for all monitors, it is visible that AGOC-3A and Methylosmolene readings vary much more than those of Chlorodinine and Appluimonia. Peaks of the former two chemicals reach above 100 ppm, while peaks of the latter two reach at most 21 ppm.

Q2P1.png

Next, plotting the hourly readings against the weekdays for all sensors, we can observe that AGOC-3A and Methylosmolene readings vary more by hour, with AGOC-3A readings highest between 6am and 9pm, while Methylosmolene readings are highest between 10pm and 5am. Both Appluimonia and Chlorodinine show less visible patterns across the hours of the day. For all chemicals, there is no visible relationship between the day of the week and the chemical concentration.
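A rough equivalent of this hour-of-day view can be computed directly. This is a minimal sketch, assuming rows of (chemical, timestamp, reading). The function name `profile` is hypothetical and the values are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def profile(rows, part):
    """Mean reading per (chemical, part(timestamp)).

    part extracts the grouping value from a timestamp, for example
    the hour of the day or the day of the week.
    """
    buckets = defaultdict(list)
    for chemical, ts, reading in rows:
        buckets[(chemical, part(ts))].append(reading)
    return {key: mean(vals) for key, vals in buckets.items()}


# Made-up readings (assumption, not dataset values).
rows = [
    ("AGOC-3A", datetime(2016, 4, 1, 8), 2.0),
    ("AGOC-3A", datetime(2016, 4, 2, 8), 4.0),
    ("Methylosmolene", datetime(2016, 4, 1, 23), 3.0),
]
by_hour = profile(rows, lambda ts: ts.hour)
by_weekday = profile(rows, lambda ts: ts.weekday())  # 0 = Monday
print(by_hour)
```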

Q2P2.png
Q2P3.png

The graph shows AGOC-3A readings (raised to the power of ¼, for clearer patterns in smaller readings) across the months for the different sensors. The patterns are similar for sensors located near each other. E.g. Sensors 1 and 2 share similar peaks, as do Sensors 5 and 9. Sensors 3 to 6 and Sensor 8 show high peaks in readings, reaching above 70 ppm, which is far higher than the average reading of 0.75 ppm.

Q2P4.png

By plotting the lower quartile, median and upper quartile, we can complement the earlier timeline of sensor readings to identify which sensors have captured high readings of particular chemicals. Here, for AGOC-3A (ignoring the unnaturally high readings of Sensor 3 mentioned in Q1), we can observe that Sensors 3-6 and Sensors 8-9 have captured most of the high readings.
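The three quartile values per sensor can be reproduced with Python's standard library. The sketch below assumes rows of (sensor, chemical, reading), uses a hypothetical function name and made-up values, and relies on the default "exclusive" method of `statistics.quantiles`, which may differ slightly from the quartile definition Tableau applies.

```python
from collections import defaultdict
from statistics import quantiles


def quartiles_by_sensor(rows, chemical):
    """Map sensor -> (lower quartile, median, upper quartile) for one chemical."""
    by_sensor = defaultdict(list)
    for sensor, chem, reading in rows:
        if chem == chemical:
            by_sensor[sensor].append(reading)
    # quantiles() needs at least two data points per sensor.
    return {
        sensor: tuple(quantiles(vals, n=4))
        for sensor, vals in by_sensor.items()
        if len(vals) >= 2
    }


# Made-up readings for a single sensor (assumption, not dataset values).
rows = [(6, "AGOC-3A", float(v)) for v in range(1, 8)]
rows.append((6, "Chlorodinine", 50.0))  # ignored: different chemical
q = quartiles_by_sensor(rows, "AGOC-3A")
print(q)
```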

Question 3

Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.

To understand which factories may contribute to the emission of chemicals, we need to use the spatial data (the map of the sensors and factories), then superimpose wind cones representing the air plume, drawn from each sensor in the direction opposite to the wind. If several wind cones overlap, the overlap may point us towards the factory that releases the particular chemical.

  1. In order to draw the wind cones as polygons on the map, we need to create a path that tells Tableau the order in which to draw the polygon vertices.
  2. Next, the length of the wind cone is calculated by converting the wind speed from m/s into miles/hour and then onto the 200 x 200 grid. The length scale is a multiplying factor, exposed as a configurable parameter so that users can adjust the length of the wind cone.
  3. Add/subtract 180 degrees from the given wind direction so that the wind cones point away from the possible source of the chemicals.
  4. In order to plot the X and Y coordinates of the triangle, I used the following formula. The parameter angle describes the arc of the wind cone and it can be adjusted by the user.
  5. Next, drag the Date onto the Pages shelf in Tableau and toggle the “all” option under history to display all the triangles.
  6. To ensure that only the triangles with significantly large values appear on the map (so that smaller values do not create many overlays on all sensors), we use a value threshold, which is determined by constructing a box-and-whisker plot of the distribution of each chemical across the 3 months. We arrived at the conclusion to use all readings above 3.5 ppm for Appluimonia and Chlorodinine, and all readings above 16 ppm for AGOC-3A and Methylosmolene.
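The geometry in steps 2 to 4 and the threshold in step 6 can be sketched in Python. The formula referred to in step 4 is not reproduced in the text above, so this is one possible construction under stated assumptions rather than the calculation used in the Tableau workbook. It assumes wind direction is a compass bearing (0 = north, clockwise), that the map's y-axis points north, and that `length_scale` folds together the grid conversion and the user-adjustable factor. The function names and the default arc of 30 degrees are hypothetical.

```python
import math

MS_TO_MPH = 2.23694  # 1 m/s is approximately 2.23694 miles per hour

# Thresholds in ppm, taken from step 6.
THRESHOLD_PPM = {
    "Appluimonia": 3.5,
    "Chlorodinine": 3.5,
    "AGOC-3A": 16.0,
    "Methylosmolene": 16.0,
}


def should_draw(chemical, reading):
    """Only readings strictly above the chemical's threshold get a cone."""
    return reading > THRESHOLD_PPM[chemical]


def cone_vertices(x, y, wind_dir_deg, wind_speed_ms,
                  length_scale=1.0, arc_deg=30.0):
    """Return the triangle [(apex), (corner 1), (corner 2)] in path order.

    The apex sits on the sensor at (x, y). The wind direction is flipped
    by 180 degrees (step 3), and the two far corners lie at the flipped
    bearing minus and plus half the arc.
    """
    bearing = (wind_dir_deg + 180.0) % 360.0
    length = wind_speed_ms * MS_TO_MPH * length_scale
    half = arc_deg / 2.0
    vertices = [(x, y)]
    for b in (bearing - half, bearing + half):
        rad = math.radians(b)
        # Compass bearing: sin gives the east (x) part, cos the north (y) part.
        vertices.append((x + length * math.sin(rad), y + length * math.cos(rad)))
    return vertices


# Hypothetical sensor position and wind values, for illustration only.
print(cone_vertices(10.0, 20.0, 180.0, 1.0, length_scale=2.0, arc_deg=40.0))
```

The position of each vertex in the returned list can serve as the path order from step 1, so each triangle is exported as three rows sharing one polygon identifier.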

Analysis

By drawing an overlay of wind cones of each chemical as detected by the sensors, we can identify the possible culprits emitting high amounts of chemicals. The wind cones drawn also show us that there are other sources of pollutants besides these 4 factories.

AGOC-3A

Q3AGOC.png

Based on the overlapping wind cones from Sensor 3, Roadrunner seems to be responsible for emitting AGOC-3A.

Appluimonia

Q3 Appl.JPG

Based on the overlapping wind cones from Sensors 3 and 5, Roadrunner, Kasios and Indigo are the likely culprits.

Chlorodinine

Q3 Chlo.JPG

The overlapping wind cones point to Kasios as responsible for the high levels of Chlorodinine emitted.

Methylosmolene

Q3 Meth.JPG

Roadrunner seems likely to have contributed to the high amounts of Methylosmolene in the air.

To conclude, the likely emitters of each chemical are as follows.

Factory      AGOC-3A   Appluimonia   Chlorodinine   Methylosmolene
Roadrunner   Yes       Yes           No             Yes
Kasios       No        Yes           Yes            No
Indigo       No        Yes           No             No
Radiance     No        No            No             No