IS428 2017-18 T1 Assign Ng Jia Jun

From Visual Analytics for Business Intelligence
Revision as of 12:15, 6 October 2017 by Jiajun.ng.2014 (talk | contribs)

Credits

This assignment was done in collaboration with Wan Mei Ying and Tan Kun Sheng.

Overview

Mistford is a mid-size city located to the southwest of a large nature preserve. The city has a small industrial area with four light-manufacturing endeavors. Mitch Vogel is a post-doc student studying ornithology at Mistford College and has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit, a popular local bird due to its attractive plumage and pleasant songs, is decreasing! The decrease is sufficiently significant that the Pangera Ornithology Conservation Society is sponsoring Mitch to undertake additional studies to identify the possible reasons. Mitch is gaining access to several datasets that may help him in his work, and he has asked you (and your colleagues) as experts in visual analytics to help him analyze these datasets.

Mitch Vogel was immediately suspicious of the noxious gases just pouring out of the smokestacks from the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies were contributing to the downfall of the poor Rose-Crested Blue Pipit bird. But when he talked to company representatives and workers, they all seemed to be nice people and actually pretty respectful of the environment.

In fact, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, even though it raised their cost of production. Mitch discovered that the state government has been monitoring the gaseous effluents from the factories through a set of sensors, distributed around the factories, and set between the smokestacks, the city of Mistford and the nature preserve. The state has given Mitch access to their air sampler data, meteorological data, and locations map.

Problem #1

Q1: Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.


Dataset involved:

  • Sensor Data.xlsx

To determine whether the sensors are working properly, I check two things: each sensor should record readings throughout the time it is deployed, and the readings themselves should not contain anomalies.

Cyclic Plot

A cyclic plot of all the chemical readings by all sensors is made in Tableau using the following fields:

  • Columns
    • Hour
    • Day
  • Rows
    • Month
    • Readings
  • Filter
    • Hour
  • Color
    • Monitor


NJJ-Q1-1.png

Figure 1.1 - Cyclic Plot

Figure 1.1 is a cyclic plot giving an overview of all the chemical readings from the 9 sensors in the three months given in the dataset (April, August and December). The plot shows missing records at the 0:00 hour. On Day 2 of April and of December, all the sensors failed to capture any reading at 0:00. On Day 2 of August, only sensor #3 had a reading at 0:00. Hence, I conclude that some sensors were already not working properly at the beginning of April. However, the cyclic plot does not reveal the non-working sensors accurately because the lines overlap. Thus, a heat map is created to analyse all the sensors further.

Heat Map

The heat map is prepared using the following fields:

  • Columns
    • Day
    • Hour
  • Rows
    • Month
    • Monitor
  • Filter
    • Day
  • Color
    • Readings


NJJ-Q1-2.jpg

Figure 1.2 - Heat Map

Figure 1.2 is a heat map giving an overview of all the chemical readings from the 9 sensors in the three months. The color and its intensity represent the amount of chemical detected by the sensors. Using the heat map, I looked through the total of 31 days for the three months and found 7 days on which the sensors were not operating properly. First, on 2nd and 6th April, no sensor recorded a chemical reading at hour 0. Then, on 2nd August, only sensor #3 recorded a reading at hour 0, whereas the other 8 sensors did not. On 4th and 7th August, no sensor recorded a chemical reading at hour 0. Last, on 2nd December, no sensor recorded a reading at hour 0, and on 7th December only sensors #6, #7 and #8 did. These findings show a strange pattern specific to hour 0, which occurs two or three times in each of the 3 months. One possible reason is that engineers maintain the sensors at that time.
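
Outside Tableau, the same hour-by-hour check for missing readings could be sketched in Python. This is only a minimal sketch and not the method used to produce the figures: the record layout, the sample dates and the function name missing_slots are assumptions made for illustration, not the real structure of Sensor Data.xlsx.

```python
from datetime import datetime, timedelta

# Hypothetical mini-sample of (monitor, timestamp) pairs. The dates are made up
# and the layout is an assumption, not the real structure of Sensor Data.xlsx.
records = [
    (1, datetime(2016, 4, 1, 23)),
    (2, datetime(2016, 4, 1, 23)),
    (2, datetime(2016, 4, 2, 0)),   # monitor 1 has no reading for this hour
    (1, datetime(2016, 4, 2, 1)),
    (2, datetime(2016, 4, 2, 1)),
]


def missing_slots(records):
    """Return every (monitor, hour) slot in the covered period that has no reading."""
    seen = set(records)
    monitors = sorted({monitor for monitor, _ in records})
    start = min(when for _, when in records)
    end = max(when for _, when in records)

    gaps = []
    hour = start
    while hour <= end:
        for monitor in monitors:
            if (monitor, hour) not in seen:
                gaps.append((monitor, hour))
        hour += timedelta(hours=1)
    return gaps


for monitor, hour in missing_slots(records):
    print(f"Monitor {monitor} has no reading at {hour.isoformat()}")
```

A listing like this avoids the overlapping-lines problem of the cyclic plot, because every empty (monitor, hour) slot is reported explicitly.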

Anomaly Detection

The line graph is prepared using the following fields:

  • Columns
    • Hour
  • Rows
    • Month
    • Readings
    • Readings - Dual Axis
  • Filter
    • Monitor
  • Color
    • Reading Anomaly (calculated field)


NJJ-Q1-3.png

Figure 1.3 - Reading Anomaly (calculated field)


Reference distributions at +3 and -3 standard deviations have been added. Because the chemical readings have different ranges, a single fixed threshold would not suit every chemical, so I define an anomaly as a reading at least 3 standard deviations from the mean.
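
The 3-standard-deviation rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the Tableau calculated field itself: grouping by chemical, the use of the sample standard deviation, the row layout and the name flag_anomalies are all my assumptions, and the sample readings are invented.

```python
from collections import defaultdict
from statistics import mean, stdev


def flag_anomalies(rows, k=3.0):
    """Return rows whose reading is at least k standard deviations from the
    mean of their own chemical (assumed grouping; needs >= 2 rows per chemical)."""
    by_chemical = defaultdict(list)
    for chemical, _monitor, reading in rows:
        by_chemical[chemical].append(reading)

    # Mean and sample standard deviation for each chemical.
    stats = {c: (mean(v), stdev(v)) for c, v in by_chemical.items()}

    flagged = []
    for chemical, monitor, reading in rows:
        mu, sigma = stats[chemical]
        if sigma > 0 and abs(reading - mu) >= k * sigma:
            flagged.append((chemical, monitor, reading))
    return flagged


# Invented sample: (chemical, monitor, reading). The two chemicals sit on very
# different scales, which is why the threshold is relative rather than fixed.
rows = (
    [("Appluimonia", 1, 1.0)] * 11
    + [("Appluimonia", 2, 50.0)]
    + [("AGOC-3A", 3, v) for v in
       (100, 110, 90, 105, 95, 100, 110, 90, 105, 95, 100, 120)]
)

print(flag_anomalies(rows))
```

In this sample only the Appluimonia reading of 50.0 is flagged, even though every AGOC-3A reading is numerically larger, because each reading is compared with the spread of its own chemical.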

NJJ-Q1-4.png

Figure 1.4 - Reference distributions


NJJ-Q1-5.jpeg

Figure 1.5 - Line Graph

Figure 1.5 is a line graph showing the extremely high readings that lie at least 3 standard deviations from the mean. There were a total of 6 extremely high readings, which are questionable as they deviate from the mean to a large extent. One possible reason for the anomalies is wind.

Key findings:

  • Sensor #2: one extremely high reading in April, August and December.
  • Sensors #3, #4 and #7: one extremely high reading in December.
  • Sensors #5 and #8: one extremely high reading in April.

Problem #2

Q2: Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.


Dataset involved:

  • Sensor Data.xlsx
  • Sensor Location.xlsx


NJJ-Q2-1.png

Figure 2.1 - Tableau Data Source

Drag both Sensor Data.xlsx and Sensor Location.xlsx into the Tableau Data Source. Using these two data sets, I can analyse the patterns of chemical releases based on the sensor location and which sensors are detecting certain chemicals.
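
Conceptually, combining the two files attaches each sensor's coordinates to its readings by monitor number. A minimal Python sketch of that join is given here; the field names, the coordinates, the inner-join behaviour and the function name attach_locations are assumptions for illustration and are not taken from the two workbooks.

```python
def attach_locations(readings, locations):
    """Inner-join readings to sensor coordinates on the monitor number."""
    joined = []
    for row in readings:
        xy = locations.get(row["monitor"])
        if xy is None:
            continue  # no known position for this monitor, so the row is dropped
        joined.append({**row, "x": xy[0], "y": xy[1]})
    return joined


# Invented sample data; the real coordinates are in Sensor Location.xlsx.
locations = {3: (10, 20), 4: (30, 40)}
readings = [
    {"monitor": 3, "chemical": "AGOC-3A", "reading": 1.2},
    {"monitor": 4, "chemical": "Methylosmolene", "reading": 0.8},
    {"monitor": 99, "chemical": "AGOC-3A", "reading": 0.1},  # unknown monitor
]

for row in attach_locations(readings, locations):
    print(row)
```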

Pie Chart Map

The pie chart map is prepared using the following fields:

  • Columns
    • X (Change Geographical Role to Longitude)
  • Rows
    • Y (Change Geographical Role to Latitude)
  • Color
    • Chemical
  • Size
    • Reading
  • Label
    • Monitor

Also, the background image has to be changed to MapLargeLabels.jpg via Map -> Background Images. The coordinates are set to Left: 0, Right: 200, Bottom: 0 and Top: 200.


NJJ-Q2-2.png

Figure 2.2 - Pie Chart Map

Figure 2.2 is a pie chart map showing, for each sensor, each chemical's share of the total readings. At a glance, sensors #3 and #4 detected the most chemical readings whereas sensors #1 and #2 detected the least. Most sensors detected the four different chemicals in similar proportions. Sensors #5, #6 and #9 have a high share of AGOC-3A. Further analysis is needed to determine the patterns of chemical releases.

Chemical Release Calendar

The calendar is prepared using the following fields:

  • Columns
    • Monitor
    • Weekday
  • Rows
    • Chemical
    • Month
    • Week
  • Filter
    • Chemical
  • Color
    • Reading


NJJ-Q2-3.png

Figure 2.3 - Chemical Release Calendar (AGOC-3A)

Figure 2.3 shows a chemical release calendar for AGOC-3A. Sensors #3, #4, #5 and #6 show higher AGOC-3A readings. Sensor #3 has the highest reading, 481.2, on the second Saturday in August. Sensors #8 and #9 each show one occurrence of a higher reading across the three months.
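
The calendar positions quoted in these findings (for example "second Saturday in August") can be derived from a date with a small helper. This is a hedged sketch: the function name nth_weekday_label is hypothetical, counting the n-th occurrence of a weekday within the month is my assumption about how the calendar is read, and the year 2016 is used only so that the example date resolves to a weekday, since this page does not state the year.

```python
from datetime import date

ORDINALS = ["first", "second", "third", "fourth", "fifth"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def nth_weekday_label(d):
    """Describe a date as the n-th occurrence of its weekday in its month."""
    nth = (d.day - 1) // 7  # 0 for days 1-7, 1 for days 8-14, and so on
    return f"{ORDINALS[nth]} {WEEKDAYS[d.weekday()]} in {MONTHS[d.month - 1]}"


# Assumed example date; the year is an assumption, not taken from the dataset.
print(nth_weekday_label(date(2016, 8, 13)))  # second Saturday in August
```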


NJJ-Q2-4.png

Figure 2.4 - Chemical Release Calendar (Appluimonia)

Figure 2.4 shows a chemical release calendar for Appluimonia. Sensor #3 has relatively high Appluimonia readings throughout the 3 months (April, August and December). Sensor #4 shows a gradual increase in readings from April to December, and it detected the highest amount of release over the whole month of December. Sensor #9 detected its highest release on the third Sunday in December. Across all the sensors, sensor #3 has the highest reading, 52.47, on the last Friday in December.


NJJ-Q2-5.png

Figure 2.5 - Chemical Release Calendar (Chlorodinine)

Figure 2.5 shows a chemical release calendar for Chlorodinine. Once again, sensor #3 has relatively high Chlorodinine readings throughout the 3 months (April, August and December). Sensor #4 again shows a gradual increase in readings from April to December, with its highest reading on the third Sunday in December. Sensor #6 has the highest reading among all the sensors, 81.98.

NJJ-Q2-6.png

Figure 2.6 - Chemical Release Calendar (Methylosmolene)

Figure 2.6 shows a chemical release calendar for Methylosmolene. The readings for this chemical follow a similar, average pattern across all the sensors. Sensor #6 has the highest reading, 294.6, on the first Friday in December.

In conclusion, sensors #3 and #4 each behave the same way for Appluimonia and Chlorodinine: sensor #3 has an average, steady pattern of detection, whereas sensor #4 shows increasing detection from April through August to December. The rest of the sensors have lower readings than sensors #3 and #4. For AGOC-3A and Methylosmolene, all sensors have an average pattern of detection; sensor #3 has the highest AGOC-3A reading, on the second Saturday in August, and sensor #6 has the highest Methylosmolene reading, on the first Friday in December.
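
The "highest reading per chemical" comparisons above amount to a grouped maximum. A minimal Python sketch follows; the function name peak_per_chemical and the row layout are assumptions, the four peak values are the ones quoted in this section, and the lower rows are invented filler so that the comparison has something to compare against.

```python
def peak_per_chemical(rows):
    """Map each chemical to the (monitor, reading) pair with its largest reading."""
    peaks = {}
    for chemical, monitor, reading in rows:
        if chemical not in peaks or reading > peaks[chemical][1]:
            peaks[chemical] = (monitor, reading)
    return peaks


# (chemical, monitor, reading). Peak values are quoted from the text above;
# every other row is invented filler for the sake of the example.
rows = [
    ("AGOC-3A", 3, 481.2),
    ("AGOC-3A", 5, 12.0),         # filler
    ("Appluimonia", 4, 3.1),      # filler
    ("Appluimonia", 3, 52.47),
    ("Chlorodinine", 6, 81.98),
    ("Chlorodinine", 4, 7.5),     # filler
    ("Methylosmolene", 1, 2.0),   # filler
    ("Methylosmolene", 6, 294.6),
]

for chemical, (monitor, reading) in sorted(peak_per_chemical(rows).items()):
    print(f"{chemical}: sensor #{monitor} with {reading}")
```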


Problem #3

Q3: Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.