Difference between revisions of "ISSS608 2016-17 T3 Assign TEN KAO YUAN MC2"

From Visual Analytics and Applications

Revision as of 10:22, 16 July 2017

VAST Challenge 2017


Mini-Challenge 2: No Smoke Without Fire

Introduction

Ornithology student Mitch Vogel was immediately suspicious of the noxious gases just pouring out of the smokestacks from the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies were contributing to the downfall of the poor Rose-crested Blue Pipit bird. But when he talked to company representatives and workers, they all seemed to be nice people and actually pretty respectful of the environment.

In fact, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, even though it raised their cost of production. Mitch discovered that the state government has been monitoring the gaseous effluents from the factories through a set of sensors, distributed around the factories, and set between the smokestacks, the city of Mistford and the nature preserve. The state has given Mitch access to their air sampler data, meteorological data, and locations map. Mitch is very good in Excel, but he knows that there are better tools for data discovery, and he knows that you are very clever at visual analytics and would be able to help perform an analysis.

Mini-Challenge 2 provides a three-month set of data for you to analyze, covering April, August, and December 2016.

The primary job for Mitch is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others. Your task, as supported by visual analytics that you apply, is to detangle the data to help Mitch determine where problems may be. Use visual analytics to analyze the available data and develop responses to the questions below. In addition, prepare a video that shows how you used visual analytics to solve this challenge. Novel visualizations and analysis approaches are especially interesting for this mini-challenge.


Data Preparation

This data consists of sensor readings from a set of air-sampling sensors and meteorological data from a weather station in proximity to the factories and sensors. The factory and sensor locations are provided as x,y coordinates on a 200x200 grid, with (0,0) at the lower left-hand corner (southwest). The sensors map shows the locations of the sensors and factories, by number for the sensors and by name for the factories. Some of the other features of the map (such as entrances and gates in that area) have been removed for readability. (Please note that the terms “sensor” and “monitor” are used interchangeably.)




The Meteorological data represents 3 months of readings in the following format:


Date: The date and time of the readings, local time with no change for Daylight Savings.
Wind Direction: The compass direction from which the wind originates, using a north-referenced azimuth bearing where 360/000 is true north.
Wind Speed: The speed of the wind in meters per second.

Each of these readings is taken at the date and time provided.




The Sensor data (provided in an Excel spreadsheet) contains 3 months of readings in the following format:


Chemical: Which one of the four chemicals detected by the sensors
Monitor: Which one of the nine sensors picking up the reading
Reading: The air sensor detected amount in parts per million
Date Time: The date and time of day of the reading, local time with no change for Daylight Savings.



We create a simple Excel file, Sensor Coordinates, containing the monitor number and the sensor coordinates, to bring into Tableau:


TKY Sensor Coordinates.JPG




The following are the factory locations:

Roadrunner Fitness Electronics: 89,27
Kasios Office Furniture: 90,21
Radiance ColourTek: 109,26
Indigo Sol Boards: 120,22

Accordingly, we create an Excel file, Factory, with the factory coordinates and the factory name. To pad the data in Tableau, we repeat the factory rows for each monitor number:

TKY Factory ID.JPG
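
The padding step amounts to a cross join between monitors and factories. A minimal sketch of that idea follows; the factory coordinates are the ones listed above, while the column names (`Monitor`, `Factory`, `Factory X`, `Factory Y`) are assumptions for illustration and may differ from those in the actual Excel file.

```python
from itertools import product

# Factory coordinates as listed above (x, y on the 200x200 grid).
FACTORIES = {
    "Roadrunner Fitness Electronics": (89, 27),
    "Kasios Office Furniture": (90, 21),
    "Radiance ColourTek": (109, 26),
    "Indigo Sol Boards": (120, 22),
}
MONITORS = range(1, 10)  # the nine sensors, numbered 1 to 9


def pad_factories(factories=FACTORIES, monitors=MONITORS):
    """Return one row per (monitor, factory) pair, i.e. a cross join."""
    return [
        # Column names below are hypothetical, chosen for readability.
        {"Monitor": m, "Factory": name, "Factory X": x, "Factory Y": y}
        for m, (name, (x, y)) in product(monitors, factories.items())
    ]


rows = pad_factories()
```

With nine monitors and four factories this yields 36 rows, so every sensor reading can be matched against every factory after the join.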


Likewise, we create an Excel file called Path ID with a Path ID (from 1 to 12) and an angle (from 0 to 30 degrees in increments of 3 degrees). When the Path ID reaches 12, the angle reverts to 0. This enables us to trace the wind direction as polygons in Tableau.

TKY Path ID.JPG
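
For readers who want to reproduce the Path ID table without typing it by hand, here is a small sketch that generates it from the description above. The function name and column names are assumptions; only the values (IDs 1 to 12, angles 0 to 30 in steps of 3, reverting to 0 at ID 12) come from the text.

```python
def path_table(n_paths=12, step_deg=3):
    """Build the Path ID / angle table described in the text."""
    rows = []
    for path_id in range(1, n_paths + 1):
        # IDs 1..11 sweep 0..30 degrees; the last ID reverts to 0
        # so that the traced polygon closes.
        angle = 0 if path_id == n_paths else (path_id - 1) * step_deg
        rows.append({"Path ID": path_id, "Angle": angle})
    return rows


table = path_table()
```

How the angles are turned into polygon vertices inside Tableau is not shown in the source, so this sketch stops at the table itself.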


Finally, the data is joined in Tableau as follows:

TKY Join.JPG

Q1 - Sensors

Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?

TKY - Missing values, double values.png

When plotting the number of readings by sensor, chemical, day of the month and month, a few clear patterns appear. If all the sensors were operating normally, there would be exactly one reading every hour. However, this is not the case. Blue represents one reading and red represents two readings; the remaining color marks hours with no reading.
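
The same check can be expressed outside Tableau. The sketch below counts readings per monitor, chemical and hour, and fills in zeros for hours that have no reading at all. The tuple layout and the sample rows are invented for illustration; they are not taken from the challenge data.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_per_hour(readings, start, end):
    """readings: iterable of (monitor, chemical, timestamp) tuples.

    Returns {(monitor, chemical, hour): count} for every hour in
    [start, end), including hours with a count of zero.
    """
    counts = Counter(
        (m, c, ts.replace(minute=0, second=0, microsecond=0))
        for m, c, ts in readings
    )
    pairs = {(m, c) for m, c, _ in counts}
    result = {}
    hour = start
    while hour < end:
        for m, c in pairs:
            result[(m, c, hour)] = counts.get((m, c, hour), 0)
        hour += timedelta(hours=1)
    return result


# Invented sample: a doubled AGOC-3A reading and a missing Methylosmolene one.
sample = [
    (1, "AGOC-3A", datetime(2016, 4, 1, 0)),
    (1, "AGOC-3A", datetime(2016, 4, 1, 1)),
    (1, "AGOC-3A", datetime(2016, 4, 1, 1)),
    (1, "Methylosmolene", datetime(2016, 4, 1, 0)),
]
counts = count_per_hour(sample, datetime(2016, 4, 1, 0), datetime(2016, 4, 1, 2))
```

A count of 0 corresponds to a missing reading, 1 to normal operation and 2 to a double reading.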

There is a mass blackout of monitors for all chemicals at midnight between the 1st and 2nd day of every month (except monitor 3 in December). This is possibly due to a systemic issue or a scheduled event, such as a sensor reset, a power blackout or a data center reset, causing all the sensors to go offline at midnight on the 2nd of each month. Similar outages occur at midnight on other dates in specific months:

  • 5th to 6th April (all monitors)
  • 3rd to 4th and 6th to 7th August (all monitors)
  • 6th to 7th December, except that monitors 6, 7 and 8 keep reporting AGOC-3A, monitor 7 keeps reporting Appluimonia and monitor 8 keeps reporting Methylosmolene

Apart from these outages, missing values occur for Methylosmolene only. This could be a cause for concern, as the chemical is toxic and stringently regulated. Double readings occur for AGOC-3A only. It is possible that this is due to the nature of the chemical itself and not the monitor. This is less of a concern, as AGOC-3A has a foul odour but is not harmful.


TKY - Missing Values Link.png

When put side by side, double readings for AGOC-3A occur at the same times as missing readings for Methylosmolene. This cannot be a coincidence. The chemicals might be affecting each other’s sensor readings: the levels of Methylosmolene might be so high that the sensor cannot give a reading, which in turn affects the channel that reads the AGOC-3A value. If true, this is very dangerous, as Methylosmolene is highly toxic.
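
One way to quantify this overlap, rather than judging it by eye, is to count how many monitor-hours show both anomalies. The sketch below assumes per-hour counts have already been computed as a dictionary keyed by (monitor, chemical, hour); that layout and the sample values are illustrative assumptions.

```python
def cooccurrence(counts, missing_chem="Methylosmolene", double_chem="AGOC-3A"):
    """counts: {(monitor, chemical, hour): number_of_readings}.

    Returns how many monitor-hours have a missing reading for one chemical,
    a double reading for the other, and both at once.
    """
    missing = {(m, h) for (m, c, h), n in counts.items()
               if c == missing_chem and n == 0}
    double = {(m, h) for (m, c, h), n in counts.items()
              if c == double_chem and n >= 2}
    return {
        "missing": len(missing),
        "double": len(double),
        "both": len(missing & double),
    }


# Invented counts for two monitors over two hours.
sample_counts = {
    (1, "Methylosmolene", "2016-04-01 00:00"): 1,
    (1, "Methylosmolene", "2016-04-01 01:00"): 0,
    (1, "AGOC-3A", "2016-04-01 00:00"): 1,
    (1, "AGOC-3A", "2016-04-01 01:00"): 2,
    (2, "Methylosmolene", "2016-04-01 01:00"): 1,
    (2, "AGOC-3A", "2016-04-01 01:00"): 1,
}
summary = cooccurrence(sample_counts)
```

If the "both" count equals the "missing" and "double" counts on the real data, every anomaly of one kind is matched by the other, which is what the side-by-side plot suggests.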

TKY - Appluimonia.png

When plotting the different sensor values for Appluimonia, it is clear that there are differences between the sensors. Sensors 3 and 7 give noticeably higher readings than the other sensors. This could indicate that they are not calibrated properly and are reporting excessively high values. The fact that they are consistently high seems to rule out a meteorological cause. A boxplot would be suitable to illustrate this. As such, we should be cautious when relying on the readings of these sensors.

When examining the minimum reading for each of the sensors, we notice that the minimum for sensor 4 drifts upwards each month. This seems to indicate that sensor 4 has an offset issue that is steadily getting worse. We should not rely too heavily on sensor 4's readings in August and December.

TKY - Sensor 4 is broken.png
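
The drift check can be sketched as a monthly minimum per sensor followed by a test for a strictly increasing baseline. The input layout and the numbers in the sample are made up for illustration and do not reproduce the actual readings.

```python
from collections import defaultdict


def monthly_minimum(readings):
    """readings: iterable of (monitor, month, value). Returns {monitor: {month: min}}."""
    mins = defaultdict(dict)
    for monitor, month, value in readings:
        current = mins[monitor].get(month)
        mins[monitor][month] = value if current is None else min(current, value)
    return {m: dict(v) for m, v in mins.items()}


def is_drifting_up(month_mins, order=(4, 8, 12)):
    """True if the monthly minimum rises strictly from April to August to December."""
    values = [month_mins[m] for m in order if m in month_mins]
    return len(values) > 1 and all(a < b for a, b in zip(values, values[1:]))


# Invented values: monitor 4 has a rising floor, monitor 1 does not.
sample = [
    (4, 4, 0.10), (4, 4, 0.30), (4, 8, 0.25), (4, 12, 0.40),
    (1, 4, 0.10), (1, 8, 0.12), (1, 12, 0.09),
]
mins = monthly_minimum(sample)
drifting = {m: is_drifting_up(v) for m, v in mins.items()}
```

Using the minimum rather than the mean isolates the sensor's baseline offset from genuine emission peaks.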

Q2 - Chemicals

Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?

We only have the readings and do not know the danger thresholds. In general, AGOC-3A and Methylosmolene readings (range [0-100]) are higher than Appluimonia and Chlorodinine readings (range [0-10]).

TKY Heat Map Day AGOC-3A.png
TKY Heat Map Day Appluimonia.png
TKY Heat Map Day Chlorodinine.png
TKY Heat Map Day Methylosmolene.png
  • AGOC-3A and Methylosmolene have lower values during the last week (or 2 weeks) of each month. This could be linked to Christmas holidays in December, summer holidays in August and spring break in April.
  • AGOC-3A occurs more frequently on Fridays and during week 33
TKY Heat Map MvD1.png
TKY Heat Map MvD2.png
  • Methylosmolene peaks on the 2nd of every month and again 7-8 days later
  • Appluimonia is high during 2-4 August
  • When plotting the Month vs the Day of the Month it is even clearer that AGOC-3A and Methylosmolene are not emitted towards the end of the month.
TKY - Time of Day.png
  • Methylosmolene peaks between 10pm-5am
  • AGOC-3A peaks between 6am and 9pm
  • Appluimonia and Chlorodinine are more stable during the day
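
The time-of-day pattern above can be checked numerically by averaging readings per chemical and hour of day. The following is a rough sketch with invented sample values; the input layout (chemical, hour of day, reading) is an assumption.

```python
from collections import defaultdict


def mean_by_hour(readings):
    """readings: iterable of (chemical, hour_of_day, value). Returns {(chemical, hour): mean}."""
    totals = defaultdict(lambda: [0.0, 0])
    for chemical, hour, value in readings:
        entry = totals[(chemical, hour)]
        entry[0] += value
        entry[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


def peak_hour(means, chemical):
    """Hour of day with the highest mean reading for the given chemical."""
    by_hour = {h: v for (c, h), v in means.items() if c == chemical}
    return max(by_hour, key=by_hour.get)


# Invented readings that mimic a night-time and a day-time chemical.
sample = [
    ("Methylosmolene", 23, 8.0), ("Methylosmolene", 23, 6.0),
    ("Methylosmolene", 12, 1.0),
    ("AGOC-3A", 12, 5.0), ("AGOC-3A", 23, 0.5),
]
means = mean_by_hour(sample)
```

Calling `peak_hour(means, "Methylosmolene")` on the real data would be one way to confirm the night-time window reported in the bullets.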

There seems to be a link between Methylosmolene and AGOC-3A: both are organic compounds, and their daily patterns complement each other, with one peaking at night and the other during the day. It is possible that one chemical is converting into the other, or that one factory is swapping a safe compound for a toxic one at night to avoid detection. Given the link we saw earlier between missing Methylosmolene readings and double AGOC-3A readings, this does not seem to be a coincidence.

Q3 - Factories

Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.

The chosen method is to focus on the times when there is a peak in the detection levels and then use the wind direction to determine which factory could have emitted the chemicals. We need to take into account the fact that the wind data is only available every 3 hours and that an emission takes time to travel to the sensor. We also have to be careful with the wind direction, as it is given as the bearing the wind comes from. Since we are studying which sensor is downwind, we need to invert the wind direction, that is, add 180 degrees.
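
The inversion and the downwind test can be sketched as follows. The factory coordinates are the ones listed in the data preparation section; the sensor position used in the example, the 15-degree tolerance and all function names are assumptions for illustration, not the settings used in the Tableau workbook.

```python
import math

# Factory coordinates as listed in the data preparation section.
FACTORIES = {
    "Roadrunner Fitness Electronics": (89, 27),
    "Kasios Office Furniture": (90, 21),
    "Radiance ColourTek": (109, 26),
    "Indigo Sol Boards": (120, 22),
}


def downwind_bearing(wind_from_deg):
    """Invert a 'wind from' bearing into the bearing the wind blows towards."""
    return (wind_from_deg + 180.0) % 360.0


def bearing(src, dst):
    """Compass bearing in degrees from src to dst (0 = north/+y, 90 = east/+x)."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def angular_diff(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def candidate_factories(sensor_xy, wind_from_deg, tolerance_deg=15.0):
    """Factories whose bearing towards the sensor matches the downwind bearing."""
    blowing_to = downwind_bearing(wind_from_deg)
    return [
        name for name, xy in FACTORIES.items()
        if angular_diff(bearing(xy, sensor_xy), blowing_to) <= tolerance_deg
    ]


# Hypothetical sensor position, due north of Kasios, with wind from the south.
suspects = candidate_factories((90, 41), wind_from_deg=180.0)
```

Because wind readings are only available every 3 hours, a wider tolerance may be more realistic in practice; the value chosen here is only a starting point.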


Methylosmolene

We focus on this chemical because it is very toxic. Kasios is likely to be responsible. There is a lower probability that it is Roadrunner.

Relying on Peak Readings

TKY Q3 9 Apr.png
TKY Q3 9 Apr Sensor.png

We observe a peak for Methylosmolene on sensor 6 at 1 AM on 4th April. At that time of day the wind is blowing from the direction of Kasios.

Relying on Missing Values

TKY - GIF double.gif

If we assume that the missing Methylosmolene values indicate an anomaly in the Methylosmolene emissions (either a massive amount of Methylosmolene outside the range of the sensors or an incomplete conversion of Methylosmolene into AGOC-3A), then we should also study these incidents closely. As seen in question 1, missing values of Methylosmolene coincide with double values of AGOC-3A. As such, we can rely on the count of AGOC-3A readings instead. Our visualization shows the direction the wind is blowing towards and the count of AGOC-3A readings at each station. In this animation, we see that the emissions could have come from either Kasios or Roadrunner. The same investigation is carried out for the other days on which there are double AGOC-3A readings and missing Methylosmolene values.


Chlorodinine

TKY - Chlorodinine.gif

Roadrunner is likely to be responsible. There is a lower probability that it is Kasios. The animation shows the wind direction and Chlorodinine levels for December 5, 18 and 23, the days with the highest Chlorodinine levels in December.


Appluimonia

TKY - Animated Appluimonia.gif

The culprit is likely to be Indigo, with a lower possibility of it being Radiance. This is just a smelly chemical and probably not regulated. The animation shows readings and wind directions for 7th and 20th April, 13th August, and 5th, 7th and 18th December.

We cannot trust the readings from sensor 4 in August and December, as its minimum offset is increasing. We can see this by plotting the minimum sensor reading for each day: the minimum readings for sensor 4 increase each month, which indicates a problem with the sensor itself. We also observe that sensors 3, 7 and 8 give high readings for Appluimonia even when the wind is blowing in the opposite direction. Appluimonia might therefore be coming from other areas, such as the pathways or the park itself, as it could be emitted by rotting vegetation and organic matter.

TKY - Appluomonia minimum.png

AGOC-3A

We have to be cautious with AGOC-3A because, under our working hypothesis, its readings are affected by Methylosmolene. In this case we need to do two studies: the first taking into account the peaks due to interference from Methylosmolene, and the second excluding those peaks. Without filtering for anomalous behavior of the sensors, we obtain the following animation.

TKY - Animated AGOC-3A.gif

References and Feedback

References

Feedback

Please leave your feedback here

Hi David,

Nice work! It's pretty cool that you have done all three MC cases! Great efforts!

Here are some of my concerns:

  • For Q2, to see the pattern of chemical release: you look at it by day across all three months. It may be more rational to compare the release pattern in the three months separately instead of aggregating all three months together.
  • For Q3, wind plot: would it be more reasonable to plot the wind area starting at each sensor? Since the wind direction and speed are what reaches the sensor, we can invert the wind area at the sensor to see which factory falls inside it.

Best regards,

Xiaoqing

Hi David,

I’m very impressed with your work! Clarity-wise there is no issue. I’m particularly impressed at how you managed to squeeze the hours into the y axis. The only (minor) comment I have is on the aesthetics. I find the dark blue and the maroon colours a little hard on the eyes, particularly as they appear to be of the same value despite being of different hues. Maybe if you change the maroon to a brighter shade of red it might be better? What I’m most impressed with is how you managed to flip the colours of the map around. The use of the overlapping wind plots is also very clear, although aesthetics-wise, choosing 4 distinct colours, rather than 2 blue and 2 yellow/orange, may make the distinction between the sources clearer.

Thanks,

Vincent Mack