Difference between revisions of "ISSS608 2016-17 T3 Assign MACK ZHI WEI VINCENT"

From Visual Analytics and Applications

Revision as of 19:04, 14 July 2017

Assignment - To be a Visual Detective:

Question Attempted

Mini-Challenge 2

Objectives

Ornithology student Mitch Vogel was immediately suspicious of the noxious gases just pouring out of the smokestacks from the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies are contributing to the downfall of the poor Rose-crested Blue Pipit bird. But when he talked to company representatives and workers, they all seem to be nice people and actually pretty respectful of the environment.

In fact, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, even though it raised their cost of production. Mitch discovered that the state government has been monitoring the gaseous effluents from the factories through a set of sensors, distributed around the factories, and set between the smokestacks, the city of Mistford and the nature preserve. The state has given Mitch access to their air sampler data, meteorological data, and locations map. Mitch is very good in Excel, but he knows that there are better tools for data discovery, and he knows that you are very clever at visual analytics and would be able to help perform an analysis.

Mini-Challenge 2 provides a three month set of data for you to analyze, covering April, August, and December 2016.

The primary job for Mitch is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others. Your task, as supported by visual analytics that you apply, is to detangle the data to help Mitch determine where problems may be. Use visual analytics to analyze the available data and develop responses to the questions below. In addition, prepare a video that shows how you used visual analytics to solve this challenge. Novel visualizations and analysis approaches are especially interesting for this mini-challenge. Please do not use any other data in your work (including other Internet-based sources or other mini-challenge data).

You may use tools you developed in other VAST Challenges in your efforts – please let us know when you do so!

Mini Challenge 2

Question 1

Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.

Overall trend analysis from Slope Graph

Slope graph.png
  • The slope graph above shows us that the sensors can be categorised into three groups. The first group (sensors 2 and 3) has readings that start low in April, peak in August, and drop back low in December.
  • The second group (sensors 6, 7 and 8) has readings that start high in April, dip in August, and go back high in December.
  • The final group (sensors 1, 4, 5 and 9) has readings that are lowest in April, grow higher in August and are highest in December. Readings for sensor 4 have the highest growth rate, followed by 5 and 9, with 1 at the slowest growth rate.

Are Sensors working all the time?

There are some occasions where the data is not available.

Screenshot of Missing Chemical Reading Data

Regular disappearances. Maintenance?

In the dataset, it is observed that the sensors have no data at 12am on the following dates:

  • 2 April 2016
  • 4 April 2016
  • 2 August 2016
  • 4 August 2016
  • 7 August 2016
  • 2 December 2016
  • 7 December 2016

For these periods, chemical reading data for all chemical types were mostly missing, with the following exceptions:

  • 2 August 2016 on Monitor 3, where only data for Appluimonia and Chlorodinine were absent;
  • 7 December 2016 on Monitors 6 and 8, where all data except AGOC-3A were absent, and on Monitor 7, where only Chlorodinine and Methylosmolene were absent.

The regularity of the occurrences suggests that the sensors may have been systematically shut down for maintenance, although the two exceptions hint that there might be something else going on.
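A check of this kind can be sketched in a few lines of Python. This is only an illustration, not the method used for the screenshots above: the function name `missing_slots` and the `(timestamp, monitor, chemical, value)` tuple layout are assumptions made for the example.

```python
from datetime import datetime, timedelta

def missing_slots(readings, start, end, monitors, chemicals):
    """Return every (timestamp, monitor, chemical) slot that has no reading.

    readings: iterable of (timestamp, monitor, chemical, value) tuples
              (an assumed layout, not the official file format).
    start, end: inclusive bounds of the hourly grid to check.
    """
    seen = {(ts, m, c) for ts, m, c, _ in readings}
    missing = []
    ts = start
    while ts <= end:
        for m in monitors:
            for c in chemicals:
                if (ts, m, c) not in seen:
                    missing.append((ts, m, c))
        ts += timedelta(hours=1)  # chemical readings are hourly
    return missing

# Toy data: Monitor 3 has no Methylosmolene reading at midnight.
toy = [
    (datetime(2016, 4, 2, 0), 3, "AGOC-3A", 0.5),
    (datetime(2016, 4, 2, 1), 3, "AGOC-3A", 0.4),
    (datetime(2016, 4, 2, 1), 3, "Methylosmolene", 0.1),
]
gaps = missing_slots(toy, datetime(2016, 4, 2, 0), datetime(2016, 4, 2, 1),
                     [3], ["AGOC-3A", "Methylosmolene"])
```

Grouping the returned slots by timestamp would separate hours where every monitor is silent (a possible shutdown) from hours where only some chemicals are absent.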

Irregular performance of sensors detecting Methylosmolene

Other than the above-mentioned missing data, data on Methylosmolene were absent on many different dates and hours as can be seen in the visualisation.

Missing wind data

A similar examination of the wind data shows that, like the chemical readings, much of the missing wind data falls on the same dates. Unlike the chemical readings, wind data are only recorded every 3 hours. Also, wind data collection appears to have been non-operational in the first few days of August, as well as in the 3rd hour of the second-last day (30th August).

Studying the variation with a calendar plot and horizon chart

Calendar plot analysis

Calendar Plot.png
  • The Calendar plot above provides an overview to the sensor dataset.
  • It tells us that the readings detected by sensors 3 and 4 are generally much higher than those of the other sensors.
  • For sensor 4, this was especially so for the August and December readings.
  • Sensors 5 and 6, while their readings are not as high as those of 3 and 4, show much higher variability in the readings.
  • This calendar plot allows us to detect outlier readings judging by the darker patches on the plots of sensors 5, 6, 7, 8 and 9.
  • However, the calendar plot is limited. We will need to rely on other visualisation methods to get a better sense of the data.

Horizon Chart Analysis

Horizon Chart.png
  • The horizon chart of the sensors provides a clearer indication of the performance of the sensors. With the global average reading as the baseline, the horizon chart measures the difference between the average reading per monitor and the global average. Readings that fall below the baseline are coloured green, while those above the baseline are coloured yellow. Higher readings (i.e. those more than 1) correspond with darker hues of red.
  • Sensors 1, 2, 8 and 9 are mostly in the green zone, with a few fluctuations above the baseline.
  • With the exception of April’s readings, sensor 7 displays similar behaviour as well.
  • From both the horizon chart and the calendar plot, we see that the readings are especially high for sensors 3 and 4. These spikes could be further investigated later to see if they correspond with wind direction.
  • For sensor 4, readings start low in April but get progressively higher each period, with the highest in December.
  • Sensor 6 displays the most irregular performance, as fluctuations in the readings appear below and above the baseline. This may be the result of Sensor 6 being located in the middle of the four factories, which bears further examination later on.
  • In order to get a better sense of any cyclical patterns, we use a cycle plot with the lowest and highest values in each cell labelled. Reference lines and bands were added showing the median, and upper and lower quartiles.
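The baseline comparison behind the horizon chart can be sketched as follows. This is a minimal illustration under stated assumptions: readings are taken to be `(day, monitor, value)` tuples, the averaging is done per monitor per day, and the function name `deviation_from_global` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def deviation_from_global(readings):
    """Map (monitor, day) to its mean reading minus the global mean.

    readings: list of (day, monitor, value) tuples (assumed layout).
    Negative results sit below the baseline, positive ones above it.
    """
    global_mean = mean(value for _, _, value in readings)
    groups = defaultdict(list)
    for day, monitor, value in readings:
        groups[(monitor, day)].append(value)
    return {key: mean(values) - global_mean for key, values in groups.items()}

# Toy data: monitor 1 sits below the global average, monitor 2 above it.
toy = [("d1", 1, 1.0), ("d1", 1, 3.0), ("d1", 2, 5.0), ("d1", 2, 7.0)]
bands = deviation_from_global(toy)
```

The sign of each value decides which side of the baseline a band falls on, and its magnitude decides the depth of the hue.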

Cycle Plot: Studying seasonality

Cycle Plot.png

Sensors 2 and 3 show similar seasonality patterns.

  • The highest readings in April were on Friday, where readings rose above their respective upper quartiles.
  • The curves for the month of August resemble sine curves: readings are generally low on Sunday, rise to a first peak on Monday, dip on Thursday or Friday, and rise to another high on Saturday.
  • The shapes of their curves were also similar in December, where readings rose from Sunday to Monday, dipped to a low on Tuesday, and slowly tapered off over the rest of the week.
  • These findings suggest that the same factors may be responsible for the readings recorded by Sensors 2 and 3.
  • The readings of sensor 1 are also roughly similar to those of sensors 2 and 3, with some minor differences: in April, readings peaked on Saturday instead of Friday; in August, on Tuesday instead of Monday; and in December, there was an extra rise in readings on Wednesday. These minor differences suggest that other factors may be responsible for the slight deviations in the readings.
  • Other notable groups with similar reading patterns are the sensor pairs 5-and-9 and 7-and-8, as can be seen from the cycle plot above. They showed noticeable increases and decreases on the same weekday each month, suggesting that common factors are responsible for these patterns. This may also be due to the close proximity of these sensors to each other.
  • Outlier sensor data detected are those of sensors 4 and 6, although sensor 6’s readings seem to mirror both sensor 5’s and sensor 4’s readings, in the sense that when readings in sensors 4 and 5 rise, sensor 6’s drop to different extents. Given their respective locations, this may be due to the direction of the wind blowing from certain factories. This hypothesis is supported by the Coxcomb chart below, where the wind direction is plotted with the magnitude of the readings, i.e. the highest readings are indicated by the largest segments in the respective Coxcomb chart. In the case of sensor 6, the readings tend to be highest when the wind is blowing towards the south.
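The weekday-by-month averages that a cycle plot displays can be sketched as below. This is an illustrative aggregation only; the `(timestamp, monitor, value)` tuple layout and the name `cycle_table` are assumptions, and the median and quartile reference lines are left out.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def cycle_table(readings):
    """Map (monitor, month, weekday) to the mean reading in that cell.

    readings: iterable of (timestamp, monitor, value) tuples (assumed layout).
    """
    cells = defaultdict(list)
    for ts, monitor, value in readings:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        cells[(monitor, ts.month, WEEKDAYS[ts.weekday()])].append(value)
    return {key: mean(values) for key, values in cells.items()}

# Toy data: two Fridays and one Saturday in April 2016 for sensor 2.
toy = [
    (datetime(2016, 4, 1, 0), 2, 2.0),
    (datetime(2016, 4, 8, 0), 2, 4.0),
    (datetime(2016, 4, 2, 0), 2, 1.0),
]
table = cycle_table(toy)
```

Comparing the cells of two sensors weekday by weekday is one way to check whether they rise and fall together, as described for the pairs above.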

Strangely enough, the two sensors with the highest readings, i.e. 3 and 4, do not show many similarities with each other in the cycle plot. However, the coxcomb chart below shows that the similarities in the readings may be the result of similar wind directions.

Coxcomb sensors.png
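The sector values behind a coxcomb chart of this kind can be sketched as follows. The sketch assumes wind direction is given in degrees clockwise from north and names the direction the wind blows from; the eight-sector split, the `(monitor, wind_from_deg, value)` layout and the function names are choices made for the example.

```python
from collections import defaultdict
from statistics import mean

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def sector(direction_deg):
    """Return the 45-degree compass sector containing a direction in degrees."""
    # Shift by half a sector so that "N" covers 337.5 to 22.5 degrees.
    return SECTORS[int(((direction_deg % 360) + 22.5) // 45) % 8]

def mean_by_sector(samples):
    """Map (monitor, sector) to the mean reading under winds from that sector.

    samples: iterable of (monitor, wind_from_deg, value) tuples (assumed layout).
    """
    groups = defaultdict(list)
    for monitor, deg, value in samples:
        groups[(monitor, sector(deg))].append(value)
    return {key: mean(values) for key, values in groups.items()}

# Toy data for sensor 6: two northerly samples and one southerly sample.
toy = [(6, 350.0, 2.0), (6, 10.0, 4.0), (6, 180.0, 1.0)]
petals = mean_by_sector(toy)
```

Each resulting value would set the size of one segment for one sensor.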

Question 2.

Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.

Slope graph comparison

Slopegraphcomparison.png

We turn to the slope graph to get a sense of how the composition of the chemicals contributes to the overall trend observed earlier. Overall trends observed:

  • Generally, AGOC-3A seems to be responsible for the largest fluctuations in the readings, with clear examples as observed by the spike in August for sensors 3 and 5.
  • For sensor 6, the drop in Methylosmolene in August is responsible for the V shape observed in the slope graph.
  • It is interesting to note that for sensors 3, 4, 7, 8, and 9, the shapes of the slopes, while different in magnitude, are generally the same for all chemical types, with 8 being the exception for the tiny spike in Appluimonia in August.
  • These findings suggest that AGOC-3A and Methylosmolene are possible main factors responsible for variation in the readings.

Horizon and Coxcomb chart analysis of chemical types

  • A separate horizon chart, broken down by chemical type, was constructed to explore this further, along with a coxcomb chart that maps the magnitude of readings to the size of the segments corresponding with the wind direction.
  • For AGOC-3A, we can see that there is co-occurrence in the readings for sensors 1 and 2 (and for sensor 4 in parts of April and August as well) based on the shapes of the horizon chart. A quick check of the corresponding coxcomb chart shows that the wind directions remain constant for the most part.
  • The same can be observed for sensors 7 and 8, although the wind directions corresponding with their largest readings are dissimilar.
  • This suggests that the similarities in these two pairs of sensors may be a function of their close proximity to each other, and that the source of much of the readings they register may be ambient rather than blown from some other location.
  • Despite being the sensors with the highest average readings, sensors 3 and 4 for the most part have very different shapes on the horizon chart, suggesting that the drivers for the readings detected by these two sensors may be different. This is further reinforced by the Coxcomb chart, which shows that the wind directions responsible for those readings are different.
  • As previously observed, readings and wind direction data for sensor 6 are pretty much standalone. Given its location, it may be the best candidate for answering Question 3.
AGOC-3a horizon cox.png
  • For Appluimonia, what stands out to me isn’t so much the shape of the horizon chart but rather that most of the readings seem to be coming from the same wind directions, i.e. from the northwest to southeast, or from the east to west.
  • Similar observations can be made from the Chlorodinine chart.
Applu horizon cox.png
Chloro horizon cox.png
  • Results from the horizon and coxcomb charts for Methylosmolene resonate with a lot that has already been mentioned. Of note is that for sensors 3, 4 and 6, the largest spikes in readings come from a northerly or north-westerly source. Given that the park roads and gate 7 lie in that direction, there’s a high chance that this chemical may actually be coming from vehicles travelling along the park trails rather than from the factories. The same can be said for sensor 9, whose large spike in readings in April corresponds with a wind blowing from the east, where only park trails are seen on the map.
Meth horizon cox.png

Question 3.

Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.

Impact of wind factors on sensor readings

Impact of wind speed on readings

Windimpact.png
  • From the charts above, we can see that there is a correlation between slower wind speeds and higher readings. This makes logical sense as higher wind speeds probably blow away chemical particles that would otherwise be detected by the sensors.
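A rough way to quantify this relationship is sketched below. It is not the method used to produce the charts above. Because wind is recorded only every 3 hours, each hourly reading is paired with the most recent wind record less than 3 hours old. The tuple layouts, the function names and the use of Pearson's correlation coefficient are assumptions for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from math import sqrt

def pair_with_wind(readings, wind, max_age=timedelta(hours=3)):
    """Pair each reading with the speed of the latest wind record before it.

    readings: list of (timestamp, value) tuples (assumed layout).
    wind: list of (timestamp, direction_deg, speed), sorted by timestamp.
    Readings with no wind record in the preceding max_age are skipped.
    """
    times = [w[0] for w in wind]
    pairs = []
    for ts, value in readings:
        i = bisect_right(times, ts) - 1  # index of latest record at or before ts
        if i >= 0 and ts - times[i] < max_age:
            pairs.append((wind[i][2], value))
    return pairs

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data: readings fall as wind speed rises; the 12:00 reading has no
# sufficiently recent wind record and is dropped.
day = datetime(2016, 4, 1)
wind = [(day + timedelta(hours=h), 90.0, s) for h, s in [(0, 1.0), (3, 2.0), (6, 3.0)]]
readings = [(day + timedelta(hours=h), v) for h, v in [(0, 3.0), (4, 2.0), (8, 1.0), (12, 9.0)]]
pairs = pair_with_wind(readings, wind)
r = pearson(pairs)
```

A coefficient near -1 on real data would support the inverse relationship described above, although it would not by itself show that wind speed is the cause.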

Impact of wind direction on readings

  • We can see that for sensors 1, 2, 3, 4, 7, and 8 (i.e. the sensors on the left half of the map surrounding the factories), the wind directions that correspond with the highest readings are similar, mostly coming from the northwest or southeast. The correlation between the higher readings and the northwest and westerly winds suggests that the factories on the east side of these sensors may be responsible for these readings.
  • Sensors 5 and 9 have similar looking coxcomb charts, suggesting that the wind directions most responsible for their high readings blow to the southwest, west, and the southeast.
  • Most of sensor 6’s readings come with winds from the north, blowing towards the south-south-east.

Unexpected findings

  • However, it is strange that winds blowing towards the southeast have resulted in high readings for sensors 1, 2, 3, 4, 7 and 8, as there are no factories to their north-western side. Given the direction of sensor 6’s data, this mysterious northern source of chemicals also seems to be responsible for sensor 6’s readings.
  • The same is observed for sensors 5 and 9: their highest readings come from the side opposite the factories.

Intro

After separating the dataset by matching the wind direction data to the direction from the factories, I made an interesting discovery:

  • Of the factories, Indigo Sol Boards seems to be the biggest culprit in chemical emissions, judging by the readings from sensors 3 and 6.
  • Next in line is Radiance ColourTek, followed by Roadrunner Fitness Electronics and Kasios Furniture.
VMReadings from factories.jpeg
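The matching of wind direction to the direction of each factory can be sketched as below. The sketch rests on several assumptions: sensor and factory positions are available as flat (x, y) map coordinates, wind direction is the direction the wind blows from (degrees clockwise from north), and a reading is attributed to a factory when the factory's bearing from the sensor is within a chosen tolerance of that direction. The 20-degree tolerance, the function names and the coordinates are all hypothetical.

```python
from math import atan2, degrees

def bearing(from_xy, to_xy):
    """Compass bearing in degrees (0 = north, clockwise) from one point to another."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return degrees(atan2(dx, dy)) % 360

def angular_gap(a, b):
    """Smallest angle in degrees between two compass directions."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def upwind_factories(sensor_xy, factories, wind_from_deg, tolerance=20.0):
    """Names of factories lying upwind of the sensor for a given wind.

    factories: dict of name -> (x, y); the coordinates used here are made up.
    """
    return [name for name, xy in factories.items()
            if angular_gap(bearing(sensor_xy, xy), wind_from_deg) <= tolerance]

# Toy layout: factory_a lies due north of the sensor, factory_b due east.
factories = {"factory_a": (0, 10), "factory_b": (10, 0)}
hits = upwind_factories((0, 0), factories, wind_from_deg=350.0)
```

Readings taken while no factory is upwind would fall into the unattributed group. A wider tolerance attributes more readings to factories at the cost of more ambiguity between neighbouring ones.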

The largest reading that can be traced to Indigo Sol Boards is Methylosmolene in December. Given that Indigo Sol Boards produces snowboards and that, if this area is in the Northern hemisphere, December falls in winter, Methylosmolene may be a chemical that is emitted in the production of these snowboards and other equipment.

VMSlope graph Q3.jpeg

Also, I discovered that most of the readings that were picked up cannot be attributed to the factories, and instead seem to come from the environment, i.e. the park, in more or less equal amounts across the chemicals.

VMReadings from other places.jpeg

When I layered the coxcomb chart onto the map, the culprit became clear. The chemical readings that cannot be attributed to the factories might have come from the campsites and the roads on which vehicles travel.

VMWind blows.jpeg

Discussion

Please feel free to leave comments here.

Hi Vincent,

Nice work and beautiful wind plot!

I have one concern:

When comparing the chemical readings of each monitor, would it be more reasonable to use non-aggregated reading data? Using individual readings would also allow a look at more specific details, such as the pattern by hour.

Great efforts overall!

Best regards,

Xiaoqing

Reference

Below are some works I personally admire and am learning from.