ISSS608 2016-17 T3 Assign MACK ZHI WEI VINCENT

From Visual Analytics and Applications
Assignment - To be a Visual Detective:

Question Attempted

Mini-Challenge 2

Objectives

Ornithology student Mitch Vogel was immediately suspicious of the noxious gases just pouring out of the smokestacks from the four manufacturing factories south of the nature preserve. He was almost certain that all of these companies are contributing to the downfall of the poor Rose-crested Blue Pipit bird. But when he talked to company representatives and workers, they all seem to be nice people and actually pretty respectful of the environment.

In fact, Mitch was surprised to learn that the factories had recently taken steps to make their processes more environmentally friendly, even though it raised their cost of production. Mitch discovered that the state government has been monitoring the gaseous effluents from the factories through a set of sensors, distributed around the factories, and set between the smokestacks, the city of Mistford and the nature preserve. The state has given Mitch access to their air sampler data, meteorological data, and locations map. Mitch is very good in Excel, but he knows that there are better tools for data discovery, and he knows that you are very clever at visual analytics and would be able to help perform an analysis.

Mini-Challenge 2 provides a three month set of data for you to analyze, covering April, August, and December 2016.

The primary job for Mitch is to determine which (if any) of the factories may be contributing to the problems of the Rose-crested Blue Pipit. Often, air sampling analysis deals with a single chemical being emitted by a single factory. In this case, though, there are four factories, potentially each emitting four chemicals, being monitored by nine different sensors. Further, some chemicals being emitted are more hazardous than others. Your task, as supported by visual analytics that you apply, is to detangle the data to help Mitch determine where problems may be. Use visual analytics to analyze the available data and develop responses to the questions below. In addition, prepare a video that shows how you used visual analytics to solve this challenge. Novel visualizations and analysis approaches are especially interesting for this mini-challenge. Please do not use any other data in your work (including other Internet-based sources or other mini-challenge data).


Mini Challenge 2

Question 1

Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.

Overall trend analysis from Slope Graph

Slope graph.png
  • The slope graph above shows us that the sensors can be categorised into three groups. The first group – sensors 2 and 3 – has readings that start low in April, peak in August, and drop back low in December.
  • The second group – sensors 6, 7 and 8 – has readings that start high in April, dip in August, and go back high in December.
  • The final group – sensors 1, 4, 5 and 9 – has readings that start at their lowest in April, grow higher in August and are highest in December. Readings for sensor 4 have the highest growth rate, followed by 5 and 9, with 1 growing the slowest.
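
The grouping read off the slope graph can also be sketched as a simple rule on three monthly averages. This is only an illustration: the function name `slope_shape` and the numbers in `monthly_means` are hypothetical placeholders, not values from the dataset.

```python
# Hypothetical monthly mean readings per sensor (April, August, December).
# The numbers are placeholders for illustration, not values from the dataset.
monthly_means = {
    "sensor 2": (0.3, 0.9, 0.4),
    "sensor 6": (1.2, 0.5, 1.1),
    "sensor 4": (0.4, 0.8, 1.5),
}

def slope_shape(april, august, december):
    """Label the three-point trend the way the slope graph is read."""
    if august > april and august > december:
        return "peak in August"
    if august < april and august < december:
        return "dip in August"
    if april < august < december:
        return "rising"
    if april > august > december:
        return "falling"
    return "flat or mixed"

# Collect sensors under the shape of their April-August-December trend.
groups = {}
for sensor, means in monthly_means.items():
    groups.setdefault(slope_shape(*means), []).append(sensor)

for shape, sensors in groups.items():
    print(f"{shape}: {', '.join(sensors)}")
```

With real monthly averages in place of the placeholders, the same rule should reproduce the three groups described above.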

Are Sensors working all the time?

There are some occasions where the data is not available.

Screenshot of Missing Chemical Reading Data

Regular disappearances. Maintenance?

In the dataset, it is observed that the sensors have no data at 12am on the following dates:

  • 2 April 2016
  • 4 April 2016
  • 2 August 2016
  • 4 August 2016
  • 7 August 2016
  • 2 December 2016
  • 7 December 2016

For these periods, chemical reading data for all chemical types were mostly missing, with the following exceptions:

  • 2 August 2016 on Monitor 3, where only data for Appluimonia and Chlorodinine were absent;
  • 7 December 2016 on Monitors 6 and 8, where all data except AGOC-3A were absent, and Monitor 7, where only Chlorodinine and Methylosmolene were absent.

The regularity of the occurrences suggests that the sensors may have been systematically shut down for maintenance, although the two exceptions hint that there might be something else going on.
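
A gap check of this kind can be sketched by building the full grid of expected hourly readings and subtracting what was actually recorded. The record layout `(timestamp, monitor, chemical)` and the helper names are assumptions made for the example, not the original file schema, and the sample data is made up.

```python
from datetime import datetime, timedelta
from itertools import product

CHEMICALS = ["AGOC-3A", "Appluimonia", "Chlorodinine", "Methylosmolene"]

def expected_hours(start, end):
    """Yield every hourly timestamp from start (inclusive) to end (exclusive)."""
    current = start
    while current < end:
        yield current
        current += timedelta(hours=1)

def find_missing(readings, monitors, start, end):
    """Return the (timestamp, monitor, chemical) keys absent from readings.

    `readings` is an iterable of (timestamp, monitor, chemical) tuples;
    this record layout is an assumption, not the original file schema.
    """
    seen = set(readings)
    expected = product(expected_hours(start, end), monitors, CHEMICALS)
    return [key for key in expected if key not in seen]

# Tiny made-up example: one monitor, three hours, one reading missing at 12am.
start = datetime(2016, 4, 2, 0)
end = datetime(2016, 4, 2, 3)
readings = [
    (ts, 3, chem)
    for ts in expected_hours(start, end)
    for chem in CHEMICALS
    if not (ts.hour == 0 and chem == "Methylosmolene")
]
missing = find_missing(readings, monitors=[3], start=start, end=end)
for ts, monitor, chem in missing:
    print(ts.isoformat(), f"monitor {monitor}", chem)
```

Grouping the returned keys by date and hour would make regular gaps, such as the 12am ones listed above, stand out from the irregular ones.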

Irregular performance of sensors detecting Methylosmolene

Other than the above-mentioned missing data, data on Methylosmolene were absent on many different dates and hours as can be seen in the visualisation.

Missing wind data

  • A similar examination of wind data shows that, like the chemical readings, much of the missing wind data falls on the same dates.
  • Unlike the chemical readings, wind data are only recorded every 3 hours.
  • Wind data collection appears to have been non-operational in the first few days of August, as well as in the 3rd hour of the second-last day (30th August).
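
Because the wind data are three-hourly while the chemical readings are hourly, the two need a common time grid before they can be compared. A minimal sketch follows, assuming that each wind record describes the three-hour window starting at its timestamp and that readings are averaged within a window; both choices are assumptions, and the values are made up.

```python
from collections import defaultdict
from datetime import datetime

def three_hour_bin(ts):
    """Floor a timestamp to the start of its three-hour window."""
    return ts.replace(hour=ts.hour - ts.hour % 3, minute=0, second=0, microsecond=0)

def aggregate_readings(readings):
    """Average hourly readings per three-hour window."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[three_hour_bin(ts)].append(value)
    return {window: sum(vals) / len(vals) for window, vals in buckets.items()}

# Made-up values for illustration only.
hourly = [
    (datetime(2016, 8, 30, 0), 1.0),
    (datetime(2016, 8, 30, 1), 2.0),
    (datetime(2016, 8, 30, 2), 3.0),
    (datetime(2016, 8, 30, 3), 4.0),
]
wind = {datetime(2016, 8, 30, 0): 270.0}  # direction in degrees; 03:00 is missing

for window, mean in sorted(aggregate_readings(hourly).items()):
    direction = wind.get(window)  # None flags a window with no wind record
    print(window.isoformat(), mean, direction)
```

Windows that come back with `None` for the direction correspond to the missing wind records noted above.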

Screenshot of Missing Chemical Reading Data

Studying the variation with a calendar plot and horizon chart

Calendar plot analysis

Calendar Plot.png
  • As observed from the calendar plot, the readings detected by sensors 3 and 4 are generally much higher than the rest of the sensors, especially for August and December.
  • Readings for sensors 5 and 6 – while lower than 3 and 4 – have much higher variability.
  • The darker patches on the plots of sensors 5, 6, 7, 8 and 9 indicate outlier readings.

Horizon Chart Analysis

Horizon Chart.png
  • Sensors 1, 2, 8 and 9 are mostly in the green zone, with a few fluctuations above the baseline.
  • With the exception of April’s readings, sensor 7 displays similar behaviour as well.
  • From both the horizon chart and the calendar plot, we see that the readings are especially high for sensors 3 and 4. These spikes could be further investigated later to see if they correspond with wind direction.
  • For sensor 4, readings start low in April but get progressively higher each period, with the highest in December.
  • Sensor 6 displays the most irregular performance, as fluctuations in the readings appear both below and above the baseline. This may be the result of Sensor 6 being located in the middle of the four factories, which bears further examination later on.
  • In order to get a better sense of any cyclical patterns, we use a cycle plot with the lowest and highest values in each cell labelled. Reference lines and bands were added showing the median and the upper and lower quartiles.
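
The numbers behind a cycle plot can be sketched as follows: readings are grouped into (month, weekday) cells, each cell is averaged, and the quartiles give the reference lines. The function names and the daily values are hypothetical, and using the mean per cell is an assumption rather than a confirmed detail of the chart.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, quantiles

def cycle_plot_cells(daily):
    """Group daily readings by (month, weekday) and average each cell.

    `daily` maps a date to a reading; weekday() uses Monday=0 ... Sunday=6.
    """
    cells = defaultdict(list)
    for day, value in daily.items():
        cells[(day.month, day.weekday())].append(value)
    return {key: mean(values) for key, values in cells.items()}

def reference_lines(values):
    """Return (lower quartile, median, upper quartile) for a series."""
    q1, q2, q3 = quantiles(values, n=4)
    return q1, q2, q3

# Made-up daily readings for two weeks of April 2016, for illustration only.
daily = {date(2016, 4, d): float(d % 7) for d in range(1, 15)}
cells = cycle_plot_cells(daily)
q1, med, q3 = reference_lines(list(cells.values()))
low = min(cells, key=cells.get)    # cell to label as the lowest value
high = max(cells, key=cells.get)   # cell to label as the highest value
print("lowest cell:", low, "highest cell:", high)
print("quartiles:", q1, med, q3)
```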

Cycle Plot: Studying seasonality

Cycle Plot.png

Sensors 2 and 3 show similar seasonality patterns.

  • The highest readings in April were on Friday, when readings rose above their respective upper quartiles.
  • The curves for August resemble sine curves: readings are generally low on Sunday, rise to a first peak on Monday, dip on Thursday or Friday, and rise to another high on Saturday.
  • The shapes of their curves were also similar in December, where readings rose from Sunday to Monday, dipped to a low on Tuesday, and slowly tapered off over the rest of the week.
  • These findings suggest that the same factors may be responsible for the readings recorded by sensors 2 and 3.
  • The readings of sensor 1 are roughly similar to those of sensors 2 and 3, with some minor differences: in April, readings peaked on Saturday instead of Friday; in August, they peaked on Tuesday instead of Monday; and in December there was an extra rise on Wednesday. These minor differences suggest that other factors may be responsible for the slight deviations.
  • Other notable groups with similar reading patterns are the sensor pairs 5-and-9 and 7-and-8, as can be seen from the cycle plot above. They showed noticeable increases and decreases on the same weekday each month, suggesting that common factors are responsible for these patterns. This may also be due to the close proximity of these sensors to each other.
  • The outliers are sensors 4 and 6, although sensor 6’s readings seem to mirror those of sensors 4 and 5: when readings at sensors 4 and 5 rise, sensor 6’s drop to varying extents. Given their respective locations, this may be due to the direction of the wind blowing from certain factories. This hypothesis is supported by the Coxcomb chart below, where wind direction is plotted against the magnitude of the readings, i.e. the highest readings are indicated by the largest segments. In the case of sensor 6, the readings tend to be highest when the wind is blowing towards the south.

Strangely enough, the two sensors with the highest readings – i.e. 3 and 4 – do not show many similarities with each other in the cycle plot. However, the coxcomb chart below shows that the similarities in the readings may be the result of similar wind directions.

Coxcomb sensors.png

Question 2.

Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.

Slope graph comparison

Slopegraphcomparison.png

We turn to the slope graph to get a sense of how the composition of the chemicals contributes to the overall trend observed earlier. Overall trends observed:

  • Generally, AGOC-3A seems to be responsible for the largest fluctuations in the readings, with clear examples as observed by the spike in August for sensors 3 and 5.
  • For sensor 6, the drop in Methylosmolene in August is responsible for the V shape observed in the slope graph.
  • For sensors 3, 4, 7, 8, and 9, the shapes of the slopes, while different in magnitude, are generally the same for all chemical types, with 8 being the exception because of the tiny spike in Appluimonia in August.
  • These findings suggest that AGOC-3A and Methylosmolene are possible main factors responsible for variation in the readings.

Horizon and Coxcomb chart analysis of chemical types

  • A separate horizon chart – factoring in chemical types – was constructed to explore this further, along with a coxcomb chart that maps the magnitude of readings to the size of the segments corresponding to the wind direction.
  • For AGOC-3A, we can see that there is co-occurrence in the readings for sensors 1 and 2 (and parts of April and August for sensor 4 as well) based on the shapes of the horizon chart. A quick check of the corresponding coxcomb chart shows that the wind directions remain constant for the most part.
  • The same can be observed for sensors 7 and 8, although their wind directions corresponding with the largest readings are dissimilar.
  • This suggests that the similarities in these two pairs of sensors may be a function of their close proximity to each other, and that the source of much of the readings they register may be ambient rather than blown from some other location.
  • Despite having the highest average readings, sensors 3 and 4 have mostly very different shapes on the horizon chart, suggesting that their drivers may be different. From the Coxcomb chart, we also observe that the wind directions responsible for those readings are different.
  • As previously observed, readings and wind direction data for sensor 6 are pretty much standalone. Given its location, it may be the best candidate for answering Question 3.
AGOC-3a horizon cox.png
  • For Appluimonia, most of the readings seem to be coming from the same wind directions – i.e. from the northwest to southeast, or from the east to west.
  • Similar observations can be made from the Chlorodinine chart.
Applu horizon cox.png
Chloro horizon cox.png
  • Results from the horizon and coxcomb charts for Methylosmolene echo much of what has already been mentioned. Of note are sensors 3, 4 and 6, where the largest spikes in readings come from a north or north-westerly source. Given that the park roads and gate 7 lie along that direction, there’s a high chance that this chemical may actually be coming from vehicles travelling along the park trails rather than from the factories. The same can be said for sensor 9, whose large spike in readings in April corresponds with a wind blowing from the east, where only park trails are seen on the map.
Meth horizon cox.png

Question 3.

Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.

Impact of wind factors on sensor readings

One of the biggest things I’ve noticed when going through the data is how most of the readings don’t seem to be coming from the factories themselves. The observation arose from initially observing wind direction and speed data and their corresponding chemical readings.

Impact of wind speed on readings

Windimpact.png
  • From the charts above, we can see that there is a correlation between slower wind speeds and higher readings. This makes logical sense as higher wind speeds probably blow away chemical particles that would otherwise be detected by the sensors.
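
The strength of such a relationship can be quantified with a Pearson correlation coefficient. The sketch below uses made-up (wind speed, reading) pairs purely for illustration; a negative coefficient would be consistent with higher readings at slower wind speeds, while the actual value for the dataset is not computed here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up pairs (wind speed, reading) for illustration only.
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]
reading = [5.0, 4.2, 3.1, 2.5, 1.0]
r = pearson(wind_speed, reading)
print(f"r = {r:.2f}")
```

Note that the function divides by zero if either series is constant, so such cases would need to be filtered out first.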

Impact of wind direction on readings

  • We can see that for sensors 1, 2, 3, 4, 7, and 8, (i.e. the sensors on the left half of the map surrounding the factories) the wind directions that correspond with the highest readings are similar, mostly coming from the northwest or southeast directions. The correlation between the higher readings and the northwest and westerly winds suggests that the factories on the east side of these sensors may be responsible for these readings.
  • Sensors 5 and 9 have similar looking coxcomb charts, suggesting that the wind directions most responsible for their high readings blow to the southwest, west, and the southeast.
  • Most of sensor 6’s readings come from the north, with the wind blowing towards the south-south-east.

Unexpected wind directions

Comparing those readings with the positioning of the sensors relative to the factories and their surrounding environment we draw a few unexpected conclusions.

Mapzoomed.png
  • It is strange that winds blowing towards the southeast have resulted in high readings for sensors 1, 2, 3, 4, 7 and 8, as there are no factories to their north-west. Given the direction of sensor 6’s data, this mysterious northern source of chemicals also seems to be responsible for sensor 6’s readings.
  • The same is observed for sensors 5 and 9: their highest readings come from the side opposite the factories.

Isolating readings from factories

In order to better observe the relationship between the wind direction and the chemical readings, I decided to aggregate the chemical reading data and the wind data at the three-hourly level. More on how this was done can be found under the data preparation section. I derived the directions of the factories to sensors and matched them to the wind direction data, so as to isolate only the reading data that came from the factories.
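
The matching step can be sketched as a bearing calculation followed by an angular comparison. Several things here are assumptions: map coordinates with x growing eastward and y growing northward, wind direction given as the direction the wind comes from, an arbitrary 15-degree tolerance, and hypothetical coordinates that are not taken from the challenge map.

```python
from math import atan2, degrees

def bearing(from_xy, to_xy):
    """Compass bearing in degrees (0 = north, 90 = east) from one point to another.

    Assumes map coordinates where x grows eastward and y grows northward.
    """
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return degrees(atan2(dx, dy)) % 360

def angular_difference(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)

def wind_from_factory(sensor_xy, factory_xy, wind_from_deg, tolerance=15.0):
    """True when the wind is blowing from the factory towards the sensor.

    `wind_from_deg` follows the meteorological convention (the direction the
    wind comes from); the 15-degree tolerance is an arbitrary placeholder.
    """
    return angular_difference(bearing(sensor_xy, factory_xy), wind_from_deg) <= tolerance

# Hypothetical coordinates, not taken from the challenge map.
sensor = (0.0, 0.0)
factory = (10.0, -10.0)  # south-east of the sensor
print(bearing(sensor, factory))
print(wind_from_factory(sensor, factory, wind_from_deg=140.0))
print(wind_from_factory(sensor, factory, wind_from_deg=320.0))
```

Readings whose window passes this test for a given sensor and factory pair are candidates for attribution to that factory; a wider tolerance catches more readings at the cost of more overlap between factories that lie in similar directions.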

After separating the dataset by matching the wind direction data to the direction from the factories, I made an interesting discovery:

  • Among the factories, Indigo Sol Boards seems to be the biggest culprit in chemical emissions, judging by the readings from sensors 3 and 6.
  • Next in line is Radiance ColourTek, followed by Roadrunner Fitness Electronics and Kasios Furniture.
VMReadings from factories.jpeg

In examining these readings, we have to be careful about what we attribute to which factory, especially in the case of sensor 3, as both Radiance ColourTek and Indigo Sol Boards seem to lie roughly along the same direction. Hence, I gave more credence to the readings of sensors located closer to the factories. This makes the readings of sensor 6 very interesting, as it is located right at the centre of the factories. For Roadrunner and Kasios, I paid more attention to readings detected by sensors 2 and 3, given their close proximity and the predominantly north-westerly wind featured earlier.

Mapzoomed.png


Of all the factories, Indigo Sol Boards is responsible for the highest volume of readings detected. Also of note is that the chemical emitted in the highest quantity is Methylosmolene, as detected by sensor 6, which is worrying as it has highly toxic effects. Even if we deduct the readings detected by sensor 3 as coming from Radiance ColourTek (i.e. assuming there’s double counting), the reading levels remain substantial.

Indigo Sol Boards’ largest Methylosmolene reading was detected in December. Given that Indigo Sol Boards produces snowboards, and assuming this area is in the Northern Hemisphere, where December falls in winter, Methylosmolene may be a chemical emitted in the production of these snowboards and other equipment.

VMSlope graph Q3.jpeg

Identifying source of chemicals that do not come from factories

Also, I discovered that most of the readings that were picked up cannot be attributed to the factories, and instead seem to come from the environment - i.e. the park - in more or less equal proportions of chemical types.

I created another coxcomb chart/windrose plot and added a reverse wind rose parameter option so I can reverse the wind rose to see where the source of the chemicals is coming from. I also created a windrose sizing parameter called “Windrose Adjustment” to play with the size of the wind rose so that extremely large values won’t exceed the space of the map.
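
The two parameters can be sketched outside the charting tool as well. In the example below, `reverse` flips each direction by 180 degrees and `adjustment` rescales the segment sizes; how the original workbook implements these parameters is an assumption, the 16-sector binning is an arbitrary choice, and the records are made up.

```python
def reverse_direction(direction_deg):
    """Flip a wind direction by 180 degrees (e.g. 'blowing to' into 'coming from')."""
    return (direction_deg + 180.0) % 360.0

def sector_index(direction_deg, sectors=16):
    """Index of the compass sector containing a direction; sector 0 is centred on north."""
    width = 360.0 / sectors
    return int(((direction_deg % 360.0) + width / 2) // width) % sectors

def wind_rose(records, reverse=False, adjustment=1.0, sectors=16):
    """Sum readings per sector, optionally reversed and rescaled.

    `records` holds (direction_deg, reading) pairs. `reverse` and `adjustment`
    mimic the two parameters described above; their exact behaviour in the
    original workbook is an assumption.
    """
    totals = [0.0] * sectors
    for direction, reading in records:
        if reverse:
            direction = reverse_direction(direction)
        totals[sector_index(direction, sectors)] += reading
    return [total * adjustment for total in totals]

# Made-up records for illustration only.
records = [(0.0, 2.0), (10.0, 1.0), (90.0, 4.0)]
print(wind_rose(records))
print(wind_rose(records, reverse=True, adjustment=0.5))
```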

VMReadings from other places.jpeg

When I layered the coxcomb chart onto the map, the culprit became clear. Excluding the factories, the chemical readings might have come from the campsites and the roads on which vehicles travel.

VMWind blows.jpeg

Discussion

Please feel free to leave comments here.

Hi Vincent,

Nice work and beautiful wind plot!

I have one concern:

When comparing the chemical readings of each monitor, would it be more reasonable to use non-aggregated reading data? Using individual readings would also allow a more detailed look, for example at the pattern by hour.

Great efforts overall!

Best regards,

Xiaoqing


Hi Vincent,

Well done. You've made good use of the new charts that we have learnt in class, such as the slope graph and the horizon chart. The visualizations are clear and easy to understand. To improve clarity, a layout map could be added to explain the relative positions of the sensors and factories. I agree with Xiaoqing that when trying to pinpoint a particular emission event it might make more sense to compare the individual readings instead of looking at the combined data.

All the best,

David Ten Kao Yuan

Reference

Below are some works I personally admire and have learned from.

How to convert coordinates to bearings