ISSS608 2016-17 T3 Assign MACK ZHI WEI VINCENT
Contents
Question Attempted
Mini-Challenge 2
Objectives
Question 1
Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.
Overall trend analysis from Slope Graph
- The slope graph above shows that the sensors can be categorised into three groups. The first group, sensors 2 and 3, has readings that start low in April, peak in August, and drop back low in December.
- The second group, sensors 6, 7 and 8, has readings that start high in April, dip in August, and go back high in December.
- The final group, sensors 1, 4, 5 and 9, has readings that are lowest in April, higher in August and highest in December. Readings for sensor 4 have the highest growth rate, followed by 5 and 9, with 1 growing the slowest.
Are Sensors working all the time?
There are some occasions where the data is not available.
Regular disappearances. Maintenance?
In the dataset, the sensors have no data at 12am on the following dates:
- 2 April 2016
- 4 April 2016
- 2 August 2016
- 4 August 2016
- 7 August 2016
- 2 December 2016
- 7 December 2016
For these periods, chemical reading data for all chemical types were mostly missing, with the following exceptions:
- 2 August 2016 on Monitor 3, where only data for Appluimonia and Chlorodinine were absent;
- 7 December 2016 on Monitors 6 and 8, where all data except AGOC-3A were absent, and on Monitor 7, where only Chlorodinine and Methylosmolene were absent.
The regularity of the occurrences suggests that the sensors may have been systematically shut down for maintenance, although the two exceptions hint that there might be something else going on.
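The gap check described above can be sketched as follows. This is a minimal illustration, assuming the readings are available as (timestamp, monitor, chemical, reading) tuples on an hourly grid; the function name `find_missing` and the toy records are hypothetical and not taken from the actual dataset.

```python
from datetime import datetime, timedelta

CHEMICALS = ("AGOC-3A", "Appluimonia", "Chlorodinine", "Methylosmolene")

def find_missing(records, monitors, start, end):
    """Return (timestamp, monitor, chemical) triples that have no reading.

    records: iterable of (timestamp, monitor, chemical, reading) tuples.
    start, end: inclusive bounds of the expected hourly grid.
    """
    seen = {(ts, m, c) for ts, m, c, _ in records}
    missing = []
    ts = start
    while ts <= end:
        for m in monitors:
            for c in CHEMICALS:
                if (ts, m, c) not in seen:
                    missing.append((ts, m, c))
        ts += timedelta(hours=1)
    return missing

# Toy data (made up): monitor 1 reports at 23:00 and 01:00 but not at midnight.
start = datetime(2016, 4, 1, 23)
end = datetime(2016, 4, 2, 1)
records = [
    (start + timedelta(hours=h), 1, chem, 0.5)
    for h in (0, 2)
    for chem in CHEMICALS
]
gaps = find_missing(records, monitors=[1], start=start, end=end)
for ts, monitor, chem in gaps:
    print(ts, monitor, chem)
```

Listing gaps per chemical rather than per timestamp is what makes partial outages, such as a monitor missing only two of the four chemicals, visible.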
Irregular performance of sensors detecting Methylosmolene
Other than the above-mentioned missing data, data on Methylosmolene were absent on many different dates and hours as can be seen in the visualisation.
Missing wind data
A similar examination of the wind data shows that, like the chemical readings, much of the missing wind data falls on the same dates. Unlike the chemical readings, wind data are only recorded every 3 hours. Wind data collection also appears to have been non-operational in the first few days of August, as well as in the 3rd hour of the second-last day (30th August).
Studying the variation with a calendar plot and horizon chart
Calendar plot analysis
- The Calendar plot above provides an overview to the sensor dataset.
- It tells us that the readings detected by sensors 3 and 4 are generally much higher than those of the other sensors.
- For sensor 4, this was especially so for the August and December readings.
- Sensors 5 and 6, while their readings are not as high as those of 3 and 4, show much higher variability in their readings.
- This calendar plot allows us to detect outlier readings judging by the darker patches on the plots of sensors 5, 6, 7, 8 and 9.
- However, the calendar plot is limited. We will need to rely on other visualisation methods to get a better sense of the data.
Horizon Chart Analysis
- The horizon chart of the sensors provides a clearer indication of the performance of the sensors. With the global average reading as the baseline, the horizon chart shows the difference between the average reading per monitor and the global average. Readings that fall below the baseline are coloured green, while those above the baseline are coloured yellow. Higher readings (i.e. those more than 1) correspond with darker hues of red.
- Sensors 1, 2, 8 and 9 are mostly in the green zone, with a few fluctuations above the baseline.
- With the exception of April’s readings, sensor 7 displays similar behaviour as well.
- From both the horizon chart and the calendar plot, we see that the readings are especially high for sensors 3 and 4. These spikes could be further investigated later to see if they correspond with wind direction.
- For sensor 4, readings start low in April but get progressively higher each period, with the highest in December.
- Sensor 6 displays the most irregular performance, as its readings fluctuate both below and above the baseline. This may be the result of Sensor 6 being located in the middle of the four factories, which bears further examination later on.
- In order to get a better sense of any cyclical patterns, we use a cycle plot with the lowest and highest values in each cell labelled. Reference lines and bands were added showing the median and the upper and lower quartiles.
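The baseline comparison behind the horizon chart can be sketched as below. This is only an illustration of the idea, assuming (monitor, reading) pairs as input; the function names `deviations_from_global` and `band` and the numbers are hypothetical, and the real chart was built in a visualisation tool rather than in code.

```python
from collections import defaultdict
from statistics import mean

def deviations_from_global(readings):
    """Map each monitor to (its mean reading - the global mean reading).

    readings: iterable of (monitor, reading) pairs.
    """
    readings = list(readings)
    global_avg = mean(r for _, r in readings)
    by_monitor = defaultdict(list)
    for monitor, reading in readings:
        by_monitor[monitor].append(reading)
    return {m: mean(values) - global_avg for m, values in by_monitor.items()}

def band(deviation):
    """Colour band used in the chart: green below the baseline, yellow above."""
    return "green" if deviation < 0 else "yellow"

# Toy readings (made up) for two monitors.
toy = [(1, 1.0), (1, 3.0), (4, 5.0), (4, 7.0)]
dev = deviations_from_global(toy)
for monitor, d in sorted(dev.items()):
    print(monitor, d, band(d))
```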
Cycle Plot: Studying seasonality
Sensors 2 and 3 show similar seasonality patterns.
- The highest readings in April were on Friday, when readings rose above their respective upper quartiles.
- The curves for the month of August resemble sine curves, where readings are generally low on Sunday, rise to a first peak on Monday, dip on Thursday or Friday and rise to another high on Saturday.
- The shapes of their curves were also similar in December, where readings rose from Sunday to Monday, dipped to a low on Tuesday, and slowly tapered off over the rest of the week.
- These findings suggest that the same factors may be responsible for the readings recorded by Sensor 2 and 3.
- The readings of sensor 1 are roughly similar to those of sensors 2 and 3, with some minor differences: in April, readings peaked on Saturday instead of Friday; in August, they peaked on Tuesday instead of Monday; and in December, there was an extra rise in readings on Wednesday. These slight deviations suggest that other factors may also be at work.
- Other notable groups with similar reading patterns are the sensor-pairs 5-and-9 and 7-and-8 as can be seen from the cycle plot above. They showed noticeable increases and decreases on the same weekday each month suggesting that common factors are responsible for these patterns. This may also be due to the close proximity in which these sensors are to each other.
- The outliers are sensors 4 and 6, although sensor 6’s readings seem to mirror those of sensors 4 and 5, in the sense that when readings at sensors 4 and 5 rise, sensor 6’s drop to different extents. Given their respective locations, this may be due to the wind blowing from certain factories. This hypothesis is supported by the Coxcomb chart below, where the wind direction is plotted with the magnitude of the readings, i.e. the highest readings are indicated by the largest segments in the respective Coxcomb chart. In the case of sensor 6, the readings tend to be highest when the wind is blowing towards the south.
Strangely enough, the two sensors with the highest readings, i.e. 3 and 4, do not show many similarities with each other in the cycle plot. However, the coxcomb chart below shows that the similarities in the readings may be the result of similar wind directions.
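The aggregation behind a cycle plot (one cell per month and weekday, with the lowest and highest cells labelled) can be sketched as follows. This is a rough illustration assuming (timestamp, reading) pairs for a single sensor; `cycle_table`, `extremes` and the toy values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

def cycle_table(records):
    """Average readings per (month, weekday) cell.

    records: iterable of (timestamp, reading) pairs for one sensor.
    """
    cells = defaultdict(list)
    for ts, reading in records:
        cells[(ts.month, WEEKDAYS[ts.weekday()])].append(reading)
    return {cell: mean(values) for cell, values in cells.items()}

def extremes(table, month):
    """Return (weekday with lowest mean, weekday with highest mean) for a month."""
    days = {day: value for (m, day), value in table.items() if m == month}
    return min(days, key=days.get), max(days, key=days.get)

# Toy readings (made up): 1 April 2016 was a Friday.
toy = [
    (datetime(2016, 4, 1, 10), 4.0),
    (datetime(2016, 4, 1, 11), 6.0),
    (datetime(2016, 4, 2, 10), 1.0),
    (datetime(2016, 4, 3, 10), 2.0),
]
table = cycle_table(toy)
low_day, high_day = extremes(table, month=4)
print(table, low_day, high_day)
```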
Question 2
Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.
Slope graph comparison
We turn to the slope graph to get a sense of how the composition of the chemicals contribute to the overall trend observed earlier. Overall trends observed:
- Generally, AGOC-3A seems to be responsible for the largest fluctuations in the readings, as seen most clearly in the August spike for sensors 3 and 5.
- For sensor 6, the drop in Methylosmolene in August is responsible for the V shape observed in the slope graph.
- It is interesting to note that for sensors 3, 4, 7, 8, and 9, the shapes of the slopes, while different in magnitude, are generally the same for all chemical types, with sensor 8 being the exception because of a tiny spike in Appluimonia in August.
- These findings suggest that AGOC-3A and Methylosmolene are possible main factors responsible for variation in the readings.
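One way to put a number on "largest fluctuation" is to compare, for each chemical, the range of its monthly means. The sketch below assumes the monthly means per chemical have already been computed for one sensor; `largest_swing` is a hypothetical helper and the figures are invented for illustration, not taken from the slope graph.

```python
def largest_swing(monthly_means):
    """Return the chemical whose monthly mean varies most across the months.

    monthly_means: {chemical: [April mean, August mean, December mean]}.
    """
    return max(
        monthly_means,
        key=lambda chem: max(monthly_means[chem]) - min(monthly_means[chem]),
    )

# Made-up monthly means (April, August, December) for a single sensor.
example = {
    "AGOC-3A": [1.0, 4.0, 1.5],
    "Appluimonia": [0.8, 0.9, 1.0],
    "Chlorodinine": [0.7, 0.8, 0.9],
    "Methylosmolene": [1.2, 1.0, 2.0],
}
top = largest_swing(example)
print(top)
```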
Horizon and Coxcomb chart analysis of chemical types
- A separate horizon chart, broken down by chemical type, was constructed to explore this further, along with a coxcomb chart that maps the magnitude of readings to the size of the segments corresponding with the wind direction.
- For AGOC-3A, we can see that there is co-occurrence in the readings for sensors 1 and 2 (and in parts of April and August for sensor 4 as well) based on the shapes of the horizon chart. A quick check of the corresponding coxcomb chart shows that the wind directions remain constant for the most part.
- The same can be observed for sensors 7 and 8, although their wind directions corresponding with the largest readings are dissimilar.
- This suggests that the similarities in these two pairs of sensors may be a function of their close proximity to each other, and that the source of much of the readings they register may be ambient rather than blown from some other location.
- Despite being the sensors with the highest average readings, sensors 3 and 4 for the most part have very different shapes on the horizon chart, suggesting that the drivers for the readings detected by these two sensors may be different. This is further reinforced by the Coxcomb chart, which shows that the wind directions responsible for those readings are different.
- As previously observed, readings and wind direction data for sensor 6 are pretty much standalone. Given its location, it may be the best candidate for answering Question 3.
- For Appluimonia, what stands out is not so much the shape of the horizon chart but that most of the readings seem to come with the same wind directions, i.e. from the northwest to southeast, or from the east to west.
- Similar observations can be made from the Chlorodinine chart.
- Results from the horizon and coxcomb charts for Methylosmolene echo much of what has already been mentioned. Of note is that for sensors 3, 4 and 6, the largest spikes in readings come from a north or north-westerly source. Given that the park roads and gate 7 lie in that direction, this chemical may actually be coming from vehicles travelling along the park trails rather than from the factories. The same can be said for sensor 9, whose large spike in April corresponds with a wind blowing from the east, where only park trails are seen on the map.
Question 3
Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.
Impact of wind factors on sensor readings
Impact of wind speed on readings
- From the charts above, we can see that there is a correlation between slower wind speeds and higher readings. This makes logical sense as higher wind speeds probably blow away chemical particles that would otherwise be detected by the sensors.
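The relationship between wind speed and readings can be checked numerically with a Pearson correlation coefficient. The sketch below assumes each reading has already been paired with the wind speed recorded at the same timestamp (wind data are only available every 3 hours, so the other hours would need to be dropped or filled first); the `pearson` helper and the numbers are illustrative only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Toy pairs (made up): readings fall as wind speed rises.
wind_speed = [1.0, 2.0, 3.0, 4.0]
reading = [8.0, 6.0, 4.0, 2.0]
r = pearson(wind_speed, reading)
print(round(r, 3))
```

A value close to -1 would be consistent with the observation that slower winds go with higher readings, while a value near 0 would suggest the pattern in the charts is weaker than it looks.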
Impact of wind direction on readings
- We can see that for sensors 1, 2, 3, 4, 7, and 8, (i.e. the sensors on the left half of the map surrounding the factories) the wind directions that correspond with the highest readings are similar, mostly coming from the northwest or southeast directions. The correlation between the higher readings and the northwest and westerly winds suggests that the factories on the east side of these sensors may be responsible for these readings.
- Sensors 5 and 9 have similar looking coxcomb charts, suggesting that the wind directions most responsible for their high readings blow to the southwest, west, and the southeast.
- Most of sensor 6’s readings come with winds from the north, blowing towards the south-south-east.
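The data behind each coxcomb chart can be sketched as a binning of wind directions into compass sectors, with the mean reading per sector giving the size of each segment. This assumes wind direction is given in degrees with 0/360 as north, and uses eight 45-degree sectors; both the sector count and the helper names `sector` and `mean_by_sector` are my own choices for illustration.

```python
from collections import defaultdict
from statistics import mean

SECTORS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")

def sector(direction_deg):
    """Map a direction in degrees (0/360 = north) to one of 8 compass sectors."""
    return SECTORS[int(((direction_deg % 360) + 22.5) // 45) % 8]

def mean_by_sector(records):
    """Average reading per compass sector.

    records: iterable of (wind direction in degrees, reading) pairs.
    """
    bins = defaultdict(list)
    for direction, reading in records:
        bins[sector(direction)].append(reading)
    return {name: mean(values) for name, values in bins.items()}

# Toy records (made up): two northerly winds and one easterly wind.
toy = [(350.0, 2.0), (10.0, 4.0), (90.0, 1.0)]
segments = mean_by_sector(toy)
print(segments)
```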
Unexpected findings
- However, it is strange that winds blowing towards the southeast have resulted in high readings for sensors 1, 2, 3, 4, 7 and 8, as there are no factories to their north-western side. Given the direction of sensor 6’s data, this mysterious northern source of chemicals seems to be responsible for sensor 6’s readings as well.
- The same is observed for sensors 5 and 9, where the highest readings come from the side opposite the factories.
Intro
After separating the dataset by matching the wind direction data to the direction from the factories, I made an interesting discovery:
- Among the factories, Indigo Sol Boards seems to be the biggest culprit in chemical emissions, judging by the readings from sensors 3 and 6.
- Next in line is Radiance ColourTek, followed by Roadrunner Fitness Electronics and Kasios Furniture.
Also, I discovered that most of the readings that were picked up cannot be attributed to the factories, and instead seem to come from the environment, i.e. the park, with the chemicals in more or less equal amounts.
When I layered the coxcomb chart onto the map, the culprit became clear. The chemical readings not attributable to the factories might have come from the campsites and from the roads on which vehicles travel.
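The matching of wind direction to factory direction can be sketched as below. This is a simplified illustration, assuming map coordinates where y increases northwards, wind direction given as the direction the wind comes from (0/360 = north), and a tolerance of 15 degrees; the coordinates, factory names, tolerance and function names are all hypothetical and do not reproduce the exact procedure used for the charts.

```python
from math import atan2, degrees

def bearing(from_xy, to_xy):
    """Compass bearing in degrees (0 = north, 90 = east) between two map points."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return degrees(atan2(dx, dy)) % 360

def angular_gap(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def attribute(wind_from_deg, sensor_xy, factories, tolerance=15.0):
    """Names of factories lying upwind of the sensor, within the tolerance.

    An empty list means the reading cannot be attributed to any factory.
    """
    return [
        name
        for name, xy in factories.items()
        if angular_gap(bearing(sensor_xy, xy), wind_from_deg) <= tolerance
    ]

# Toy layout (made up): one factory due east and one due north of the sensor.
sensor = (0.0, 0.0)
factories = {"Factory A": (10.0, 0.0), "Factory B": (0.0, 10.0)}
from_east = attribute(85.0, sensor, factories)
from_southwest = attribute(200.0, sensor, factories)
print(from_east, from_southwest)
```

Readings that fall into the empty case are the ones described above as coming from the environment rather than from the factories.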
Discussion
Reference
Below are some works I personally admire and have learned from.