ISSS608 2016-17 T3 Assign ASMIT ADGAONKAR
TABLEAU PUBLIC LINKS
https://public.tableau.com/profile/asmit.adgaonkar#!/vizhome/AsmitAdgaonkar_VASTChallenge2017_MC2/Story1?publish=yes
https://public.tableau.com/profile/asmit.adgaonkar#!/vizhome/AsmitAdgaonkar_VASTChallenge2017_MC2_Q3/Story1
Question1
MC2.1 – Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviours of the sensors through analysing the readings they capture? Limit your response to no more than 9 images and 1000 words
Starting with some basic exploration of the data provided, a boxplot of the hourly number of readings for each chemical in each month reveals recurring anomalies in sensor performance and operation.
It can be seen that Monitor 5 failed to record a reading for Methylosmolene (one of the most harmful chemicals captured by the sensors) at 12 pm in April, 6 am in August and 8 am in December. Instead, the monitor appears to have recorded those values and assigned them to AGOC-3A, a relatively less harmful chemical. The sensor logs should be examined further to understand how the sensors are designed to collect readings. Wind speed and wind direction at these times are also worth checking to see whether they have any influence on this fault. Last but not least, this behavior of over-reporting AGOC-3A and under-reporting Methylosmolene should also be placed in the bigger picture: are the companies tampering with the sensors to stay in compliance with the pollutant limits? After all, Methylosmolene is a chemical that environmentalists cannot afford to miss!
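The check described above (a missing Methylosmolene reading paired with an extra AGOC-3A reading) can be sketched as a count of readings per monitor, timestamp and chemical. This is a minimal Python sketch, not the Tableau workflow behind the boxplot; the tuple layout, the function name `reading_count_anomalies` and the sample rows are assumptions made up for illustration.

```python
from collections import Counter
from itertools import product

CHEMICALS = ["AGOC-3A", "Appluimonia", "Chlorodinine", "Methylosmolene"]


def reading_count_anomalies(rows, monitors, timestamps, chemicals=CHEMICALS):
    """Return {(monitor, timestamp, chemical): count} wherever count != 1.

    rows is an iterable of (chemical, monitor, timestamp, reading) tuples
    (an assumed layout). A count of 0 marks a missing reading and a count
    of 2 or more marks a duplicated one.
    """
    counts = Counter((monitor, ts, chem) for chem, monitor, ts, _ in rows)
    anomalies = {}
    for key in product(monitors, timestamps, chemicals):
        n = counts.get(key, 0)
        if n != 1:
            anomalies[key] = n
    return anomalies


# Hypothetical rows for one hour at one monitor: two AGOC-3A readings
# and no Methylosmolene reading. The values are invented.
HOUR = "hypothetical-hour"
sample_rows = [
    ("AGOC-3A", 5, HOUR, 0.41),
    ("AGOC-3A", 5, HOUR, 1.73),
    ("Appluimonia", 5, HOUR, 0.22),
    ("Chlorodinine", 5, HOUR, 0.35),
]

anomalies = reading_count_anomalies(sample_rows, monitors=[5], timestamps=[HOUR])
for key, n in sorted(anomalies.items()):
    print(key, n)
```

Passing the full list of expected hours as `timestamps` would also surface hours with no readings at all, since every chemical would then show a count of 0.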
Similarly, Monitor 1 failed to capture readings at midnight for all the chemicals on at least 2-3 days in each of the months. It is worth investigating whether this is due to some sort of maintenance activity on the sensors or a genuine operational flaw. If it turns out to be maintenance, the sensor administrators should explain how those dates were chosen.
Along similar lines, a quick bubble chart of the readings from all the monitors for each of the 3 months tells us that pollutant readings increased gradually from April to December. Sensor 4 deserves special mention, as its readings increased significantly with the passage of time. It is worth investigating whether sensor 4 had any issues in April or whether it is largely influenced by the speed and direction of the wind blowing in the line of the sensor.
2. The dashboard below was built to look for any significant pattern in, or dependency of, the sensor readings on the wind speed category.
It can be inferred that nearly all the sensors follow a similar pattern across the wind speed categories, with no major differences that would point to a sensor-specific pattern. Light Air and Light Breeze each contribute about 30-40% of the readings during the day (6 am to 6 pm), whereas at night (6 pm to 6 am) Gentle Breeze increases its contribution for all sensors and all chemicals. Light Air, Light Breeze and Gentle Breeze are thus the major contributors to the readings.
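The category names above (Light Air, Light Breeze, Gentle Breeze) match the Beaufort scale. As a rough sketch of how a wind speed could be binned into such a category, assuming the speed is in metres per second and using commonly published Beaufort upper bounds (exact cut points vary slightly between tables, and the binning actually used in the dashboard is not stated):

```python
import bisect

# Approximate upper bounds in m/s for Beaufort numbers 0 to 11 (an assumption:
# published tables differ slightly at the boundaries).
_UPPER_BOUNDS_MS = [0.5, 1.5, 3.3, 5.5, 7.9, 10.7, 13.8, 17.1, 20.7, 24.4, 28.4, 32.6]
_NAMES = [
    "Calm", "Light Air", "Light Breeze", "Gentle Breeze", "Moderate Breeze",
    "Fresh Breeze", "Strong Breeze", "Near Gale", "Gale", "Strong Gale",
    "Storm", "Violent Storm", "Hurricane",
]


def beaufort_category(speed_ms):
    """Map a wind speed in m/s to a Beaufort category name.

    bisect_left treats each upper bound as inclusive, so 1.5 m/s is still
    "Light Air" and anything above 32.6 m/s falls into "Hurricane".
    """
    if speed_ms < 0:
        raise ValueError("wind speed cannot be negative")
    return _NAMES[bisect.bisect_left(_UPPER_BOUNDS_MS, speed_ms)]


for speed in (0.2, 1.0, 2.5, 4.0):
    print(speed, beaufort_category(speed))
```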
Revisiting the significant increase in the readings of monitor 4 and plotting the aggregated wind direction for each of the months, it can be seen that there was considerable wind activity (speed and direction) in the line of sensor 4. It can be inferred that the wind was strong enough to carry the chemicals past sensor 4 before it could register sizable readings. The sensors can therefore be considered susceptible to wind activity, and this should not be ignored in further analysis.
A similar plot for the months of August and December helps to support our findings about the influence of wind speed and direction in the line of sensor 4 on its readings.
3. The performance of the sensors should also depend on the distance of the factories from each sensor. Given that the 200*200 map corresponds to 12*12 miles, we can calculate the distance between each sensor and each factory from the location coordinates provided in the dataset.
Distance (in miles) of each sensor from each factory
It can be concluded from the distance matrix that sensor 6 is the closest sensor to all the factories, sensor 9 is the farthest from Kasio and RoadRunner, and sensor 1 is the farthest from Radiance and Indigo. As a quick insight, the average distance of each sensor from the factories was computed to see whether a pattern emerges in sensor performance and operation. The clustered tree map below splits the sensors into 3 clusters based on their proximity to the factories. Sensor 6 is clearly the closest to the factories; sensors 5, 9, 7 and 4 fall in the same middle cluster; and sensors 3, 8, 2 and 1 form the last cluster, the sensors farthest from the factories.
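The distance matrix and the average proximity described above can be sketched as follows. The scale factor comes from the stated mapping of 200 map units to 12 miles (12 / 200 = 0.06 miles per unit); the coordinates and names below are placeholders, not values from the dataset, and the function names are made up for illustration.

```python
import math

MILES_PER_UNIT = 12 / 200  # 200 map units span 12 miles, so 0.06 miles per unit


def distance_miles(a, b):
    """Straight-line distance in miles between two (x, y) map points."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) * MILES_PER_UNIT


def distance_matrix(sensors, factories):
    """Return {sensor: {factory: distance in miles}}."""
    return {
        s: {f: distance_miles(s_xy, f_xy) for f, f_xy in factories.items()}
        for s, s_xy in sensors.items()
    }


def average_proximity(matrix):
    """Mean distance from each sensor to all factories."""
    return {s: sum(row.values()) / len(row) for s, row in matrix.items()}


# Placeholder coordinates (NOT from the dataset), chosen for easy arithmetic.
sensors = {"sensor A": (100, 20), "sensor B": (60, 20)}
factories = {"factory X": (100, 50), "factory Y": (90, 20)}

matrix = distance_matrix(sensors, factories)
averages = average_proximity(matrix)
for sensor, avg in sorted(averages.items(), key=lambda kv: kv[1]):
    print(f"{sensor}: {avg:.2f} miles on average")
```

Sorting the averages, as in the last loop, gives the ordering that a proximity-based clustering would start from.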
A general hypothesis can be framed that sensors in close proximity to the factories would show higher pollutant readings.
However, building a similar tree map for the sensor readings reveals that sensor 5, which falls in fairly close proximity to the factories, shows quite low readings in the month of April.
The contribution of sensor 5 picks up somewhat in August and December; however, given that sensor 5 was already flagged in point 1 above, it is clearly a matter to investigate. Splitting sensor 5's contribution by day and night reveals that its readings fall even lower at night, which is precisely when Methylosmolene (one of the most harmful chemicals of the lot) is detected. Is this tampering with the monitor?
With the help of the above dashboard and animation, we can get a good understanding of how the sensors perform given the wind speed and direction prevailing at each point in time.
It can be inferred that the wind has largely been in the north-west direction, so sensors 3 and 4 have shown readings on the higher side, with the exception of sensor 4 in April, when the wind was too strong for it to capture sizable readings.
On the other hand, even when the wind was blowing in the north/north-east direction, sensors 5 and 9 repeatedly showed lower readings and hence demand further investigation.
Question2
MC2.2 Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?
It can be seen that most of the chemicals are captured largely during the day or business hours (6 am to 6 pm); Methylosmolene, however, shows a different pattern and is captured largely at night (6 pm to 6 am). It is worth investigating whether this night-time pattern arises because the sensors' ability to capture Methylosmolene changes when there is no sunlight, or because the factories release it specifically at night. This adds to our earlier doubt about the maintenance timings, that is, the midnight hours at which no readings were recorded twice a month.
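The day versus night comparison above can be sketched as a share of each chemical's total reading falling in the 6 am to 6 pm window. This is a minimal sketch under assumptions: the row layout `(chemical, timestamp, reading)`, the function names and the sample values are all hypothetical, and the dashboard may aggregate differently.

```python
from collections import defaultdict
from datetime import datetime


def is_daytime(ts):
    """True for hours from 6 am up to (but not including) 6 pm."""
    return 6 <= ts.hour < 18


def day_night_share(rows):
    """Return {chemical: {"day": share, "night": share}} of summed readings."""
    totals = defaultdict(lambda: {"day": 0.0, "night": 0.0})
    for chemical, ts, reading in rows:
        totals[chemical]["day" if is_daytime(ts) else "night"] += reading
    shares = {}
    for chemical, parts in totals.items():
        total = parts["day"] + parts["night"]
        shares[chemical] = {
            period: (value / total if total else 0.0)
            for period, value in parts.items()
        }
    return shares


# Invented readings for illustration only.
sample_rows = [
    ("Methylosmolene", datetime(2016, 4, 1, 23), 3.0),
    ("Methylosmolene", datetime(2016, 4, 1, 10), 1.0),
    ("AGOC-3A", datetime(2016, 4, 1, 10), 3.0),
    ("AGOC-3A", datetime(2016, 4, 1, 2), 1.0),
]

shares = day_night_share(sample_rows)
for chemical, parts in shares.items():
    print(chemical, parts)
```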
A horizon plot of the chemical readings on an hourly basis reaffirms that Methylosmolene is recorded more during the 6 pm to 6 am window.
In contrast, a horizon plot for AGOC-3A shows that the chemical is recorded largely during the day.
A similar horizon plot of the chemicals by day of the month reveals that AGOC-3A is largely emitted in the middle of the month (roughly the 13th to the 16th of each month). This demands further investigation into why the pattern occurs. Is it related to the factories performing a certain maintenance activity or a goods-processing cycle? Can the factories be advised to alter their activity to balance the emissions across weeks or days so that the chemical readings fall well within an acceptable range?
The dashboard below helps us understand sudden spikes for each of the chemicals, filtered by month. It again shows that AGOC-3A is largely a day-time emission, whereas Methylosmolene is emitted at night. Based on the area chart shown below, we can initiate a further investigation into whether the spikes seen on specific days and hours are due to some sort of leak at the factories. The associated treemap helps to identify the sensor that captured each spike in emission.
The calendar map above helps us understand the extent of pollutants in the park as a whole. It shows, for each chemical and day, the moving average of the last 3 hours (0, -1, -2), giving a better insight into the quality of air in the park. It also reveals the times (2nd/6th April, 4th/7th August and 2nd December at 12:00 AM) at which the sensors recorded no reading at all. This demands further investigation to understand whether these were maintenance periods managed by the sensor administrators and, if so, how those specific dates were chosen. From the calendar map and the area charts of the chemical readings, it can be observed that the last couple of days of every month have been relatively quiet periods for pollutant readings. It would be worth suggesting that the sensor administrators use these days for any required sensor downtime. In addition, the hour of the downtime could be moved to early evening, as we would not want to miss readings of Methylosmolene (a very harmful chemical), whose activity increases as the night progresses and for which midnight is close to the peak capture time.
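The 3-hour moving average over offsets (0, -1, -2) mentioned above can be sketched as a trailing mean over an hourly series. This is an illustrative Python version, not the Tableau table calculation itself; in particular, skipping missing hours (marked `None`) rather than treating them as zero is an assumption.

```python
def trailing_mean(values, window=3):
    """Trailing moving average over the current and previous window-1 hours.

    values is an hourly series in time order; None marks an hour with no
    reading. Missing hours are skipped (an assumption), and the result is
    None when every hour in the window is missing.
    """
    result = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        result.append(sum(chunk) / len(chunk) if chunk else None)
    return result


# Invented hourly readings with one missing hour.
hourly = [1.0, 2.0, 3.0, None, 6.0]
print(trailing_mean(hourly))
```

Because missing hours are skipped rather than averaged in as zero, a gap such as a midnight outage shows up as a thinner window instead of an artificial dip.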
Question3
MC2.3 – Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.
A Tableau dashboard featuring air plumes at each of the sensor points was built to depict the direction and speed of the wind coming from 16 directions. The length of each plume is directly proportional to the wind speed, and the plumes (polygons) thus formed help to understand the reach of the factories' chemicals onto the sensors. The 3-hourly meteorological data gives a good mapping of wind direction and speed onto the chemical readings captured by each of the sensors. Any plume (polygon) that overlaps a factory's coordinates can be considered a possible indication that the factory is responsible for releasing the pollutant.
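The plume overlap test described above can be approximated as a point-in-sector check: a factory is a candidate source if it lies within a wedge that starts at the sensor, points toward where the wind is coming from, and has a length proportional to wind speed. The sketch below makes several assumptions that are not confirmed by the text: wind direction is given in degrees clockwise from north as the direction the wind blows from, map x increases eastward and y northward, the wedge half-angle is 11.25 degrees (one of 16 sectors), and `reach_per_speed` is a made-up scale factor.

```python
import math


def bearing_deg(origin, target):
    """Compass bearing from origin to target, in degrees clockwise from north."""
    dx = target[0] - origin[0]
    dy = target[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360


def in_upwind_plume(sensor, factory, wind_from_deg, wind_speed,
                    reach_per_speed=10.0, half_angle=11.25):
    """True if the factory lies inside the upwind wedge drawn at the sensor.

    reach_per_speed (map units per unit of wind speed) and half_angle are
    illustrative parameters, not values taken from the dashboard.
    """
    distance = math.hypot(factory[0] - sensor[0], factory[1] - sensor[1])
    if distance > wind_speed * reach_per_speed:
        return False
    # Smallest absolute angle between the sensor-to-factory bearing and the
    # direction the wind is coming from, handling wrap-around at 360 degrees.
    diff = abs((bearing_deg(sensor, factory) - wind_from_deg + 180) % 360 - 180)
    return diff <= half_angle


# Placeholder geometry: the factory sits 20 map units due south of the sensor.
sensor_xy = (100, 50)
factory_xy = (100, 30)
print(in_upwind_plume(sensor_xy, factory_xy, wind_from_deg=180, wind_speed=3))
print(in_upwind_plume(sensor_xy, factory_xy, wind_from_deg=90, wind_speed=3))
```

Widening `half_angle` mimics increasing the plume angle, which is how two closely spaced factories can both end up inside the same wedge.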
For instance, the visualization below, captured for 18th of Dec at 9 am, indicates Indigo releasing a high amount of Appluimonia, and RoadRunner and/or Kasio causing AGOC-3A and Chlorodinine emissions. RoadRunner and Kasio are two factories whose coordinates are very close to each other, so it can be difficult at times to pinpoint either company specifically.
The capture below, from the 5th of December, supports our finding of Indigo as a possible culprit emitting Appluimonia and of RoadRunner and/or Kasio causing Chlorodinine emissions.
The visualization below, captured for 23rd of August at 6 am, provides leads pointing to Kasio as the cause of AGOC-3A emissions.
The visualization below, captured for 22nd of April at 9 am, covers the RoadRunner site, which can therefore be judged a possible cause of the AGOC-3A emission. However, the plume here is only a depiction of the wind's reach and speed, and a small change in its angle would bring Kasio inside it, so Kasio cannot be ruled out as a source of the high AGOC-3A reading. Given that the band and angle of the plume at monitor 9 can vary to some extent, we cannot yet pinpoint one of the two factories specifically.
The capture below, from 13th December at 9 am, shows RoadRunner to be the cause of the high Chlorodinine reading captured at monitor 6.
The capture below, from 27th April at 12 am, shows RoadRunner to be a possible cause of the high Chlorodinine reading captured at monitor 6 once we increase the angle of the plume.
In the screenshot below, RoadRunner and/or Kasio can be investigated for the high Methylosmolene readings seen at monitor 6.
Likewise, Indigo can be seen as a possible culprit for the high Methylosmolene reading at monitor 9 below:
Here, RoadRunner and/or Kasio are the cause of the high Methylosmolene reading captured at monitor 6.
In sum, the table below maps each chemical to the corresponding factories at specific times:
It has helped us to narrow down the investigations as below:
- Appluimonia is largely being emitted by Indigo.
- Chlorodinine is being emitted by RoadRunner.
- Methylosmolene is largely being emitted by Kasio; however, RoadRunner can also be investigated for its Methylosmolene emissions.
- AGOC-3A is largely emitted by Kasio and RoadRunner.