ISSS608 2018-19 T1 Assign HyderAli Task 2 Insights
The Citizen Science Air Quality dataset was used to study the measurements recorded by citizens in 2017 and 2018. It consists of historical citizen-collected P1 and P2 readings, together with observation data such as temperature, humidity and pressure. After consolidating the Citizen Science Air Quality data, we decoded the geohashes into latitude/longitude pairs, resulting in 3,610,146 measurements.
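The geohash decoding step can be sketched as follows. This is a minimal illustration of the standard geohash scheme, not necessarily the tool used in this study; the function name `decode_geohash` is hypothetical, and the sample hash `ezs42` is a common textbook example rather than a value from the dataset.

```python
# Standard geohash base-32 alphabet (no "a", "i", "l", "o").
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash):
    """Return the (latitude, longitude) centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash.lower():
        value = _BASE32.index(char)  # raises ValueError on invalid characters
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    # The centre of the final bounding box is the decoded position.
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


lat, lon = decode_geohash("ezs42")
print(round(lat, 3), round(lon, 3))
```

Each additional character narrows the cell, so the precision of the decoded coordinates depends on the length of the geohashes stored in the dataset.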
Characterize the sensors' coverage, performance and operation. Are they well distributed over the entire city? Are they all working properly all the times? Can you detect any unexpected behaviours of the sensors through analyzing the readings they capture? Limit your response to no more than 4 images and 600 words.
Are the Air Quality sensors well-distributed over the entire city?
The emergence of small-scale air quality sensors has led to a significant shift in how air quality is measured, moving beyond traditional methods that rely on large, stationary and expensive analyzers. These sensors are usually small and portable and provide data in near real time at relatively low cost, allowing air quality to be measured at unprecedented temporal and spatial resolution. A heat map density plot of the citizen measurement data across Bulgaria shows that, between 2017 and 2018, most measurements lie around major cities and towns such as Sofia, Pernik, Blagoevgrad and Plovdiv, with the highest concentration in Sofia city.
When we aggregate the spatial point data into square regions with an approximate bin size of 11.1 km, it becomes clear that the citizen measurements are unevenly distributed: most are densely packed and saturated at particular locations in Sofia city. These locations are highly populated relative to the sparsely covered regions, so a higher number of citizen measurements there is to be expected.
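One way to reproduce this kind of square binning is sketched below. It assumes the 11.1 km bin corresponds to a 0.1 degree grid (one degree of latitude is roughly 111 km); that correspondence is our assumption, since the exact binning method is not stated. The function name `bin_points` and the three sample coordinates are illustrative only and are not taken from the dataset.

```python
import math
from collections import Counter


def bin_points(points, cell_deg=0.1):
    """Count (lat, lon) points per square grid cell of side cell_deg degrees."""
    counts = Counter()
    for lat, lon in points:
        # Integer cell indices; floor keeps negative coordinates consistent.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Made-up points: two near Sofia, one near Plovdiv.
sample_points = [(42.69, 23.32), (42.65, 23.37), (42.14, 24.75)]
cell_counts = bin_points(sample_points)

# Cells sorted from most to least populated.
for cell, n in cell_counts.most_common():
    print(cell, n)
```

Comparing the count in the most populated cell with the total gives a quick numerical check of how concentrated the measurements are. Note that a 0.1 degree step in longitude covers less than 11.1 km at Bulgaria's latitude, so the cells are only approximately square.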
Do these sensors all work properly at all times? Are there any unexpected behaviours of the sensors revealed by analyzing the readings they capture?
The citizen measurements were most likely recorded by local citizens or groups using their own testing kits, so there is a high possibility that some measurements are inaccurate and could even paint a misleading picture of air pollution in Sofia. As the dataset is very large, we proceed to sample 1% of the dataset comprising some 300K rows. A quick analysis of the distribution plot shows a significant number of outliers (denoted by the red circles) in the P1, P2 and temperature data: P1 readings ~ 2000, P2 readings ~ 1000 and Temperature < -400K. This indicates that the sensors are not working properly at all times. The inaccurate measurements should therefore be excluded from the analysis to ensure consistency in the findings.
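The sampling and outlier screening described above can be sketched as follows. The 1.5 x IQR rule used here (Tukey's fences) is an assumption on our part; the outlier criterion behind the plot is not stated. The helper names `sample_rows` and `iqr_outliers` and the example readings are hypothetical.

```python
import random
import statistics


def sample_rows(rows, fraction=0.01, seed=0):
    """Draw a reproducible random sample of the given fraction of rows."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up P1 readings with one implausibly high value.
p1_readings = [10, 12, 11, 13, 12, 11, 14, 2000]
print(iqr_outliers(p1_readings))
```

A rule like this flags readings that are extreme relative to the bulk of the data. A separate physical plausibility check, such as rejecting temperatures below absolute zero, would catch sensor faults that a purely statistical rule might miss.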
Part Two
Now turn your attention to the air pollution measurements themselves. Which part of the city shows relatively higher readings than others? Are these differences dependent? Limit your response to no more than 6 images and 800 words.