ISSS608 2018-19 T1 Assign Oh Zhen Yao Matthias Task 2

From Visual Analytics and Applications

Sofia City: Air Quality Analysis


I. Task Summary

Scope:

  • PM10 pollution measurements (assumed to correspond to the P1 variable in the dataset) across Sofia City from 2017 to 2018, recorded by Citizen Science sensors.


Objectives:

  • Characterize the sensors’ coverage, performance and operation.
  • Investigate whether some parts of the city show relatively higher air pollution measurements than others. Further, investigate whether these differences are time-dependent.

II. Insights

1. Sensors

1. Sensors are clustered at the city centre:

Citizen Science Sensor Distribution
  • Coverage is better along the east-west axis than along the north-south axis: sensors are sparsely distributed from east to west, and there are none at the northern and southern ends of the city.
  • As a result, PM10 readings may not be representative of air pollution in the northern and southern parts of Sofia City.

2. There were 6 days identified with noticeable decreases in 'p1' record counts:

No. of Readings by Time
  • These drops could indicate faults in the Citizen Science sensors, which could affect the collection and analysis of time-series air pollution measurements.
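One way to flag such days programmatically is sketched below. It is not the method used in this analysis (the drops were identified visually in Tableau); the function name `low_count_days`, the 50% cut-off and the sample timestamps are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median


def low_count_days(timestamps, fraction=0.5):
    """Return the days whose record count is below `fraction` of the median daily count.

    `fraction` is an arbitrary illustrative cut-off, not a value from the analysis.
    Days with no records at all never appear in the counts, so they are not flagged.
    """
    counts = Counter(ts.date() for ts in timestamps)
    if not counts:
        return []
    threshold = fraction * median(counts.values())
    return sorted(day for day, n in counts.items() if n < threshold)


# Hypothetical sample: two normal days (10 records each) and one sparse day (2 records).
base = datetime(2018, 1, 1)
sample = [base + timedelta(minutes=i) for i in range(10)]
sample += [base + timedelta(days=1, minutes=i) for i in range(10)]
sample += [base + timedelta(days=2, minutes=i) for i in range(2)]

flagged = low_count_days(sample)
```

A median-based threshold is used here because a handful of faulty days would otherwise pull a mean-based threshold downwards.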

2. P1 (PM10) Pollution Readings

1. There are 25,519 outliers with PM10 readings above 360 μg/m³ (the maximum observed average PM10 reading in the official EEA data).

Univariate Outliers
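The threshold rule above can be expressed as a simple filter. The following is a minimal sketch, assuming the P1 readings are available as a plain list of numbers; the function name `split_outliers` and the sample values are hypothetical.

```python
EEA_MAX_PM10 = 360.0  # μg/m³, the threshold stated in the text


def split_outliers(readings, threshold=EEA_MAX_PM10):
    """Split readings into (kept, outliers); a reading is an outlier if it exceeds the threshold."""
    kept = [r for r in readings if r <= threshold]
    outliers = [r for r in readings if r > threshold]
    return kept, outliers


# Hypothetical sample readings, not taken from the dataset.
sample = [12.5, 360.0, 360.1, 2000.0]
kept, outliers = split_outliers(sample)
```

Note that a reading of exactly 360 is kept, matching the strict ">360" wording.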

2. Average hourly PM10 readings seem to be lower the further the sensors are from the city centre.

Average Hourly PM10 by Hexbin (animated)
  • Note that the colour scale has a lower bound of 0 μg/m³, a centre of 180 μg/m³, and an upper bound of 360 μg/m³.
  • Hexa-binning was used to provide a clearer, less granular overview of the PM10 readings over the different hours of the day.
  • The animation above is a low-resolution GIF. For a clearer, interactive version of the full animation across all hours of the day, please refer to Matthias' Tableau Public.
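For readers unfamiliar with hexa-binning, the idea can be sketched outside Tableau as follows. This is an illustrative sketch only, not the Tableau workflow used here: it treats coordinates as planar (an assumption that ignores map projection), uses pointy-top hexagons indexed by axial coordinates, and the names `point_to_hex` and `mean_by_hex` and the sample points are hypothetical.

```python
import math
from collections import defaultdict


def point_to_hex(x, y, size):
    """Map a planar point to the axial (q, r) index of the pointy-top hexagon containing it."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube rounding: round all three cube coordinates, then recompute the one
    # with the largest rounding error so that cx + cy + cz == 0 still holds.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def mean_by_hex(points, size):
    """Average the value of all (x, y, value) points that fall in the same hexagon."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in points:
        cell = point_to_hex(x, y, size)
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical points: two fall in hexagon (0, 0), one at the centre of hexagon (1, 0).
sample = [(0.0, 0.0, 10.0), (0.1, 0.1, 30.0), (math.sqrt(3), 0.0, 50.0)]
binned = mean_by_hex(sample, size=1.0)
```

Averaging per hexagon rather than per sensor is what makes the overview less granular: nearby sensors collapse into a single cell value.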

3. Monthly PM10 readings seem to peak from November to February:

Monthly PM10
  • As with the EEA data used in Task 1, this could be due to increased power consumption for heating in winter.

4. Hourly PM10 readings seem to be lower in the middle of the day, from 11:00 to 14:00:

Hourly PM10
  • This could be because temperatures peak around noon, reducing the need for heating in winter, which in turn lowers the overall hourly PM10 averages for the middle of the day across the years.
  • It could also be because people are most active outdoors in the middle of the day, which reduces power consumption and hence the hourly PM10 averages.
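One way to probe the heating hypothesis would be to compare hourly averages for the winter months against all months. The sketch below assumes records are available as `(timestamp, p1)` pairs; the function name `hourly_means`, the choice of November to February as "winter" (taken from the monthly peak noted earlier) and the sample values are illustrative assumptions, not results from the dataset.

```python
from collections import defaultdict
from datetime import datetime

WINTER_MONTHS = {11, 12, 1, 2}  # assumption: the November-to-February peak noted earlier


def hourly_means(records, months=None):
    """Return {hour: mean P1} for (timestamp, p1) records, optionally restricted to some months."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, p1 in records:
        if months is not None and ts.month not in months:
            continue
        sums[ts.hour] += p1
        counts[ts.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


# Hypothetical records, not taken from the dataset.
sample = [
    (datetime(2018, 1, 5, 8), 100.0),
    (datetime(2018, 1, 5, 12), 40.0),
    (datetime(2018, 1, 6, 12), 60.0),
    (datetime(2018, 7, 5, 12), 20.0),
]
all_year = hourly_means(sample)
winter_only = hourly_means(sample, months=WINTER_MONTHS)
```

If the midday dip were driven mainly by reduced heating, it would be expected to be more pronounced in the winter-only averages than in the all-year ones.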

III. Critique

Based on the challenges faced in addressing this task with the underlying tool, Tableau, there are several points of critique:
1. Tableau is memory (RAM) intensive and lagged persistently when handling a dataset of this size.
2. Tableau did not support the original geocode type in the dataset, so R was needed to decode it into longitude and latitude values.
3. No hexagon polygons were available for hexa-binning, so the polygon had to be downloaded from an external source.
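To illustrate the kind of decoding step mentioned in point 2, the sketch below decodes a geohash string into latitude and longitude. This assumes the dataset's geocode is a standard base-32 geohash, which the text above does not confirm, and it is written in Python rather than the R code actually used; the function name `decode_geohash` is hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def decode_geohash(geohash):
    """Decode a geohash into the (latitude, longitude) of its cell centre."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate between longitude and latitude, starting with longitude
    for ch in geohash.lower():
        value = _BASE32.index(ch)  # raises ValueError on characters outside the alphabet
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


# "ezs42" is a commonly cited geohash example, not a value from this dataset.
lat, lon = decode_geohash("ezs42")
```

Each additional character narrows the cell by five bits, so longer geohashes decode to more precise coordinates.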