ISSS608 2018-19 T1 Assign Clara Chua Kiah Hwii Task 2

From Visual Analytics and Applications

Thesmokestac.jpg Sofia: So Smoggy


Sensor Coverage

The AirTube citizen air quality dataset has sensors across all of Bulgaria, along with one null location and one point entirely outside Europe. We subset the data to Sofia-grad (Sofia City province), which gives us 2.1 million data points from September 2017 to August 2018.

CC-T2Map1.png

Do we have good coverage?
Overall, we can see that the sensors are clustered around the city centre of Sofia City province, but not on the outskirts.

CC-T2Coverage.JPG

Are the sensors deployed consistently?
For the purpose of the analysis, we assume that each distinct geohash is equivalent to one sensor. This assumption is reasonable because there is generally only one reading per time period for each geohash. Using hexbins to group the sensors over a larger area (by rounding the latitude and longitude to 2 decimal places), we can see that the number of sensors has grown over time (from 123 sensors in Sep 2017 to 366 in Aug 2018), and that the coverage has also spread. In some months, the sensors are not consistently deployed in the same areas (some areas that previously had sensor readings no longer have any).
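
The counting and binning described above can be sketched in pandas. This is only an illustration under stated assumptions: the column names (`time`, `geohash`, `latitude`, `longitude`) and the sample rows are hypothetical, and the original analysis may have been done in a different tool.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions, not the dataset's schema.
readings = pd.DataFrame({
    "time": pd.to_datetime(["2017-09-01 10:00", "2017-09-01 11:00",
                            "2017-09-02 10:00", "2018-08-01 10:00"]),
    "geohash": ["sx8dfsy", "sx8dfsy", "sx8dg2k", "sx8dfsy"],
    "latitude": [42.6981, 42.6981, 42.7049, 42.6981],
    "longitude": [23.3212, 23.3212, 23.3301, 23.3212],
})

# Treat each distinct geohash as one sensor and count sensors per calendar month.
readings["month"] = readings["time"].dt.to_period("M")
sensors_per_month = readings.groupby("month")["geohash"].nunique()

# Round coordinates to 2 decimal places so that nearby sensors share one bin.
readings["lat_bin"] = readings["latitude"].round(2)
readings["lon_bin"] = readings["longitude"].round(2)
sensors_per_bin = (
    readings.groupby(["month", "lat_bin", "lon_bin"])["geohash"]
    .nunique()
    .reset_index(name="sensors")
)
```

Comparing `sensors_per_bin` between months would show bins that had sensors in one month but none in the next.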

CC-CoverSensors.gif

Are the sensors recording consistently?
When we plot the number of readings against the day and time, we can see spots of missing data, where there are no readings at all, as well as times with low counts of readings. The latter could mean that sensors were faulty or undergoing maintenance during those periods.

CC-T2NumSensorReadings.png

The missing data covers the following dates and times:

Date & Time | Duration (hrs)
9 Oct 2200 – 10 Oct 0700 | 9
15 Nov 0400 – 0600 | 2
18 Dec 1900 – 2000 | 1
20 Dec 0500 – 0700 | 2

The low counts of readings cover the following dates and times:

Date & Time | Duration (hrs)
31 Jan 1300 – 1600 | 3
30 Mar 1400 – 1 Apr 1000 | 35
1 Apr 1100 – 1900 | 8
1 May 0200 – 0900 | 7
4 July 0000 – 6 July 0600 | 54
9 July 2000 – 2200 | 2
10 July 0000 – 0100 | 1
11 July 2300 – 12 July 0500 | 6
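
Gaps and low-count periods like those listed above could be found programmatically by counting readings per hour. The following is a minimal sketch on hypothetical timestamps; the cut-off `LOW_THRESHOLD` is an assumed value for illustration, since the text does not state what counts as a low number of readings.

```python
import pandas as pd

# Hypothetical timestamps: 02:00 has no readings, 03:00 has only one.
times = pd.to_datetime(
    ["2017-10-09 00:05"] * 5
    + ["2017-10-09 01:10"] * 5
    + ["2017-10-09 03:20"]
    + ["2017-10-09 04:15"] * 5
)

# Count readings per hour; hours with no readings appear as 0.
counts = pd.Series(1, index=times).resample("h").sum()

LOW_THRESHOLD = 3  # assumed cut-off for a "low" hourly count

missing_hours = counts[counts == 0]
low_hours = counts[(counts > 0) & (counts < LOW_THRESHOLD)]
```

On the real data, the threshold would more sensibly be set relative to the number of sensors active in that month.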


Are the sensors working correctly?
Analysing the sensor readings, we can see that some sensors may be faulty or miscalibrated, as some readings are physically implausible (e.g. temperature readings of -5,574 °C, humidity readings of -207, and PM10 readings of 2,000). There is also missing data in the humidity readings (denoted as -999). The figure below shows the distribution of each sensor variable, with a very large range of values in each case.

CC-RangeReadings2.png

There are a number of erroneous readings that we need to filter out to ensure that our analysis makes sense. The following are the plausible ranges of values for the variables:

Variable | Min | Max | Source
PM10 | 0 | 800 | +3 standard deviations of sample mean = 757
PM2.5 | 0 | 500 | +3 standard deviations of sample mean = 482
Humidity | 5% | 100% | Min / Max of Meteo data
Pressure | 98,306 | 10,513 | Min / Max of Meteo data
Temperature | -23.8 °C | 41 °C | https://en.wikipedia.org/wiki/Sofia#Climate
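
As a rough illustration of the filtering step, the plausible ranges can be applied as row-level bounds. The sketch below assumes hypothetical column names and sample values, and covers only PM10, PM2.5, humidity and temperature; pressure is omitted from this sketch.

```python
import pandas as pd

# Plausible ranges taken from the table above (pressure omitted in this sketch).
BOUNDS = {
    "pm10": (0, 800),
    "pm25": (0, 500),
    "humidity": (5, 100),
    "temperature": (-23.8, 41),
}

# Hypothetical readings, including values like those flagged as erroneous.
readings = pd.DataFrame({
    "pm10": [35.0, 2000.0, 120.0, 60.0],
    "pm25": [20.0, 900.0, 80.0, 30.0],
    "humidity": [55.0, 60.0, -999.0, 70.0],
    "temperature": [12.0, 15.0, 3.0, -5574.0],
})

# Keep a row only if every variable falls inside its plausible range (inclusive).
mask = pd.Series(True, index=readings.index)
for column, (low, high) in BOUNDS.items():
    mask &= readings[column].between(low, high)
clean = readings[mask]
```

Dropping the whole row is one possible design choice; an alternative would be to blank out only the offending value so that the other measurements in that row are retained.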

Air Pollution Measurements

Effect of Temperature, Humidity and Pressure

After filtering out erroneous values, we examine the readings from the citizen dataset. Plotting the various measurements by month, we can examine the effect that average temperature, humidity and pressure have on the air pollution measurements.

CC-MVCorrelations2.png

We can see that the PM10 and PM2.5 readings are highly correlated. This can be explained by the fact that PM2.5 particulates are a subset of PM10 particulates. For this reason, we focus our attention on PM10 readings, which also provides a good comparison with the EEA data from Task 1.

Temperature is slightly negatively correlated with PM10 and PM2.5: as temperatures decrease, the PM10 and PM2.5 readings increase. This is in line with our understanding of the situation on the ground as explained in Task 1 (where citizens burn biomass or other fuels to keep warm during the winter months). Variations in pressure or humidity do not make a noticeable difference to the PM readings in the citizen dataset. However, the correlation matrix shows that humidity is partially negatively correlated with temperature: as temperature decreases, humidity increases. Therefore, humidity may well play an indirect role in air pollution.
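
A correlation matrix of this kind can be computed with pandas' `DataFrame.corr`. The numbers below are made-up monthly averages chosen only to show the mechanics; they are not values from the citizen dataset, and the column names are assumptions.

```python
import pandas as pd

# Made-up monthly averages for illustration only (not from the dataset).
monthly = pd.DataFrame({
    "temperature": [-2.0, 1.0, 6.0, 12.0, 18.0, 22.0],
    "humidity": [85.0, 80.0, 70.0, 65.0, 60.0, 55.0],
    "pm10": [90.0, 80.0, 55.0, 40.0, 30.0, 25.0],
    "pm25": [60.0, 52.0, 35.0, 25.0, 18.0, 15.0],
})

# Pearson correlation between every pair of variables.
corr = monthly.corr()

# Example lookups: PM10 vs PM2.5, and temperature vs PM10.
pm_corr = corr.loc["pm10", "pm25"]
temp_pm10_corr = corr.loc["temperature", "pm10"]
```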

Areas that are worse off

Using the Air Quality Index (AQI) below, we classify the PM10 levels and create a new calculated field called ‘Severity’ that allows us to see how different areas fare over time. We focus on the winter months, when PM10 readings are higher. Using the same hex-bin maps, we aggregate sensors with a similar latitude and longitude. We can then filter and aggregate by month to see which areas fare worse over time, and identify whether some areas are hit harder in specific months.

AQI | Air Pollution Level | Health Implications
0 – 50 | Good | Air quality is considered satisfactory, and air pollution poses little or no risk
51 – 100 | Moderate | Air quality is acceptable; however, for some pollutants, there may be a moderate health concern for a very small number of people who are unusually sensitive to air pollution
101 – 150 | Unhealthy for Sensitive Groups | Members of sensitive groups may experience health effects. The general public is not likely to be affected
151 – 200 | Unhealthy | Everyone may begin to experience health effects; members of sensitive groups may experience more serious health effects
201 – 300 | Very Unhealthy | Health warnings of emergency conditions. The entire population is more likely to be affected.
300+ | Hazardous | Health alert: everyone may experience more serious health effects.
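
A ‘Severity’ field like the one described above could be derived by binning PM10 values against the table's breakpoints. This is a sketch, not the calculated field actually used: it assumes the breakpoints are applied directly to the PM10 reading, that each band includes its upper bound (so a value of exactly 300 falls in ‘Very Unhealthy’), and that the column name `pm10` is hypothetical.

```python
import pandas as pd

# Upper edges of each band, taken from the AQI table above.
BIN_EDGES = [0, 50, 100, 150, 200, 300, float("inf")]
LABELS = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]

# Hypothetical PM10 readings.
readings = pd.DataFrame({"pm10": [12, 50, 75, 130, 180, 250, 450]})

# Right-closed bins: (0, 50], (50, 100], ...; include_lowest also admits 0.
readings["severity"] = pd.cut(
    readings["pm10"], bins=BIN_EDGES, labels=LABELS, include_lowest=True
)
```

Because the result is an ordered categorical, the mean PM10 per area can be classified the same way and compared level by level.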


We can see that the areas northeast and northwest of the city centre are relatively worse off on average over the winter months, when the air pollution is in the unhealthy range. In the spring and summer, the readings are mostly in the good to moderate range.

CC-WorseNeighbourhoods.png

We can identify 3 distinct areas (encircled in the charts) that are typically one level higher than the other areas in each month. We can then look for reasons for the increased air pollution in those areas.

CC-WorseNeighbourhoods2.png


Are differences time dependent?

Yes, they are! We can analyse the differences between areas across the hours of the day over a sample month. We use December as there is more variation in the PM10 readings, so we can easily see which areas are worst hit.
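
One way to sketch this hour-by-hour comparison is to average PM10 per area and hour of day for December. The column names (`lat_bin`, `lon_bin`, `pm10`) and the sample rows are hypothetical, and the areas are assumed to be the rounded latitude and longitude bins used earlier.

```python
import pandas as pd

# Hypothetical readings already assigned to rounded lat/long bins.
readings = pd.DataFrame({
    "time": pd.to_datetime(["2017-12-01 08:00", "2017-12-02 08:30",
                            "2017-12-01 20:00", "2017-12-01 08:15",
                            "2018-01-01 08:00"]),
    "lat_bin": [42.70, 42.70, 42.70, 42.65, 42.70],
    "lon_bin": [23.32, 23.32, 23.32, 23.30, 23.32],
    "pm10": [80.0, 100.0, 160.0, 40.0, 50.0],
})

# Keep December only, then average PM10 per area and hour of day.
december = readings[readings["time"].dt.month == 12].copy()
december["hour"] = december["time"].dt.hour
hourly_pm10 = (
    december.groupby(["lat_bin", "lon_bin", "hour"])["pm10"]
    .mean()
    .unstack("hour")  # one row per area, one column per hour
)
```

Area and hour combinations with no readings appear as NaN in `hourly_pm10`.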

CC-TimeDiff.png

PM10 and PM2.5 concentrations over time

The following visualisation shows how the PM10 and PM2.5 concentrations change over time as a summary of what has been discussed.

CC-AirPollutionTime.gif