ISSS608 2018-19 T1 Assign Clara Chua Kiah Hwii Task 2
The Citizen Air Quality (AirTube) dataset has sensors across all of Bulgaria, with one null point and one point entirely outside Europe. We subset the data for Sofia-grad (Sofia City province), which gives us 2.1m datapoints from September 2017 to August 2018.
Do we have good coverage?
Overall, we can see that the sensors are clustered around the city centre of Sofia City province, but not on the outskirts.
Are the sensors deployed consistently?
For the purpose of the analysis, we assume that each distinct geohash is equivalent to one sensor. This assumption is reasonable because there is generally only one reading for each sensor in each time period. Using hexbins to group the sensors over a larger area (by rounding the latitude and longitude to 2 decimal places), we can see that the number of sensors has grown over time (from 123 sensors in Sep 2017 to 366 in Aug 2018), and the coverage has also spread. In some months, the sensors are not consistently deployed in the same areas: some areas that previously had sensor readings no longer have any.
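The counting and grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow actually used for the charts: the record keys `geohash` and `time`, the function names, and the sample geohashes are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def sensors_per_month(readings):
    """Count distinct geohashes (treated as sensors) per (year, month)."""
    seen = defaultdict(set)
    for r in readings:
        t = datetime.fromisoformat(r["time"])
        seen[(t.year, t.month)].add(r["geohash"])
    return {month: len(hashes) for month, hashes in sorted(seen.items())}


def coarse_cell(lat, lon, ndigits=2):
    """Round coordinates so that nearby sensors fall into the same map cell."""
    return (round(lat, ndigits), round(lon, ndigits))


# Made-up readings, for illustration only (hypothetical geohashes).
sample = [
    {"geohash": "sx8dfsym", "time": "2017-09-01T10:00:00"},
    {"geohash": "sx8dfsym", "time": "2017-09-01T11:00:00"},
    {"geohash": "sx8d5r7w", "time": "2017-09-02T10:00:00"},
    {"geohash": "sx8dfsym", "time": "2017-10-01T10:00:00"},
]
monthly = sensors_per_month(sample)      # distinct sensors seen in each month
cell = coarse_cell(42.69751, 23.32415)   # coarse cell for one coordinate pair
```

Comparing the set of occupied cells from one month to the next would then show areas that had readings earlier but have none later.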
Are the sensors recording consistently?
When we plot the number of readings against the day and time, we can see spots of missing data, where there are no readings at all, as well as times when there are low counts of readings, which could mean the sensors might have been faulty or undergone some maintenance during that period.
The missing data covers the following dates and times:
Date & Time | Duration (hrs) |
---|---|
9 Oct 2200 – 10 Oct 0700 | 9 |
15 Nov 0400 – 0600 | 2 |
18 Dec 1900 – 2000 | 1 |
20 Dec 0500 – 0700 | 2 |
The low reading counts cover the following dates and times:
Date & Time | Duration (hrs) |
---|---|
31 Jan 1300 – 1600 | 3 |
30 Mar 1400 – 1 Apr 1000 | 35 |
1 Apr 1100 – 1900 | 8 |
1 May 0200 – 0900 | 7 |
4 July 0000 – 6 July 0600 | 54 |
9 July 2000 – 2200 | 2 |
10 July 0000 – 0100 | 1 |
11 July 2300 – 12 July 0500 | 6 |
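Gaps and low-count periods like those listed above can also be found programmatically by counting readings per hour and walking every hour in the observed range. The sketch below assumes the timestamps are already parsed into `datetime` objects; the function names and the `low_threshold` value are illustrative assumptions, since the cut-off for a "low" count is not stated in the analysis.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(timestamps):
    """Count readings per clock hour."""
    return Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in timestamps
    )


def flag_hours(counts, low_threshold):
    """Return (missing, low) hours between the first and last observed hour.

    An hour is 'missing' if it has no readings at all and 'low' if it has
    at least one reading but fewer than `low_threshold`.
    """
    missing, low = [], []
    hour, end = min(counts), max(counts)
    while hour <= end:
        n = counts.get(hour, 0)
        if n == 0:
            missing.append(hour)
        elif n < low_threshold:
            low.append(hour)
        hour += timedelta(hours=1)
    return missing, low


# Made-up timestamps, for illustration only.
stamps = [
    datetime(2017, 10, 9, 21, 5),
    datetime(2017, 10, 9, 21, 30),
    datetime(2017, 10, 9, 21, 45),
    datetime(2017, 10, 10, 0, 10),
    datetime(2017, 10, 10, 1, 0),
    datetime(2017, 10, 10, 1, 20),
]
counts = hourly_counts(stamps)
missing, low = flag_hours(counts, low_threshold=2)
```

Consecutive flagged hours can then be merged into ranges to produce tables of start time, end time and duration.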
Are the sensors working correctly?
Analysing the sensor readings, we can see that some sensors may be faulty or miscalibrated, as some readings are implausible or physically impossible (e.g. temperature readings of -5,574 °C, humidity readings of -207, and PM10 readings of 2,000). There is also missing data in the humidity readings (denoted as -999). The figure below shows the distribution of the various sensor readings, which span a very large range of values for each variable.
There are a number of erroneous readings that we will need to filter out to ensure that our analysis makes sense. The following table gives the range of plausible values for each variable:
Variable | Min | Max | Source |
---|---|---|---|
PM10 | 0 | 800 | +3 standard deviations of sample mean = 757 |
PM2.5 | 0 | 500 | +3 standard deviations of sample mean = 482 |
Humidity | 5% | 100% | Min / Max of Meteo data |
Pressure | 98,306 | 10,513 | Min / Max of Meteo data |
Temperature | -23.8 °C | 41 °C | https://en.wikipedia.org/wiki/Sofia#Climate |
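A range filter of this kind can be sketched as follows. The key names (`pm10`, `pm25`, `humidity`, `temperature`) and function names are assumptions for illustration, and pressure is left out of the sketch; it could be added to the dictionary in the same way. The helper `three_sigma_upper` shows one way to derive an upper cut-off as the sample mean plus three sample standard deviations, which is how we read the PM10 and PM2.5 rows of the table.

```python
import statistics

# Plausible (min, max) ranges taken from the table; key names are hypothetical.
PLAUSIBLE = {
    "pm10": (0, 800),
    "pm25": (0, 500),
    "humidity": (5, 100),
    "temperature": (-23.8, 41),
}


def is_plausible(reading, limits=PLAUSIBLE):
    """True if every limited variable lies within its (min, max) range."""
    return all(lo <= reading[name] <= hi for name, (lo, hi) in limits.items())


def three_sigma_upper(values):
    """Upper cut-off: sample mean plus three sample standard deviations."""
    return statistics.mean(values) + 3 * statistics.stdev(values)


# Made-up readings, for illustration only.
readings = [
    {"pm10": 35, "pm25": 20, "humidity": 60, "temperature": 12.5},
    {"pm10": 40, "pm25": 22, "humidity": 55, "temperature": -5574},
    {"pm10": 30, "pm25": 18, "humidity": -999, "temperature": 10.0},
    {"pm10": 2000, "pm25": 25, "humidity": 50, "temperature": 8.0},
]
clean = [r for r in readings if is_plausible(r)]
```

Because the missing-humidity marker (-999) falls below the minimum, the same filter drops those rows without a separate null check.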
Effect of Temperature, Humidity and Pressure
After filtering out erroneous values, we examine the readings from the citizen dataset. Plotting the various measurements by month, we can examine the effect the average temperature, humidity and pressure have on the air pollution measurements.
We can see that the PM10 and PM2.5 readings are highly correlated. This is expected, because PM2.5 particulates are a subset of PM10 particulates. For this reason, we will focus our attention on PM10 readings, which also provides a good comparison to the EEA data from Task 1.
Temperature is slightly negatively correlated with PM10 and PM2.5: as temperatures decrease, the PM10 and PM2.5 readings tend to increase. This is in line with our understanding of the situation on the ground as explained in Task 1 (where citizens would burn biomass or other fuels to keep warm during the winter months). Variations in pressure or humidity do not make a noticeable difference to the PM readings in the citizen dataset. However, the correlation matrix shows that humidity is moderately negatively correlated with temperature: as temperature decreases, humidity tends to increase. Therefore, humidity may well play an indirect role in air pollution.
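For reference, a correlation matrix of this kind can be computed with the Pearson coefficient. The sketch below is a plain-Python illustration under our own assumptions: the function names are hypothetical and the numbers are made up to show the mechanics, so they are not values from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values, for illustration only.
columns = {
    "temperature": [-5, 0, 5, 10],
    "pm10": [80, 60, 40, 20],
    "humidity": [90, 85, 70, 60],
}
matrix = correlation_matrix(columns)
```

A value near -1 or +1 indicates a strong linear relationship and a value near 0 indicates little or none; correlation alone does not show that one variable causes the other.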
Areas that are worse off
Using the Air Quality Index (AQI) below, we classify the PM10 levels and create a new calculated field called ‘Severity’ that allows us to see how different areas fare over time. We focus on the winter months, when the PM10 readings are higher. Using the same hex-bin maps, we can aggregate sensors at similar latitudes and longitudes. We can filter and aggregate by month to see which areas fare worse over time, and identify whether some areas are hit harder in specific months.
AQI | Air Pollution Level | Health Implications |
---|---|---|
0 – 50 | Good | Air quality is considered satisfactory, and air pollution poses little or no risk |
51 – 100 | Moderate | Air quality is acceptable; however, for some pollutants, there may be a moderate health concern for a very small number of people who are unusually sensitive to air pollution |
101 – 150 | Unhealthy for Sensitive Groups | Members of sensitive groups may experience health effects. The general public is not likely to be affected |
151 – 200 | Unhealthy | Everyone may begin to experience health effects; members of sensitive groups may experience more serious health effects |
201 – 300 | Very Unhealthy | Health warnings of emergency conditions. The entire population is more likely to be affected. |
300+ | Hazardous | Health alert: everyone may experience more serious health effects. |
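The ‘Severity’ field can be sketched as a simple lookup against the band boundaries in the table. This is a minimal illustration that assumes, as described above, that the bands are applied directly to the (already filtered, non-negative) PM10 value; the function names and the record keys `cell`, `month` and `pm10` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper bound of each band except the last, which is open-ended.
UPPER_BOUNDS = [50, 100, 150, 200, 300]
LEVELS = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def severity(value):
    """Map a non-negative PM10 value to its band label."""
    return LEVELS[bisect_left(UPPER_BOUNDS, value)]


def severity_by_cell_month(readings):
    """Average PM10 per (cell, month), then classify the average."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in readings:
        key = (r["cell"], r["month"])
        totals[key][0] += r["pm10"]
        totals[key][1] += 1
    return {key: severity(s / n) for key, (s, n) in totals.items()}


# Made-up readings, for illustration only.
readings = [
    {"cell": (42.7, 23.32), "month": 12, "pm10": 160},
    {"cell": (42.7, 23.32), "month": 12, "pm10": 180},
    {"cell": (42.65, 23.3), "month": 12, "pm10": 40},
    {"cell": (42.65, 23.3), "month": 12, "pm10": 30},
]
by_cell = severity_by_cell_month(readings)
```

Classifying the monthly average, rather than each reading, matches the aggregated view in the hex-bin maps; classifying each reading first and counting bands per cell would be an alternative design.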
We can see that the areas northeast and northwest of the city centre are relatively worse off on average over the winter months, when the air pollution is in the unhealthy range. In the spring and summer, the readings are mostly in the good to moderate range.
We can identify 3 distinct areas (encircled in the charts) that are typically one level higher than the other areas in each month. We can then look for possible reasons for the increased air pollution in those areas.
Are differences time dependent?
Yes, they are! We can analyse the differences between areas across the hours of the day over a sample month. We use December as there is more variation in the PM10 readings, and we can easily see which areas are worse hit.
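An hour-of-day breakdown for a single month can be sketched as below. The record keys `cell`, `time` and `pm10` and the function name are assumptions for illustration, and the sample values are made up.

```python
from collections import defaultdict
from datetime import datetime


def hourly_mean_pm10(readings, month):
    """Mean PM10 per (cell, hour of day), using only readings from `month`."""
    sums = defaultdict(lambda: [0.0, 0])
    for r in readings:
        t = datetime.fromisoformat(r["time"])
        if t.month != month:
            continue  # keep only the sample month
        key = (r["cell"], t.hour)
        sums[key][0] += r["pm10"]
        sums[key][1] += 1
    return {key: s / n for key, (s, n) in sums.items()}


# Made-up readings, for illustration only.
readings = [
    {"cell": (42.7, 23.32), "time": "2017-12-01T20:00:00", "pm10": 100},
    {"cell": (42.7, 23.32), "time": "2017-12-02T20:30:00", "pm10": 140},
    {"cell": (42.7, 23.32), "time": "2017-12-01T03:00:00", "pm10": 40},
    {"cell": (42.7, 23.32), "time": "2018-01-01T20:00:00", "pm10": 300},
]
december = hourly_mean_pm10(readings, month=12)
```

Plotting these means per cell for each hour gives one map per hour of the day, which makes it easy to compare the same area at different times.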
PM10 and PM2.5 concentrations over time
The following visualisation shows how the PM10 and PM2.5 concentrations change over time as a summary of what has been discussed.