IS428 AY2018-19T1 Siow Chun Yen Jodie
Problem & Motivation
Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide, an estimated 7 million people died prematurely because of pollution; in the European Union (EU), 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has been classified as the leading environmental cause of cancer.
Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than the limits that the EU and the World Health Organization (WHO) have set to protect health.
Bulgaria had the highest PM2.5 concentrations in urban areas of all EU-28 member states over a three-year average. For PM10, Bulgaria also tops the list of most polluted countries, with a daily mean concentration of 77 μg/m3 (the EU limit value is 50 μg/m3).
According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10).
Objectives
Vast amounts of data have been collected about Sofia City over the 6 years from 2013 to 2018. However, the usability of this data will be limited unless it is effectively visualised so that users can glean insights from the visuals. A clearer visualisation of the data will allow authorities to better understand the air conditions in Sofia City. The interactive visualisation aims to give an overview of three things: the spatio-temporal analysis of official air quality, the spatio-temporal analysis of citizen science air quality measurements, and the main reasons behind the high concentrations of PM2.5 and PM10 in Sofia City.
Dataset Analysis & Transformation Process
Four main data sets were provided for this visualisation: EEA Data, Air Tube, METEO-data and TOPO-data. The data issues encountered and how they were resolved are discussed below.
Official air quality measurements in 5 stations in the city (EEA Data)
Problem #1 | Number of CSV Files |
---|---|
Issue | Air quality measurements taken from 2013 to 2018 were provided in the EEA data set. The air quality data for 2013 to 2016 were recorded daily, while the data for 2017 to 2018 were recorded hourly. However, the data for each year were stored in individual csv files, making it difficult to analyse the data over time using one file. |
Solution | We merge the relevant csv files into one csv file for convenience and simplicity. The air quality data from 2013 to 2016 are merged into one csv file to examine daily data, while the air quality data from 2017 to 2018 are merged into another to examine data over the different hours of a day in Sofia City. |
1. First, store all csv files to be merged in one folder. In this case, we create a folder to hold the 2013-2016 daily data.
2. In Command Prompt, navigate to that folder and run the command “copy *.csv 1316daily.csv” to merge all the csv files into one.
3. Back in the folder, a new csv file has been created and is ready to be used.
4. Repeat this for the hourly data for 2017-2018.
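One thing to watch with the `copy` approach is that it concatenates the files as they are, so the header row of every source file is repeated inside the merged file and has to be filtered out later. The sketch below is a hedged Python alternative, not the method used in this project: it assumes every csv file in the folder shares the same header, and the function name `merge_csv_files` is hypothetical.

```python
import csv
from pathlib import Path


def merge_csv_files(folder, output_name):
    """Merge every CSV in `folder` into one file, writing the header only once.

    Assumes all source files share an identical header row.
    Returns the number of data rows written.
    """
    folder = Path(folder)
    output_path = folder / output_name
    # Exclude the output file so a re-run does not merge it into itself.
    sources = sorted(p for p in folder.glob("*.csv") if p.name != output_name)
    rows_written = 0
    with output_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        header = None
        for source in sources:
            with source.open(newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{source.name} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `merge_csv_files("daily_2013_2016", "1316daily.csv")` would write the merged file into the same folder, where `daily_2013_2016` is a placeholder folder name.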
Problem #2 | Station Names |
---|---|
Issue | The stations were identified only by code strings such as BG0052A in the Air Quality Station EoI Code field, making it difficult for users to tell the stations apart at a glance. |
Solution | By matching the Air Quality Station EoI Code in Tableau against the csv file titles, each station can be renamed to the station number shown in its csv file title, which makes the data easier to read. |
Problem #3 | Incomplete Data for 5 Stations |
---|---|
Issue | The data provided for the various stations covered the period 2013 - 2016, with the exception of station 9484. This station's data ended abruptly in the middle of 2016, leaving a break in the data. |
Solution | Since the task requires us to comment on the general trend of air quality in Sofia City, including station 9484 is not essential. As such, this station's data is filtered out after it is imported into Tableau. |
Citizen science air quality measurements (Air Tube Data)
Problem #1 | Geohash in AirTube Data |
---|---|
Issue | The geographical locations in the 2017 and 2018 Air Tube Data were provided in geohash format. To analyse geographical data in Tableau, the locations must be in a format that Tableau can read, such as latitude and longitude. |
Solution | To decode the geohash, the geohash and tidyverse packages in R Studio were used to retrieve the latitude and longitude of each location in the data files. |
1. First, download and install the required packages.
2. Next, import the csv file into R Studio so that it can be read and decoded.
3. Then, using the code in R Studio, decode the geohash tags into latitude and longitude coordinates.
4. After that, join the latitude and longitude data back to the original data and write the result back to a csv file in the working directory.
5. Do this for both the 2017 and 2018 Air Tube Data before analysing them in Tableau.
6. As with the EEA data, the csv files for the 2017 and 2018 Air Tube Data were combined for simplicity before being analysed in Tableau.
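To illustrate what the decoding step does, here is a minimal sketch of the standard geohash decoding algorithm. It is written in plain Python as a stand-in and is not the R code used in this project; the function name `decode_geohash` is hypothetical. Each geohash character carries 5 bits, and the bits alternately halve the longitude range and the latitude range, starting with longitude.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def decode_geohash(geohash):
    """Return the (latitude, longitude) of the centre of a geohash cell."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    is_longitude_bit = True  # bits alternate, starting with longitude
    for char in geohash.lower():
        value = BASE32.index(char)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon_range if is_longitude_bit else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the range
            else:
                rng[1] = mid  # keep the lower half of the range
            is_longitude_bit = not is_longitude_bit
    return (sum(lat_range) / 2, sum(lon_range) / 2)
```

As a check, the commonly cited geohash `ezs42` decodes to roughly latitude 42.605 and longitude -5.603. Longer geohashes give smaller cells, so the decoded centre point becomes more precise.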
Problem #2 | Geographical Data Points on Tableau |
---|---|
Issue | After decoding and importing the data into Tableau, I realised that the data points in the csv file covered geographical areas beyond Sofia City itself. |
Solution | Since we are focusing only on Sofia City's air quality data, only the data points belonging to Sofia City are kept. This is done by selecting the data points in the Sofia City region on the map and choosing to keep only those points, which excludes the data points that are not needed for this visualisation. |
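The selection above was done by hand in Tableau. The same idea can be sketched in code as a bounding-box filter applied before import. This is only an illustration: the field names `latitude` and `longitude` and the function names are hypothetical, and the bounds must be supplied by the user, since no official Sofia City bounding box is given in the data sets.

```python
def within_bounds(lat, lon, bounds):
    """Check whether a point lies inside a (south, west, north, east) box."""
    south, west, north, east = bounds
    return south <= lat <= north and west <= lon <= east


def keep_points_in_bounds(points, bounds):
    """Keep only the records whose coordinates fall inside the box.

    `points` is a list of dicts with 'latitude' and 'longitude' keys
    (hypothetical field names).
    """
    return [
        p for p in points
        if within_bounds(p["latitude"], p["longitude"], bounds)
    ]
```

A rectangular box is a rough approximation of a city boundary, so points near the edge may still need a manual check on the map.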
Interactive Visualisation
The interactive visualisation can be found here.
Do note that it may take a few moments for Citizen Data Measurement Dashboard to load.
Some techniques are applied throughout the visualisation.
Interactive Technique | Rationale | Brief Implementation Steps |
---|---|---|
Highlighting Tool | Makes the data interactive: hovering over or selecting a point highlights the related points in the other charts. This allows users to focus on a few data points without having to filter. | 1. Navigate to Dashboard > Actions > select “highlight”. 2. Configure the source and target sheets for the highlighting of charts in the dashboard. |
EEA Data
Interactive Technique | Rationale | Brief Implementation Steps |
---|---|---|
Date Range Filter | The date range can be changed according to the user's needs. This allows users to view the data for any time period they choose between 2013 and 2016 for a more relevant view. | 1. Adding "Datetime begin" to filters. |
Date Granularity | To make the visualisation more interactive, a date granularity option lets users choose the level of granularity at which the data is visualised. The 2013-2016 data can be viewed by Years, Quarters, Months, Weeks and Days. The 2017-2018 data can be viewed by Weeks, Days and Hours. | 1. Creating custom data fields for the various date granularities 2. Using parameter control as well as calculated fields to toggle between the available date granularities. |
Air Quality Index Filter | To make the visualisation more interactive, an air quality index option lets users choose which air quality index categories are visualised. | 1. Using calculated fields to give a color to each data point collected depending on the air quality index category. |
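The date granularity toggle can be thought of as truncating each timestamp to the start of the chosen period and then averaging the readings in each period. The sketch below illustrates that logic in Python. It is not the Tableau calculated field itself; the function names are hypothetical, and treating Monday as the first day of the week is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def truncate(moment, level):
    """Round a datetime down to the start of its period at the given level."""
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    if level == "year":
        return midnight.replace(month=1, day=1)
    if level == "quarter":
        first_month = 3 * ((moment.month - 1) // 3) + 1
        return midnight.replace(month=first_month, day=1)
    if level == "month":
        return midnight.replace(day=1)
    if level == "week":
        # Assumption: weeks start on Monday.
        return midnight - timedelta(days=moment.weekday())
    if level == "day":
        return midnight
    if level == "hour":
        return moment.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown level: {level}")


def average_by_level(readings, level):
    """Average (datetime, value) readings per period at the chosen level."""
    buckets = defaultdict(list)
    for moment, value in readings:
        buckets[truncate(moment, level)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}
```

Switching the `level` argument plays the role of the parameter control: the same readings are regrouped without changing the underlying data.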
Citizen science air quality measurements (Air Tube Data)
Interactive Technique | Rationale | Brief Implementation Steps |
---|---|---|
Total Sensor Points Drill Down | The number of citizen data sensor points in Sofia City can be drilled down by clicking on the headers, allowing users to view the total data points by Year, Quarter, Month, Day and Hour. | 1. Done by Text Tables option on Tableau. |
Change in Sensor Data Points by Quarters | Shows how citizen data sensor points change over time, so that users can see whether the number of data points grew or declined and where citizen data points are concentrated in Sofia City. | 1. Placing "Time" into pages and choosing to filter it by quarters. |
Interactive Technique | Rationale | Brief Implementation Steps |
---|---|---|
Hour of Time Filter | Users can choose the time period shown in the concentration scatterplot and the citizen data map. This allows users to drill down and examine anomalies seen in the line graph. | 1. Placing "Time" into pages and choosing to filter it by hours. |
Task 1: Spatio-temporal Analysis of Official Air Quality
S/N | Observation | Screenshots |
---|---|---|
1 | Past Situation of Sofia City (2013-2016) Examining the past air quality measurements of Sofia City by year, we can see a general decrease in average pollutant concentration from 2013 to 2016. The heat map shows that, at a macro level, air quality in Sofia City was moderate to good from 2013 to 2016. |
|
2 | Past Situation of Sofia City (2013-2016) Trends emerge once the data is drilled down to finer granularity. By quarter, the pollutant concentration is highest towards the tail end of each year, averaging around 70 μg/m3. The concentration then falls after January and is lowest in April, at around 25 μg/m3. The heat map likewise shows that air quality appears healthy when observed by quarter. |
|
3 | Past Situation of Sofia City (2013-2016) Zooming into the monthly data shows that concentration levels peaked in January 2014 and January 2016, at around 130 μg/m3. Interestingly, although concentration levels also peaked in January 2015, the peak was smaller, staying below 100 μg/m3. The heat map shows that concentrations reached unhealthy levels of 151-200 μg/m3 in January 2014 and January 2016. |
|
4 | Past Situation of Sofia City (2013-2016) Drilling down to weeks, three periods had concentrations reaching unhealthy levels beyond 150 μg/m3, one of which ran from mid-December 2013 to early January 2014. The heat map also shows that air quality at stations 9572 and 9616 hit unhealthy and very unhealthy levels from the last week of December 2013 to the first week of January 2014. Other high readings in the data were recorded on 14 Dec 2014 at station 9572 (159.9 μg/m3) and on 17 Jan 2016 at station 9421 (221.4 μg/m3). It is also evident that air quality worsens in December and January every year. |
|
5 | Past Situation of Sofia City (2013-2016)/Interesting Observations Drilling down further to the daily data confirms that the worst air quality observed in Sofia City was in the period from the last week of December 2013 to the first week of January 2014. Readings were around 400 μg/m3 during the first week of January, and this is reflected in the heat map. A reading of 400 μg/m3 is classified as hazardous according to the EEA. |
|
6 | Interesting Observations Station 9642 generally has healthier readings than the other stations. Even during this period, when the other stations recorded readings nearing 400 μg/m3, station 9642 recorded around 300 μg/m3. This may suggest that the station's topography or location keeps it away from factors that negatively affect air quality. |
|
7 | Recent Situation (2017-2018) Moving on to recent trends in the 2017-2018 air quality data, the weekly view shows generally higher concentration levels and more frequent occurrences of average air quality reaching unhealthy levels. Concentration levels were unhealthily high from end November 2017 to mid-December 2017 as well as in January 2018, with a peak measurement of 689.7 μg/m3 recorded in the week of 28 January 2018. |
|
8 | Recent Situation (2017-2018)/Day in Sofia City In the hourly data for 2017-2018, hovering over the tooltip shows that maximum concentrations in Sofia City normally peak at night, from around 8pm to 2am. In general, readings start out lower in the mornings and evenings, rise in the late evening hours and stay high until dawn, then fall again. |
|
9 | Interesting Observations The hourly heat map for the 2017 and 2018 data reveals some interesting trends. Hazardous air quality index values were measured at stations 9572 and 9616 at the beginning and end of January 2018. However, the readings at stations 9421 and 9642 were less pronounced, staying in the very unhealthy zone. This may suggest that station 9421, like station 9642, has geographical factors that are worth investigating. |
Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements
Part 1: Citizen Data Sensor Coverage Overview
S/N | Observation |
---|---|
1 | Sensor Data Points in City In the Citizen Data Sensor Coverage Overview dashboard, the Citizen Data Sensor Points view shows that sensor coverage grew from 2017 Q3 to 2018 Q3. Coverage grew much faster in 2017 than in 2018. In particular, the increase in data points from 2017 Q3 to 2018 Q1 was significant, with sensor coverage in Sofia City almost doubling. Comparing this against Total Sensor Points for the same period, the number of sensor points increased by about 160%, to roughly 2.6 times its earlier level, from 148,323 in October 2017 to 385,969 in March 2018. |
2 | Sensor Distribution/Coverage Sensors are more concentrated in the central parts of Sofia City, as can be seen from the more intense colours in the density map of citizen data points. This is especially visible in 2018 Q1, where several concentrated red points appear in the central area of Sofia City. In the concentrated regions, around 2000 data points were captured hourly. Sensor data away from the centre of Sofia City is mostly scattered. It decreases by around 50% as we move away from the dense centre and continues to decrease further out, with fewer than 100 measurements per hour in the suburban and more rural parts of Sofia City. In general, sensor coverage is not evenly distributed over the city, and sensors are absent in most suburban parts of Sofia City. The North-Eastern and Southern parts of Sofia City have no citizen data points at all. |
3 | Unexpected Behavior of Sensors To check whether the sensors always function properly, the records were drilled down after looking at the overall trend in the line graph of records, and two unusual observations were made. There was a dip in the number of records from 31 March 2018 to 1 April 2018, and another on 4 and 5 July 2018. Drilling down to the hourly data in Total Sensor Points, the number of sensor records dropped from the usual level of approximately 500 measurements per hour to 20 measurements per hour after 2pm on 30 March, and this persisted until around 6pm on 1 April. The line graph tooltip shows that 708 and 4,028 measurements were taken on these two days, compared with around 13,000 - 14,000 measurements on other days. Similarly, on 4 July 2018, the sensor records dropped drastically after midnight, reaching an all-time low of 16 readings in one hour, and readings remained low until 6am on 6 July 2018, when they returned to 600 measurements per hour. The line graph tooltip shows that only 616 measurements were taken on 4 July 2018. |
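The dips described above were found by visual inspection in Tableau. A simple programmatic check for the same pattern is sketched below: it flags any hour whose record count falls far below the typical (median) hourly count. This is an illustrative sketch only; the function name `find_dropouts` and the 20% threshold are assumptions, not part of the original analysis.

```python
from statistics import median


def find_dropouts(hourly_counts, fraction=0.2):
    """Flag hours whose record count is below `fraction` of the median count.

    `hourly_counts` is a list of (hour_label, count) pairs.
    The default fraction of 0.2 is an arbitrary illustrative threshold.
    """
    typical = median(count for _, count in hourly_counts)
    threshold = typical * fraction
    return [(hour, count) for hour, count in hourly_counts if count < threshold]
```

With a typical level of around 500 measurements per hour, hours with only 16 or 20 measurements would be flagged, while normal hour-to-hour variation would not.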
Part 2: Citizen Data Measurement
S/N | Observation |
---|---|
1 | Air Pollution Measurements The pollutant measurements in Citizen Data Measurement show that pollutant P1 has a higher reading than pollutant P2 throughout the measurement period. P1 readings are generally almost twice the P2 readings. P1 also seems to rise faster than P2, as can be seen from the more than proportionate increase of P1 over P2 during hours when pollutant levels are high. As in the EEA data, pollution readings tend to be higher in the late evenings. |
2 | Air Pollution Measurements Trends - High Concentrations To understand the data better, we begin by analysing the time periods with higher pollutant readings to see if trends can be identified. The highest P1 reading on the line graph, recorded on 8 January 2018 at 10pm, is first visualised in the scatter plot and geographical map. Using the interactive highlight action on the dashboard, some trends can be noted. When zooming in on the upper right corner of the scatterplot, the highest readings in the data set tend to come from the city centre, with the greatest density right in the centre of Sofia City at around latitude 42.69 and longitude 23.36. |
3 | Air Pollution Measurements Trends - High Concentrations To check whether this trend holds, the second highest P1 reading, on 27 January 2018 at 7pm, is also analysed. Similarly, the points with the highest P1 and P2 readings are all concentrated in Sofia City centre, at around latitude 42.69 and longitude 23.36. This shows that, in general, concentration levels are higher in the city centre during periods of poor air quality. |
4 | Air Pollution Measurements Trends - Low Concentrations To see whether this trend holds only during periods of high concentration readings, two readings from earlier times of day in May 2018, at 5am and 1pm, were selected at random. During these periods of lower concentrations, the areas with higher concentrations seem more dispersed towards the Western side of Sofia City. This suggests that in Sofia City, higher concentrations are normally found in the Central region while lower concentrations are more dispersed. It also shows that concentrations in Sofia City are generally higher at night and lower in the daytime, as can be seen from the difference in size and shade of the readings. |
Task 3
Part 1: Topography Data on Air Concentration
Screenshot | Observation |
---|---|
|
Topography Patterns The topography map of Sofia shows that Sofia City has its lowest altitude in the North Eastern part, with an elevation of around 500, and its highest elevations in the South Western parts. The highest elevation in Sofia City, 1586, is at latitude 42.62 and longitude 23.26. In the Total Sensor Data, the point with the highest overall sensor readings is at latitude 42.61 and longitude 23.34, while the point with the second highest overall sensor readings is at latitude 42.67 and longitude 23.23. Based on the topography data, elevation may have an overall effect on sensor data, with areas of higher elevation tending to have higher sensor readings. |
Part 2: Meteorology Data on Air Concentration
Screenshot | Observation |
---|---|
Meteorology Patterns - 2013-2016 We begin by looking at the older data from 2013-2016 to identify any patterns between meteorological factors and concentration levels. These observations were made: | |
Meteorology Patterns - 2017-2018 Next, we look at the more recent data from 2017-2018 to identify any patterns between meteorological factors and concentration levels. Similar observations were made in the 2017-2018 data for Sofia City, where the relationships of precipitation, wind speed and temperature with concentration levels remain consistent. |
References
Blended dual axis: https://www.thedataschool.co.uk/robbin-vernooij/create-blended-dual-axis-tableau/
Dynamically changing data level: https://kb.tableau.com/articles/howto/Dynamically-Changing-the-Date-Level