IS428 AY2018-19T1 Ng Wei En
To be a Visual Detective
Contents
Preamble
Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide an estimated 7 million people died prematurely because of pollution; in the European Union (EU) 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has also been classified as the leading environmental cause of cancer.
Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than what the EU and the World Health Organization (WHO) have set to protect health.
Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria is also leading on the top polluted countries with 77 μg/m3on the daily mean concentration (EU limit value is 50 μg/m3).
According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10).
Problem & Motivation
Measurements of airborne particulate matter(PM) can be divided into PM2.5 and PM10. PM2.5 and PM10 refers to airborne particulate matter that have a diameter of less than 2.5 micrometres and 10 micrometres respectively.
It is without a doubt that the poor air quality Bulgaria is a great cause for concern, with measurements of PM2.5 and PM10 collected from air quality stations indicating that they are at much higher levels above the thresholds stipulated by the EU and World Health Organisation(WHO) to protect health.
Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria is also leading on the top polluted countries with 77 μg/m3on the daily mean concentration (EU limit value is 50 μg/m3).
With the huge amount of data collected from air quality monitoring systems and weather stations all across Bulgaria including Sofia City, there is a need to make sense of these data by building an interactive data visualisation tool to better identify the patterns and trends in measurements collected over a period of time. This is crucial in helping identify the distribution of air pollution in Bulgaria and the tracking down the main sources of air pollution which contribute to the worsening air quality. It would be useful to find out correlation between air quality and other metrological data such as temperature, humidity as well as topographical data like elevation.
With such a huge amount of dataset, there is a pressing need to further investigate the source of air pollution, any noticeable patterns or trends in air quality as well as uncover the possible causes relating to metreologoy or topography.
Dataset Analysis & Transformation Process
As with any given dataset, there is a need to scrutinise the format and attributes of each column in each dataset. The necessary data transformation and data preparation steps would need to be taken to aid the importing of dataset for data visualisation to be based on. Elaboration on the data transformation process for each dataset will be set out below:
Official air quality measurements (5 stations in the city, EEA Data.zip) Issue 1: Incomplete dataset Description: The files contained in EEA Data.zip file can be categorised below:
SPO-BG0040A
- BG_5_9642_2013_timeseries.csv
- BG_5_9642_2014_timeseries.csv
- BG_5_9642_2015_timeseries.csv
- BG_5_9642_2016_timeseries.csv
- BG_5_9642_2017_timeseries.csv
- BG_5_9642_2018_timeseries.csv
SPO-BG0050A
- Bulleted list item
BG_5_9572_2013_timeseries.csv
- Bulleted list item
BG_5_9572_2014_timeseries.csv
- Bulleted list item
BG_5_9572_2015_timeseries.csv
- Bulleted list item
BG_5_9572_2016_timeseries.csv
- Bulleted list item
BG_5_9572_2017_timeseries.csv
- Bulleted list item
BG_5_9572_2017_timeseries.csv
- Bulleted list item
BG_5_9572_2018_timeseries.csv
SPO-BG0052A
- Bulleted list item
BG_5_9421_2013_timeseries.csv
- Bulleted list item
BG_5_9421_2014_timeseries.csv
- Bulleted list item
BG_5_9421_2015_timeseries.csv
- Bulleted list item
BG_5_9421_2016_timeseries.csv
- Bulleted list item
BG_5_9421_2017_timeseries.csv
- Bulleted list item
BG_5_9421_2017_timeseries.csv
- Bulleted list item
BG_5_9421_2018_timeseries.csv
SPO-BG0054A
- Bulleted list item
BG_5_9484_2013_timeseries.csv
- Bulleted list item
BG_5_9484_2014_timeseries.csv
- Bulleted list item
BG_5_9484_2015_timeseries.csv
SPO-BG0073A
- Bulleted list item
BG_5_9616_2013_timeseries.csv
- Bulleted list item
BG_5_9616_2014_timeseries.csv
- Bulleted list item
BG_5_9616_2015_timeseries.csv
- Bulleted list item
BG_5_9616_2016_timeseries.csv
- Bulleted list item
BG_5_9616_2017_timeseries.csv
- Bulleted list item
BG_5_9616_2017_timeseries.csv
- Bulleted list item
BG_5_9616_2018_timeseries.csv
SPO-BG0079A
- Bulleted list item
BG_5_60881_2018_timeseries.csv
Solution: Files names highlighted in red are incomplete and will thus be omitted from the dataset in order to make an accurate comparison across air quality stations. Thus, only data from 4 out of 6 air quality stations will be used.
Issue 2: Combining CSV files
Description: Having to import multiple CSV files into Tableau and joining them can be quite a hassle.
Solution: Merging of CSV files was done via Terminal on OSX instead. Running the code “cat *.csv >merged.csv” will merge all CSV files within the enclosing folder. There will be now be a single CSV file for each Air Quality station. This will then be merged again to form a single CSV file containing air quality data from 4 air quality stations between 2013 to 2018.
Issue 3: Inner-joining Metadata file with BG Datafile
Description: Detailed information about each air quality station is set out in metadata.xsls This information can be inner-joined with the BG data through a common field, AirQualityStationEolCode.
Solution: Perform inner-join in Data Source tab of Tableau.
Data Source> Inner-join Tables in Tableau.
Citizen science air quality measurements (Air Tube.zip) This dataset contains air quality measurements including temperature, humidity, pressure as well as measurements of airborne particulate matter(PM), specifically PM10(denoted by P1) and PM2.5(denoted by P2) across many stations. The location of these stations are also represented by its geohash, which is an alphanumeric string of a set of latitude and longitude. There are a total of 2 dataset files in this zip file, each containing hourly data in 2017 and 2018.
The following issues were identified in either or both of the datsets and data transformation was performed to address them:
Issue 1: Missing geohash values
Description: There were missing geohash values in 4 rows of data in the data_bg_2018.csv file.