IS428 AY2018-19T1 Gokarn Malika Nitin

From Visual Analytics for Business Intelligence

Problem and Motivation

Air pollution is the single largest environmental health risk in Europe, and an important risk factor across the rest of the world. Many metrics point to air pollution as a primary cause of disease (the deadliest of which include cancer) and death. For example, it is estimated that 7 million people across the world died prematurely due to air pollution. In the European Union alone, 400,000 people suffered a premature death.

The level of air pollution across the world is only increasing. Within the European Union, Bulgaria is one of the countries with the highest three-year average PM2.5 concentration in urban areas. Bulgaria also tops the list of polluted countries on the PM10 measure, with a daily mean concentration of 77 μg/m3, which is much higher than both the WHO limit and the EU limit (50 μg/m3).

How clean the air is has become a major concern in Bulgaria. Measurements show that citizens all over the country breathe air that is considered harmful to health. The Organization for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide.

Therefore, the aim of this assignment is to reveal the spatiotemporal patterns of air quality and measurement techniques in Sofia City, Bulgaria, thereby identifying issues of concern.

Dataset Analysis and Transformation Process

Dataset Download

Four major data sets in zipped file format are used and are available below:

  • Official air quality measurements (5 stations in the city) (EEA Data.zip) – as per EU guidelines on air quality monitoring; see the data description HERE…
  • Citizen science air quality measurements (Air Tube.zip), incl. temperature, humidity and pressure (many stations) and topography (gridded data).
  • Meteorological measurements (1 station)(METEO-data.zip): Temperature; Humidity; Wind speed; Pressure; Rainfall; Visibility
  • Topography data (TOPO-DATA)

They can be downloaded by clicking on this link.

Dataset Cleaning and Transformation

Problem #1 EEA Data Building Issues
Issue: The official air quality measurement readings (EEA data) do not include the longitude and latitude of the place of measurement. Instead, these are contained in a separate metadata file. Additionally, each station's recordings for a specific year are stored in a separate .csv file.
Solution: Append all the files together through a Tableau union. Eliminate the data for station 9484, the station named "Orlov Most", because its data from 2016 onwards is missing. I chose not to exclude the data for station 60881, the station named "Mladost", because the Mladost data is more recent and the station can be considered a new addition to the station list.


Lastly, an inner join of the union and the metadata file is performed. This assigns the respective longitude and latitude to every row, based on its air quality station.
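The union and inner join described above were done in Tableau. For readers who prefer to see the logic spelled out, the following is a minimal Python sketch of the same steps, not the actual workflow. The field names `station_id`, `latitude` and `longitude`, and the function name `union_and_join`, are assumptions made for illustration; the real EEA files may use different column names.

```python
def union_and_join(yearly_tables, station_meta, excluded_ids=("9484",)):
    """Stack per-year tables, drop excluded stations, attach coordinates.

    yearly_tables: list of tables, each a list of row dicts (one per reading).
    station_meta:  list of dicts holding each station's coordinates.
    All field names here are hypothetical placeholders.
    """
    # Lookup table: station id -> (latitude, longitude)
    coords = {
        m["station_id"]: (m["latitude"], m["longitude"]) for m in station_meta
    }
    joined = []
    for table in yearly_tables:  # union: walk every per-year table in turn
        for row in table:
            sid = row["station_id"]
            # Skip excluded stations; skipping ids with no metadata
            # is what makes this an inner join.
            if sid in excluded_ids or sid not in coords:
                continue
            lat, lon = coords[sid]
            joined.append({**row, "latitude": lat, "longitude": lon})
    return joined
```

Rows whose station has no entry in the metadata are silently dropped, which mirrors the behaviour of an inner join.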

Problem #2 AirTube Data Building Issues
Issue: The citizen science air quality measurement readings (AirTube data) do not include the longitude and latitude of the place of measurement. Instead, the location is stored in the form of a geohash code. Unfortunately, Tableau is not built to handle geohash codes.
Solution: Using the Python geohash2 library from GitHub [1], I wrote a Python script that decodes the geohash codes, taking the error of the transformation into consideration as well.


Upon importing the decoded dataset into Tableau, I found 4 points that have latitude and longitude values of 0.000000, as well as 1 point that has a latitude value of -4.025953 and a longitude value of 78.751781. As none of these 5 points is anywhere near Bulgaria or Sofia City, I excluded them from the dataset entirely.
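To illustrate what the decoding step involves, here is a small hand-written geohash decoder. It is a sketch of the standard geohash algorithm, not the geohash2 library call used for this assignment, and it needs no third-party packages. The bounding box in `is_near_bulgaria` is a rough assumption chosen for illustration, not a value taken from the dataset.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def decode_geohash(code):
    """Return (lat, lon, lat_err, lon_err) for a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate between axes, starting with longitude
    for ch in code.lower():
        value = BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    # The cell centre is the decoded point; half the cell size is the error.
    return (
        (lat_lo + lat_hi) / 2,
        (lon_lo + lon_hi) / 2,
        (lat_hi - lat_lo) / 2,
        (lon_hi - lon_lo) / 2,
    )

def is_near_bulgaria(lat, lon):
    """Rough bounding-box check; the limits are an illustrative assumption."""
    return 41.0 <= lat <= 44.5 and 22.0 <= lon <= 29.0
```

A check such as `is_near_bulgaria` would reject both the points at (0.000000, 0.000000) and the point at (-4.025953, 78.751781) mentioned above.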

Problem #3 AirTube Data Outliers and Noise Removal
Issue: The citizen science air quality measurement readings (AirTube data) contain multiple "wrong" readings, some of which are noise while others indicate broken sensors. A simple internet search shows that the lowest temperature ever recorded in Bulgaria is -38.3 degrees Celsius, while the highest is 45.2 degrees Celsius.
Solution: In order to remove the noise and outliers, readings with a recorded temperature above 50 degrees Celsius or below -40 degrees Celsius are removed.
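The temperature filter can be expressed in a few lines. The sketch below assumes each reading is a dict with a `temperature` field in degrees Celsius; that field name and the function name are hypothetical, and the thresholds are the ones stated above.

```python
TEMP_MIN_C = -40.0  # just below Bulgaria's record low of -38.3
TEMP_MAX_C = 50.0   # just above Bulgaria's record high of 45.2

def drop_implausible_temperatures(readings, key="temperature"):
    """Keep only readings whose temperature lies within the plausible range.

    Readings strictly above TEMP_MAX_C or strictly below TEMP_MIN_C are
    removed; values exactly on a threshold are kept.
    """
    return [r for r in readings if TEMP_MIN_C <= r[key] <= TEMP_MAX_C]
```

Choosing thresholds slightly wider than the national records leaves a small margin, so that a genuine extreme reading is not discarded by mistake.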

Task 1: Spatio-temporal Analysis of Official Air Quality

Characterize the past and most recent situation with respect to air quality measures in Sofia City. What does a typical day look like for Sofia city? Do you see any trends of possible interest in this investigation? What anomalies do you find in the official air quality dataset? How do these affect your analysis of potential problems in the environment?

Your submission for this question should contain no more than 10 images and 1000 words.

Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

Using appropriate data visualisation, you are required to answer the following types of questions:

  • Characterize the sensors’ coverage, performance and operation. Are they well distributed over the entire city? Are they all working properly at all times? Can you detect any unexpected behaviours of the sensors by analyzing the readings they capture? Limit your response to no more than 4 images and 600 words.
  • Now turn your attention to the air pollution measurements themselves. Which part of the city shows relatively higher readings than others? Are these differences time-dependent? Limit your response to no more than 6 images and 800 words.

Task 3

Urban air pollution is a complex issue. There are many factors affecting the air quality of a city. Some of the possible causes are:

  • Local energy sources. For example, according to Unmask My City, a global initiative by doctors, nurses, public health practitioners, and allied health professionals dedicated to improving air quality and reducing emissions in our cities, Bulgaria’s main sources of PM10, and fine particle pollution PM2.5 (particles 2.5 microns or smaller) are household burning of fossil fuels or biomass, and transport.
  • Local meteorology such as temperature, pressure, rainfall, humidity, wind, etc.
  • Local topography
  • Complex interactions between local topography and meteorological characteristics.
  • Transboundary pollution, for example, the haze that intruded into Singapore from our neighbours.

In this third task, you are required to reveal the relationships between the factors mentioned above and the air quality measure detected in Task 1 and Task 2. Limit your response to no more than 5 images and 600 words.

Software

  • Tableau - for visualization of the various tasks
  • Python - for decoding geohash codes into latitude and longitude

References