Difference between revisions of "IS428 AY2018-19T1 Gokarn Malika Nitin"

From Visual Analytics for Business Intelligence

Revision as of 04:46, 11 November 2018

Problem and Motivation

Air pollution is the single largest environmental health risk in Europe and an important risk factor across the rest of the world. Many indicators point to air pollution as a primary cause of disease (the most deadly being cancer) and death. For example, an estimated 7 million people worldwide died prematurely due to air pollution. In the European Union alone, 400,000 people suffered a premature death.

The level of air pollution across the world continues to increase. Within the European Union, Bulgaria is one of the countries with the highest PM2.5 concentrations in urban areas, averaged over three years. Bulgaria also leads the list of most polluted countries on the PM10 measure, with a daily mean concentration of 77 μg/m3, which is much higher than both the WHO limit and the EU limit (50 μg/m3).

How clean the air is has become a major concern in Bulgaria. Measurements show that citizens all over the country breathe air that is considered harmful to health. The Organization for Economic Cooperation and Development (OECD) predicts that by 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide.

Therefore, the aim of this assignment is to reveal the spatiotemporal patterns of air quality and measurement techniques in Sofia City, Bulgaria, thereby identifying issues of concern.

Dataset Analysis and Transformation Process

Dataset Download

Four major data sets in zipped file format are used and are available below:

  • Official air quality measurements (5 stations in the city) (EEA Data.zip) – as per EU guidelines on air quality monitoring; see the data description HERE…
  • Citizen science air quality measurements (Air Tube.zip), incl. temperature, humidity and pressure (many stations) and topography (gridded data).
  • Meteorological measurements (1 station)(METEO-data.zip): Temperature; Humidity; Wind speed; Pressure; Rainfall; Visibility
  • Topography data (TOPO-DATA)

They can be downloaded from this link: https://storage.cloud.google.com/global-datathon-2018/sofia-air/air-sofia.zip

Dataset Cleaning and Transformation

Problem #1: EEA Data Building Issues
Issue: The official air quality measurement readings (EEA data) do not include the longitude and latitude of the place of measurement. Instead, the coordinates are contained in a separate metadata file. Additionally, each station's recordings for a specific year are stored in separate .csv files.
Solution: Append all the files together through a Tableau Union. Exclude the data for station 9484 ("Orlov Most"), because its data from 2016 onwards is missing. I chose not to exclude the data for station 60881 ("Mladost"), because its data is simply more recent and the station can be considered a new addition to the station list.

Lastly, an inner join of the union and the metadata file is performed in order to assign the respective longitude and latitude to every row, based on its air quality station.
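
The cleaning steps above were carried out in Tableau. As a rough illustration only, the same union, station exclusion and inner join could be sketched in Python with pandas. The column names (`station_id`, `year`, `pm10`, `latitude`, `longitude`), the helper `build_eea_table` and the toy values are all assumptions made for this sketch; they are not taken from the actual EEA files.

```python
import pandas as pd

# Station 9484 ("Orlov Most") is excluded: its data from 2016 onwards is missing.
EXCLUDED_STATIONS = [9484]


def build_eea_table(yearly_frames, metadata, station_col="station_id"):
    """Union the per-station, per-year frames, drop excluded stations,
    then inner-join the station coordinates from the metadata frame.

    `station_col` is a hypothetical key column name, not the real EEA header.
    """
    # Equivalent of the Tableau Union: stack all files row-wise.
    union = pd.concat(yearly_frames, ignore_index=True)
    # Remove every row belonging to an excluded station.
    union = union[~union[station_col].isin(EXCLUDED_STATIONS)]
    # Inner join: keep only rows whose station appears in the metadata.
    return union.merge(metadata, on=station_col, how="inner")


# Toy frames standing in for the separate .csv files
# (placeholder values, not real measurements or coordinates).
readings_a = pd.DataFrame(
    {"station_id": [9484, 9484], "year": [2014, 2015], "pm10": [1.0, 2.0]}
)
readings_b = pd.DataFrame({"station_id": [60881], "year": [2018], "pm10": [3.0]})
metadata = pd.DataFrame(
    {"station_id": [9484, 60881], "latitude": [0.0, 1.0], "longitude": [0.0, 1.0]}
)

result = build_eea_table([readings_a, readings_b], metadata)
print(result)
```

With real data, the list of frames would come from reading each .csv file (for example with `pd.read_csv`), and the join key would be whatever station identifier the readings and the metadata file share.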


Task 1: Spatio-temporal Analysis of Official Air Quality

Characterize the past and most recent situation with respect to air quality measures in Sofia City. What does a typical day look like for Sofia city? Do you see any trends of possible interest in this investigation? What anomalies do you find in the official air quality dataset? How do these affect your analysis of potential problems in the environment?

Your submission for this question should contain no more than 10 images and 1000 words.

Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

Using appropriate data visualisation, you are required to answer the following types of questions:

  • Characterize the sensors’ coverage, performance and operation. Are they well distributed over the entire city? Are they all working properly at all times? Can you detect any unexpected behaviours of the sensors by analyzing the readings they capture? Limit your response to no more than 4 images and 600 words.
  • Now turn your attention to the air pollution measurements themselves. Which part of the city shows relatively higher readings than others? Are these differences time-dependent? Limit your response to no more than 6 images and 800 words.

Task 3

Urban air pollution is a complex issue. There are many factors affecting the air quality of a city. Some of the possible causes are:

  • Local energy sources. For example, according to Unmask My City, a global initiative by doctors, nurses, public health practitioners, and allied health professionals dedicated to improving air quality and reducing emissions in our cities, Bulgaria’s main sources of PM10, and fine particle pollution PM2.5 (particles 2.5 microns or smaller) are household burning of fossil fuels or biomass, and transport.
  • Local meteorology such as temperature, pressure, rainfall, humidity, wind, etc.
  • Local topography
  • Complex interactions between local topography and meteorological characteristics.
  • Transboundary pollution, for example, the haze that intruded into Singapore from our neighbours.

In this third task, you are required to reveal the relationships between the factors mentioned above and the air quality measure detected in Task 1 and Task 2. Limit your response to no more than 5 images and 600 words.

Software

  • Tableau - for visualization of the various tasks
  • Python - for geocoding