Difference between revisions of "IS428 AY2018-19T1 Gokarn Malika Nitin"

From Visual Analytics for Business Intelligence
Line 18: Line 18:
  
 
<!--
 
<!--
 +
  
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:15px">
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:15px">
Line 25: Line 26:
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
  
Four major data sets in zipped file format are used and are available below:  
+
Four major data sets in zipped file format are used and are available for download below:  
  
 
* Official air quality measurements (5 stations in the city)(EEA Data.zip) – as per EU guidelines on air quality monitoring see the data description [https://drive.google.com/file/d/1v5yCL-LdriDwa65qXPbFL7b0tydylDlb/view HERE…]
 
* Official air quality measurements (5 stations in the city)(EEA Data.zip) – as per EU guidelines on air quality monitoring see the data description [https://drive.google.com/file/d/1v5yCL-LdriDwa65qXPbFL7b0tydylDlb/view HERE…]
Line 34: Line 35:
 
They can be downloaded by clicking on this [https://storage.cloud.google.com/global-datathon-2018/sofia-air/air-sofia.zip link].
 
They can be downloaded by clicking on this [https://storage.cloud.google.com/global-datathon-2018/sofia-air/air-sofia.zip link].
 
</div>
 
</div>
 +
 +
<div style="font-family:Open Sans, Arial, sans-serif;font-size:15px">
 +
<b>Dataset Description and Understanding</b>
 +
</div>
 +
 +
  
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:15px">
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:15px">
Line 39: Line 46:
 
</div>
 
</div>
  
{| class="wikitable"
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 
|-
 
|-
! Problem #1 || EEA Data Building Issues
+
! style="font-weight: bold;background: #6F1E29;color:#FFFFFF;width: 10%" | Problem #1  
 +
! style="font-weight: bold;background: #6F1E29;color:#FFFFFF;" | EEA Data Building Issues
 
|-
 
|-
 
| Issue || The official air quality measurement readings (EEA data) do not include the longitude and latitude of the place of measurement. Instead, the coordinates are contained in a separate metadata file. Additionally, each station's recordings for a specific year are stored in separate .csv files.
 
| Issue || The official air quality measurement readings (EEA data) do not include the longitude and latitude of the place of measurement. Instead, the coordinates are contained in a separate metadata file. Additionally, each station's recordings for a specific year are stored in separate .csv files.
 
|-
 
|-
| Solution || Append all the files together through a Tableau Union. Eliminate the data for station 9484, the station named "Orlov Most", because its data from 2016 onwards is missing. I choose not to exclude the data for station 60881, the station named "Mladost", solely because its data is more recent and the station can be considered a new addition to the station list.
+
| Solution || Append all the files together through a Tableau Union. Eliminate the data for station 9484, the station named "Orlov Most", because its data from 2016 onwards is missing. Additionally, I choose to exclude the data for station 60881, the station named "Mladost", solely because its data covers 2018 onwards and the station can be considered a new addition to the station list.
 
<br/>
 
<br/>
Lastly, an inner join of the union and the metadata file is conducted. This assigns the respective longitude and latitude to every row, based on its Air Quality Station.
+
Lastly, an inner join of the union and the metadata file is conducted. This assigns the respective longitude and latitude to every row, based on its Air Quality Station. The join is therefore done on the EoI Code.
 
|}
 
|}
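
The solution above was carried out in Tableau. Purely as an illustration, the same union, station filter, and inner join can be sketched in plain Python. This is a minimal sketch and not the author's workflow: the function name <code>union_and_join</code>, the <code>station_id</code> field, and the <code>Latitude</code>/<code>Longitude</code> field names are assumptions made for the example. Only the join key (the EoI Code, as stated in the newer revision) comes from the text.

```python
def union_and_join(yearly_tables, metadata, excluded_stations, key="EoI Code"):
    """Stack per-year tables, drop excluded stations, and inner-join coordinates.

    yearly_tables     -- list of tables, each a list of dict rows (one per .csv file)
    metadata          -- list of dict rows holding the key plus Latitude/Longitude
    excluded_stations -- set of station ids to drop (hypothetical 'station_id' field)
    """
    # Union: append the rows of every per-station, per-year table.
    union = [row for table in yearly_tables for row in table]

    # Filter: remove stations whose coverage is incomplete.
    kept = [row for row in union if row["station_id"] not in excluded_stations]

    # Lookup table from join key to coordinates, built from the metadata file.
    coords = {m[key]: (m["Latitude"], m["Longitude"]) for m in metadata}

    # Inner join: keep only rows whose key also appears in the metadata.
    joined = []
    for row in kept:
        if row[key] in coords:
            lat, lon = coords[row[key]]
            joined.append({**row, "Latitude": lat, "Longitude": lon})
    return joined
```

Because the join is an inner join, any reading whose EoI Code is missing from the metadata is silently dropped, so it is worth comparing row counts before and after the join.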
  
{| class="wikitable"
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 
|-
 
|-
! Problem #2 || AirTube Data Building Issues
+
! style="font-weight: bold;background: #6F1E29;color:#FFFFFF;width: 10%" | Problem #2  
 +
! style="font-weight: bold;background: #6F1E29;color:#FFFFFF;" | AirTube Data Building Issues
 
|-
 
|-
 
| Issue || The citizen science air quality measurement readings (AirTube data) do not include the longitude and latitude of the place of measurement. Instead, the location is encoded as a geohash code. Unfortunately, Tableau is not built to handle geohash codes.
 
| Issue || The citizen science air quality measurement readings (AirTube data) do not include the longitude and latitude of the place of measurement. Instead, the location is encoded as a geohash code. Unfortunately, Tableau is not built to handle geohash codes.
Line 61: Line 70:
 
|}
 
|}
  
{| class="wikitable"
+
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 
|-
 
|-
! Problem #3 || AirTube Data Outliers and Noise Removal
+
! style="font-weight: bold;background: #6F1E29;color:#FFFFFF;width: 10%" | Problem #3  
 +
! style="font-weight: bold;background: #6F1E29;color:#FFFFFF;" | AirTube Data Outliers and Noise Removal
 
|-
 
|-
 
| Issue || The citizen science air quality measurement readings (AirTube data) contain multiple "wrong" readings, some of which are noise while others indicate broken sensors. A simple internet search shows that the lowest temperature ever recorded in Bulgaria is -38.3 degrees Celsius, while the highest is 45.2 degrees Celsius.
 
| Issue || The citizen science air quality measurement readings (AirTube data) contain multiple "wrong" readings, some of which are noise while others indicate broken sensors. A simple internet search shows that the lowest temperature ever recorded in Bulgaria is -38.3 degrees Celsius, while the highest is 45.2 degrees Celsius.

Revision as of 13:51, 11 November 2018

Problem and Motivation

Dataset Analysis and Transformation Process

Task 1: Spatio-temporal Analysis of Official Air Quality

Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

Task 3

Urban air pollution is a complex issue. There are many factors affecting the air quality of a city. Some of the possible causes are:

  • Local energy sources. For example, according to Unmask My City, a global initiative by doctors, nurses, public health practitioners, and allied health professionals dedicated to improving air quality and reducing emissions in our cities, Bulgaria’s main sources of PM10, and fine particle pollution PM2.5 (particles 2.5 microns or smaller) are household burning of fossil fuels or biomass, and transport.
  • Local meteorology such as temperature, pressure, rainfall, humidity, wind, etc.
  • Local topography
  • Complex interactions between local topography and meteorological characteristics.
  • Transboundary pollution, for example, the haze that intruded into Singapore from our neighbours.

In this third task, you are required to reveal the relationships between the factors mentioned above and the air quality measure detected in Task 1 and Task 2. Limit your response to no more than 5 images and 600 words.

Software

  • Tableau - for visualization of the various tasks
  • Python - for geocoding
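
The AirTube readings store each location as a geohash code, which Tableau cannot read directly, so the codes have to be converted to latitude and longitude first. The page does not show the script that was used. The following is a minimal sketch of the standard geohash decoding algorithm in plain Python, and the function name <code>decode_geohash</code> is an assumption made for this example.

```python
# Geohash base-32 alphabet (digits plus lowercase letters, without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash):
    """Return the (latitude, longitude) of the centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude

    for ch in geohash.lower():
        value = _BASE32.index(ch)  # raises ValueError on an invalid character
        # Each character carries 5 bits, most significant bit first.
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid  # keep the upper half of the interval
                else:
                    lon_hi = mid  # keep the lower half of the interval
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon

    # The decoded point is the centre of the final bounding box.
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
```

Applying such a function to every row yields two numeric columns that Tableau can map as latitude and longitude. The decoded point is the centre of the geohash cell, so its precision depends on the length of the code.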

References