Difference between revisions of "IS428 AY2018-19T1 Gokarn Malika Nitin"

From Visual Analytics for Business Intelligence
 
</div>
 
</div>
 
==<div style="background: #581845; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:18px; font-family:Open Sans, Arial, sans-serif"><font color= #ffffff>Dataset Analysis and Transformation Process</font></div>==
 
<!--
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:15px">
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:15px">
 
<b>Dataset Download</b>
 
  
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
<!--
 
 
Four major data sets in zipped file format are used and are available below:  
 
 
| Solution || Using the Python geohash2 library from GitHub [https://github.com/DBarthe/geohash], I wrote a Python script that decodes each geohash into latitude and longitude, taking the decoding error into account as well.
 
<br/>
 
Upon importing the decoded dataset into Tableau, I found 4 points with latitude and longitude values of 0.000000, as well as 1 point with a latitude of -4.025953 and a longitude of 78.751781. As none of these 5 points is anywhere near Bulgaria or Sofia City, I excluded them from the dataset as a whole.
 
|}
 
|}
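The decoding and exclusion step described in the solution above can be sketched as follows. This is only an illustration under stated assumptions: the original script relied on the geohash2 library, whereas this sketch uses a hand-written pure-Python decoder as a stand-in, and the Sofia bounding box and sample geohashes are hypothetical values chosen for the example.

```python
# Pure-Python stand-in for a geohash decoder (assumption: the original
# script called the geohash2 library rather than this function).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

# Hypothetical bounding box around Sofia City; these limits are assumed,
# not taken from the original page.
SOFIA_LAT = (42.5, 42.9)
SOFIA_LON = (23.1, 23.6)


def decode_exactly(geohash):
    """Return (lat, lon, lat_err, lon_err) for a lowercase geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate between longitude and latitude
    for char in geohash:
        value = BASE32.index(char)  # raises ValueError on invalid characters
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    # The cell centre is the decoded point; half the cell size is the error.
    lat = (lat_lo + lat_hi) / 2
    lon = (lon_lo + lon_hi) / 2
    return lat, lon, (lat_hi - lat_lo) / 2, (lon_hi - lon_lo) / 2


def near_sofia(lat, lon):
    """True if the point lies inside the assumed Sofia bounding box."""
    return (SOFIA_LAT[0] <= lat <= SOFIA_LAT[1]
            and SOFIA_LON[0] <= lon <= SOFIA_LON[1])


# Made-up sample: one geohash inside the box and one empty value, which
# decodes to latitude 0.0 and longitude 0.0 and is therefore dropped.
geohashes = ["sx8d", ""]
kept = [g for g in geohashes if near_sofia(*decode_exactly(g)[:2])]
```

Filtering on a bounding box rather than on exact zero values also catches stray points such as the one at latitude -4.025953, longitude 78.751781.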
  
 
! Problem #3 || AirTube Data Outliers and Noise Removal
 
 
|-
 
|-
| Issue || The citizen science air quality measurements (AirTube data) contain multiple "wrong" readings, some of which are noise while others indicate broken sensors. A simple internet search shows that the lowest temperature ever recorded in Bulgaria is -38.3 degrees Celsius, while the highest is 45.2 degrees Celsius.
 
|-
 
|-
| Solution || To remove the noise and outliers, recorded temperatures above 50 degrees Celsius or below -40 degrees Celsius are removed.
 
|}
 
|}
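The temperature rule in the solution above can be expressed as a simple range filter. The sketch below is a minimal illustration, not the original workflow (which may have been done directly in Tableau); the field name `temperature` and the sample readings are assumptions made for the example.

```python
# Thresholds taken from the solution text: readings above 50 or below -40
# degrees Celsius are treated as noise or broken-sensor output.
MIN_TEMP_C = -40.0
MAX_TEMP_C = 50.0


def remove_temperature_outliers(readings, key="temperature"):
    """Keep only readings whose temperature lies within the plausible range.

    `readings` is a list of dicts; `key` is the assumed name of the
    temperature field.
    """
    return [r for r in readings if MIN_TEMP_C <= r[key] <= MAX_TEMP_C]


# Made-up sample readings for illustration only.
sample = [
    {"temperature": 21.5},
    {"temperature": 63.0},    # above the upper threshold, removed
    {"temperature": -145.0},  # below the lower threshold, removed
    {"temperature": -38.3},
]
cleaned = remove_temperature_outliers(sample)
```

The thresholds sit slightly outside Bulgaria's recorded extremes (-38.3 and 45.2 degrees Celsius), so a genuine record-breaking reading would still be kept.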
 
-->
 
-->

Revision as of 05:35, 11 November 2018

Problem and Motivation

Air pollution is the single largest environmental health risk in Europe and an important risk factor across the rest of the world. Many indicators point to air pollution as a primary cause of disease (the deadliest of which include cancer) and death. For example, it is estimated that 7 million people worldwide died prematurely due to air pollution; in the European Union alone, 400,000 people suffered a premature death.

The level of air pollution across the world continues to rise. Within the European Union, Bulgaria is one of the countries with the highest PM2.5 concentrations in urban areas, averaged over three years. Bulgaria also leads the list of most polluted countries on the PM10 measure, with a daily mean concentration of 77 μg/m3, well above both the WHO limit and the EU limit (50 μg/m3).

How clean the air is has become a major concern in Bulgaria. Measurements show that citizens all over the country breathe air that is considered harmful to health. The Organization for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide.

Therefore, the aim of this assignment is to reveal the spatiotemporal patterns of air quality and measurement techniques in Sofia City, Bulgaria, and thereby identify issues of concern.

Dataset Analysis and Transformation Process

Task 1: Spatio-temporal Analysis of Official Air Quality

Characterize the past and most recent situation with respect to air quality measures in Sofia City. What does a typical day look like for Sofia city? Do you see any trends of possible interest in this investigation? What anomalies do you find in the official air quality dataset? How do these affect your analysis of potential problems in the environment?

Your submission for this question should contain no more than 10 images and 1000 words.

Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

Using appropriate data visualisation, you are required to answer the following types of questions:

  • Characterize the sensors’ coverage, performance and operation. Are they well distributed over the entire city? Are they all working properly at all times? Can you detect any unexpected behaviours of the sensors by analyzing the readings they capture? Limit your response to no more than 4 images and 600 words.
  • Now turn your attention to the air pollution measurements themselves. Which part of the city shows relatively higher readings than others? Are these differences time-dependent? Limit your response to no more than 6 images and 800 words.

Task 3

Urban air pollution is a complex issue. There are many factors affecting the air quality of a city. Some of the possible causes are:

  • Local energy sources. For example, according to Unmask My City, a global initiative by doctors, nurses, public health practitioners, and allied health professionals dedicated to improving air quality and reducing emissions in our cities, Bulgaria’s main sources of PM10, and fine particle pollution PM2.5 (particles 2.5 microns or smaller) are household burning of fossil fuels or biomass, and transport.
  • Local meteorology such as temperature, pressure, rainfall, humidity, wind, etc.
  • Local topography
  • Complex interactions between local topography and meteorological characteristics.
  • Transboundary pollution, for example, the haze that intruded into Singapore from our neighbours.

In this third task, you are required to reveal the relationships between the factors mentioned above and the air quality measures detected in Task 1 and Task 2. Limit your response to no more than 5 images and 600 words.

Software

  • Tableau - for visualization of the various tasks
  • Python - for geocoding

References