Difference between revisions of "IS428 AY2018-19T1 Rainean Young Calubad"

From Visual Analytics for Business Intelligence

Revision as of 22:06, 11 November 2018

To be a Visual Detective


Overview

Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide, an estimated 7 million people died prematurely because of pollution; in the European Union (EU), 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has been classified as the leading environmental cause of cancer.

Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than the limits that the EU and the World Health Organization (WHO) have set to protect health.

Bulgaria had the highest PM2.5 concentrations in urban areas of all EU-28 member states over a three-year average. For PM10, Bulgaria also tops the list of most polluted countries, with a daily mean concentration of 77 μg/m3 (the EU limit value is 50 μg/m3).

According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10).

Task 1: Spatio-temporal Analysis of Official Air Quality

Questions:


Characterize the past and most recent situation with respect to air quality measures in Sofia City. What does a typical day look like for Sofia city? Do you see any trends of possible interest in this investigation? What anomalies do you find in the official air quality dataset? How do these affect your analysis of potential problems to the environment?


Data Cleaning & Transformation:


To investigate the above problems, we need to clean the data and then visualise the dataset. The cleaning process involves:

  • Combining all the datasets across the years into one Excel file
  • Linking the dataset with the metadata to retrieve the longitude and latitude of the air quality station
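
The two cleaning steps above were done outside of code, but they can be sketched in Python as a rough illustration. This is only a minimal sketch under stated assumptions: the column names (station_id, date, pm10, latitude, longitude) and the sample rows are hypothetical and do not come from the official dataset.

```python
import csv
import io

# Hypothetical yearly extracts; the real files and column names may differ.
READINGS_2013 = "station_id,date,pm10\nST01,2013-01-01,61.0\nST02,2013-01-01,48.5\n"
READINGS_2014 = "station_id,date,pm10\nST01,2014-01-01,72.3\n"

# Hypothetical station metadata holding the coordinates of each station.
METADATA = "station_id,latitude,longitude\nST01,42.70,23.32\nST02,42.67,23.27\n"


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def combine_and_link(yearly_texts, metadata_text):
    """Stack the yearly readings, then attach each station's coordinates."""
    coords = {row["station_id"]: row for row in read_rows(metadata_text)}
    combined = []
    for text in yearly_texts:
        for row in read_rows(text):
            station = coords.get(row["station_id"])
            if station is None:
                continue  # skip readings whose station is missing from the metadata
            combined.append({
                **row,
                "latitude": float(station["latitude"]),
                "longitude": float(station["longitude"]),
            })
    return combined


rows = combine_and_link([READINGS_2013, READINGS_2014], METADATA)
```

Each entry in rows then carries both the reading and the station coordinates, which is the shape needed to plot readings on a map.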


Dashboard:


To investigate the above problem, I have created two dashboards as follows:


Dashboard 1:

This dashboard uses the data from 2013 to 2017, which contains daily air quality readings at different stations.

This dashboard allows users to:

  1. See the daily fluctuations in the air pollutants, and
  2. Drill down and filter the data by Year and by Air Quality Station

Screenshot 2018-11-11 at 8.59.02 PM.png


Dashboard 2:

This dashboard uses the 2018 data, which contains hourly air quality readings.

This dashboard allows users to:

  1. See the hourly and daily fluctuations in the air pollutants, and
  2. Compare the air pollutant readings across different air quality stations


Dashboard 2.2 - long.png

Visualisation and Insights:




Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements


Questions:


Characterize the sensors’ coverage, performance and operation. Are they well distributed over the entire city? Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?
Now turn your attention to the air pollution measurements themselves. Which part of the city shows relatively higher readings than others? Are these differences time dependent?

Data Cleaning & Transformation:


To investigate the above problems, we need to clean the data and then visualise the dataset. The cleaning process involves:

  • Combining all the datasets across the years into one Excel file
  • Linking the dataset with the metadata to retrieve the longitude and latitude of the air quality station
  • Filtering to keep only data from Sofia City


GroupingSofiaCity.png

To keep only data from Sofia City, I used the Lasso tool in Tableau to select the geohashes within Sofia City and placed them in a group. I then used this group to show only values for Sofia City and removed the records for neighboring cities.
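
The grouping itself was done interactively in Tableau, so there is no code behind it. As a rough equivalent, the same filter can be sketched in Python as a set-membership test. This is a minimal sketch only: the geohash strings, the pm10 field name, and the sample records are made-up placeholders, not values from the actual dataset.

```python
# Hypothetical geohash cells standing in for the lasso-selected Sofia City group.
SOFIA_CITY_GEOHASHES = {"sx8dfd", "sx8dff", "sx8d9v"}


def filter_to_sofia(records, group=SOFIA_CITY_GEOHASHES):
    """Keep only the records whose geohash belongs to the Sofia City group."""
    return [record for record in records if record["geohash"] in group]


# Hypothetical sensor records; the second one lies outside the group.
records = [
    {"geohash": "sx8dfd", "pm10": 35.2},
    {"geohash": "sx3x3r", "pm10": 18.9},
    {"geohash": "sx8d9v", "pm10": 41.7},
]

sofia_records = filter_to_sofia(records)
```

Keeping the group as an explicit set makes the selection repeatable, whereas a lasso selection has to be redrawn if the group is lost.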


Dashboard:


To investigate the above problem, I have created two dashboards as follows:


Dashboard 1:

This dashboard uses the data from 2013 to 2017, which contains daily air quality readings at different stations.

This dashboard allows users to:

  1. See the daily fluctuations in the air pollutants, and
  2. Drill down and filter the data by Year and by Air Quality Station

Screenshot 2018-11-11 at 8.59.02 PM.png


Dashboard 2:

This dashboard uses the 2018 data, which contains hourly air quality readings.

This dashboard allows users to:

  1. See the hourly and daily fluctuations in the air pollutants, and
  2. Compare the air pollutant readings across different air quality stations


Dashboard 2.2 - long.png

Sensor Coverage:



Task2SensorDistribution.png


The symbol map above shows where each sensor, denoted by a circle, is located in Sofia City. From the density of circles, we can see that the majority of the sensors are in the center of the city, with few or none near its edges. Sensor coverage is therefore concentrated in the center of Sofia City.
The same chart also shows how well the sensors are operating: the size of each circle represents the total number of measurements the sensor recorded over a span of more than a year. Most sensors recorded at a similar rate, but a few captured only a small number of records, as shown by their small circles. Interestingly, these low-count sensors sit at the same locations as sensors that captured many measurements. One possible explanation is that these sensors broke down and were replaced, which would account for their few records.
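
The circle sizes in the chart correspond to a count of records per sensor. As a rough illustration of how low-count sensors could be flagged outside Tableau, here is a minimal Python sketch. The sensor names, the record counts, and the cutoff of half the median are all assumptions chosen for the example, not values taken from the dataset or from the dashboard.

```python
from collections import Counter
from statistics import median


def measurement_counts(records):
    """Count how many measurements each sensor (keyed by geohash) recorded."""
    return Counter(record["geohash"] for record in records)


def low_activity_sensors(counts, fraction=0.5):
    """Return the sensors whose count falls below `fraction` of the median count."""
    cutoff = fraction * median(counts.values())
    return sorted(sensor for sensor, n in counts.items() if n < cutoff)


# Hypothetical records: two sensors reporting steadily and one with few records.
records = (
    [{"geohash": "sensor_a"}] * 100
    + [{"geohash": "sensor_b"}] * 95
    + [{"geohash": "sensor_c"}] * 8
)

counts = measurement_counts(records)
flagged = low_activity_sensors(counts)
```

A median-based cutoff is used here because a handful of very active or very quiet sensors would skew a mean-based one; any such threshold is a judgment call and should be checked against the map.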

Sensor Performance:
Input Line Chart Here


Sensor Operations: Unusual Behaviour (Detecting Outliers)
Input Dashboard Here

IdentifySensorUnusualBehaviour.PNG
Cleaned Outlier.PNG


Part 2:


YearlyQuardrant.PNG


Using appropriate data visualisation, you are required to answer the following types of questions:

Characterize the sensors’ coverage, performance and operation. Are they well distributed over the entire city? Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 4 images and 600 words.


Now turn your attention to the air pollution measurements themselves. Which part of the city shows relatively higher readings than others? Are these differences time dependent? Limit your response to no more than 6 images and 800 words.


Task 3: Air Quality Measure Analysis

Urban air pollution is a complex issue. There are many factors affecting the air quality of a city. Some of the possible causes are:

  1. Local energy sources. For example, according to Unmask My City, a global initiative by doctors, nurses, public health practitioners, and allied health professionals dedicated to improving air quality and reducing emissions in our cities, Bulgaria’s main sources of PM10, and fine particle pollution PM2.5 (particles 2.5 microns or smaller) are household burning of fossil fuels or biomass, and transport.
  2. Local meteorology such as temperature, pressure, rainfall, humidity, wind, etc.
  3. Local topography
  4. Complex interactions between local topography and meteorological characteristics.
  5. Transboundary pollution, for example the haze that intruded into Singapore from our neighbours.

In this third task, you are required to reveal the relationships between the factors mentioned above and the air quality measure detected in Task 1 and Task 2. Limit your response to no more than 5 images and 600 words.


Conclusion


Reference


Feedback

Please feel free to provide your feedback. Thank you.