IS428 2018-19 Term1 Assign Yeo Qi Xun

From Visual Analytics for Business Intelligence

To be a Visual Detective: Revealing spatio-temporal patterns

Overview

Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide, an estimated 7 million people die prematurely because of pollution; in the European Union (EU), 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has been classified as the leading environmental cause of cancer.

Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than what the EU and the World Health Organization (WHO) have set to protect health.

Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria also tops the list of most polluted countries, with a daily mean concentration of 77 μg/m3 (the EU limit value is 50 μg/m3).

According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10).

The Task

In this assignment, you are required to use a visual analytics approach to reveal spatio-temporal patterns of air quality in Sofia City and to identify issues of concern.

Using appropriate data visualisation, you will be asked to answer the following types of questions:

Task 1: Spatio-temporal Analysis of Official Air Quality

Characterize the past and most recent situation with respect to air quality measures in Sofia City. What does a typical day look like for Sofia city? Do you see any trends of possible interest in this investigation? What anomalies do you find in the official air quality dataset? How do these affect your analysis of potential problems to the environment?

Your submission for this question should contain no more than 10 images and 1000 words.

Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

Using appropriate data visualisation, you will be asked to answer the following types of questions:

  • Characterize the sensors’ coverage, performance and operation. Are they well distributed over the entire city? Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 4 images and 600 words.
  • Now turn your attention to the air pollution measurements themselves. Which part of the city shows relatively higher readings than others? Are these differences time dependent? Limit your response to no more than 6 images and 800 words.

Task 3: Analyse Unmask My City's Claim

Urban air pollution is a complex issue. There are many factors affecting the air quality of a city. Some of the possible causes are:

  • Local energy sources. For example, according to Unmask My City, a global initiative by doctors, nurses, public health practitioners, and allied health professionals dedicated to improving air quality and reducing emissions in our cities, Bulgaria’s main sources of PM10, and fine particle pollution PM2.5 (particles 2.5 microns or smaller) are household burning of fossil fuels or biomass, and transport.
  • Local meteorology such as temperature, pressure, rainfall, humidity, wind, etc.
  • Local topography
  • Complex interactions between local topography and meteorological characteristics.
  • Transboundary pollution, for example the haze that intruded into Singapore from our neighbours.

In this third task, you are required to reveal the relationships between the factors mentioned above and the air quality measure detected in Task 1 and Task 2. Limit your response to no more than 5 images and 600 words.

Motivation

These are the main motivations for the development of the visualization tool:

  1. Understand the difference between official and unofficial data about air quality
  2. Monitor emissions of each Coal Plant
  3. Investigate anomalies with weather patterns
  4. Track air quality in different areas

The tool can be used by citizens and the government alike as it provides useful functions for them to understand more about air quality and how it affects all walks of life in Bulgaria.

Background Information

Key Measurement attributes and their significance
Meteorological measurements

  1. TASMAX[degrees C] Daily maximum temperature
  2. TASAVG[degrees C] Daily average temperature
  3. TASMIN[degrees C] Daily minimum temperature
  4. DPMAX[degrees C] Daily maximum dew point temperature
  5. DPAVG[degrees C] Daily average dew point temperature
  6. DPMIN[degrees C] Daily minimum dew point temperature
  7. RHMAX[%] Daily maximum relative humidity
  8. RHAVG[%] Daily average relative humidity
  9. RHMIN[%] Daily minimum relative humidity
  10. sfcWindMAX[km/h] Daily maximum wind speed
  11. sfcWindAVG[km/h] Daily average wind speed
  12. sfcWindMIN[km/h] Daily minimum wind speed
  13. PSLMAX[hPa] Daily maximum surface pressure
  14. PSLAVG[hPa] Daily average surface pressure
  15. PSLMIN[hPa] Daily minimum surface pressure
  16. PRCPMAX[mm] Daily maximum precipitation amount
  17. PRCPAVG[mm] Daily average precipitation amount
  18. PRCPMIN[mm] Daily minimum precipitation amount
  19. VISIB[km] Daily average visibility

Data

You will have the following data and supporting information at your disposal:

  • Official air quality measurements (5 stations in the city)
  • Citizen science air quality measurements
  • Meteorological measurements
  • Topography data

The datasets above can be generally grouped into 3 different categories:

  1. Air Quality Data
  2. Meteorological Data
  3. Topography Data

The data will then be visualized using Tableau. However, some data cleaning and preprocessing steps are required before the data is suitable for use in Tableau. I will be using Python to carry out the following data cleaning tasks.

Data Cleaning

Problem #1: Citizen Science Air Quality Mapping Data
Issue: The original citizen science air quality data identifies locations by geohash. Tableau has no difficulty reading the geohash values, but plotting choropleth maps directly from geohashes poses some issues. Thus, a non-Tableau solution is needed.
Solution: Cleaning1.png
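The contents of Cleaning1.png are not reproduced here, so the following is only a minimal sketch of one possible non-Tableau route: decoding each geohash into a latitude/longitude pair that Tableau can plot. It uses the standard geohash decoding scheme in plain Python; the function name `decode_geohash` is an assumption for illustration and is not taken from the original cleaning script.

```python
# Standard geohash alphabet (base 32, without the letters a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash):
    """Return the (latitude, longitude) of the centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate between longitude and latitude, longitude first

    for char in geohash.lower():
        value = BASE32.index(char)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):  # each character carries 5 bits
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon

    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


# Usage with a well-known reference geohash (not a value from the dataset).
lat, lon = decode_geohash("ezs42")
print(round(lat, 3), round(lon, 3))
```

Applied to every row of the citizen science files, this would yield two numeric columns that can be assigned geographic roles in Tableau. A longer geohash gives a smaller cell, so the decoded point is the cell centre rather than the exact sensor position.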
Problem #2: Official Air Quality Data
Issue: The original official air quality data does not contain location data. To map the official air quality, an inner join is needed between the station metadata and the original official air quality data. Although this can be done in Python using pandas, I decided to use Tableau's out-of-the-box join instead.
Solution:
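The join itself was done in Tableau, so the code below is not the method used here. It is a hedged sketch of what the equivalent pandas step could look like. All column names (`station_id`, `pm10`, `latitude`, `longitude`) and all values are made up for illustration; the real files may use different names.

```python
import pandas as pd

# Hypothetical measurements: one row per station reading, no coordinates.
readings = pd.DataFrame({
    "station_id": ["ST01", "ST01", "ST02", "ST99"],
    "pm10": [41.0, 77.0, 55.0, 30.0],
})

# Hypothetical metadata: one row per station, with its location.
stations = pd.DataFrame({
    "station_id": ["ST01", "ST02", "ST03"],
    "latitude": [42.70, 42.68, 42.65],
    "longitude": [23.30, 23.35, 23.38],
})

# Inner join: keep only readings whose station appears in the metadata,
# and attach that station's coordinates to each reading.
located = readings.merge(stations, on="station_id", how="inner")

print(located)
```

With an inner join, readings from a station missing in the metadata (ST99 in this made-up data) are dropped, as are stations with no readings (ST03). That is worth checking before mapping, since silently lost rows would look like gaps in coverage.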

Final Excel Files

  1. data_bg_2017_clean.csv: contains all the citizen-obtained data for 2017
  2. data_bg_2018_clean.csv: contains all the citizen-obtained data for 2018

Data Import/Configuration

When importing multiple files, we need to tell Tableau how the files are related to one another. In this case, the files share a common attribute such as date/time or IDs. However, for a filter on one data source to apply to another data source, Tableau needs to understand how the files within the data source are related.

Brief Implementation Steps
Normally, automatic mapping would be sufficient. In our case, however, the data is complex enough that Tableau was unable to establish a meaningful relationship between the datasets, so we have to define the custom mapping ourselves.

Visualisation

Findings - Task #1

What are the typical patterns in the official air quality data? What does a typical day look like for Sofia city?

Describe up to ten of the most interesting patterns that appear in the official air quality data. Describe what is notable about the pattern and explain its possible significance.

Findings - Task #2

What are the typical patterns in the citizen science air quality measurements data? What does a typical day look like for Sofia city?

Describe up to ten of the most interesting patterns that appear in the citizen science air quality measurements data. Describe what is notable about the pattern and explain its possible significance.

Findings - Task #3

Conclusion

Link

Improvement

This is the list of software I used to perform the visual analysis:

  • Tableau
  • Excel
  • Chrome
  • Python

Assignment Q&A

If you need more clarification, please feel free to pen down your questions.

References

Comments

Do provide me with your feedback! :)