Difference between revisions of "IS428 AY2018-19T1 Chen Yuge"

From Visual Analytics for Business Intelligence

Revision as of 20:22, 11 November 2018

Problem & Motivation

As one of the most polluted countries in Europe, Bulgaria faces high levels of air pollution. It ranks eighth for premature deaths caused by PM2.5 in the European Environment Agency’s 2017 report on air quality in Europe[1]. The goal of this report is to assess pollution conditions in Sofia city, the capital of Bulgaria.

In this report, I will use 3 air quality indicators (Official Air Quality Concentration, PM2.5 and PM10) to analyze the air quality in Sofia city. PM2.5 is a pollutant stemming from fuel combustion, heating, transportation, waste incineration, agriculture and other anthropogenic sources. According to studies, it is highly correlated with cancer rates, with a 36% increase in lung cancer per 10 μg/m3, as it can penetrate deeper into the lungs[2]. Worldwide, exposure to PM2.5 contributed to 4.1 million deaths from heart disease and stroke, lung cancer, chronic lung disease, and respiratory infections. PM10 is particulate matter 10 micrometers or less in diameter; it is slightly larger than PM2.5 but poses the same level of danger.

By using the above-mentioned indicators, the goal is to visualize the air quality situation as well as the factors affecting the air quality. The factors which will be analyzed in this report are:

- Local energy sources
- Meteorology such as temperature, pressure, rainfall, humidity and wind
- Human behavior such as driving and room heating
- Behavior of Sofia city's neighbors



Data Analysis & Transformation

Data Analysis and Cleaning

Air Tube data


After a quick plot of the raw data, I found outliers (or misreadings) in the Air Tube dataset. They were handled as follows:

{| class="wikitable"
|-
! Column Name !! Action !! Screenshot
|-
| Temperature || Remove temperatures less than -20 or larger than 50 || Cap.png
|-
| Pressure || Remove pressure values equal to 0 || Cap2.png
|}
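
The two cleaning rules above could be reproduced outside the visualization tool with a short pandas filter. This is only a minimal sketch: the column names `temperature` and `pressure` and the sample values are assumptions made for illustration, not the actual Air Tube schema or data.

```python
import pandas as pd


def remove_outliers(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows that violate the temperature or pressure rule."""
    # Keep temperatures inside the closed range [-20, 50].
    valid_temperature = df["temperature"].between(-20, 50)
    # Keep rows whose pressure reading is not exactly 0.
    valid_pressure = df["pressure"] != 0
    return df[valid_temperature & valid_pressure].reset_index(drop=True)


# Hypothetical sample readings (column names and values are assumed).
sample = pd.DataFrame({
    "temperature": [12.0, -35.0, 21.5, 63.0, 5.0],
    "pressure": [95000.0, 94800.0, 0.0, 95100.0, 94900.0],
})
cleaned = remove_outliers(sample)
```

Only the first and last sample rows pass both rules; the others are dropped for an out-of-range temperature or a zero pressure.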

Aggregation

EEA data

  • Aggregate data from different years (2013-2018) into one sheet at “day” level of detail:
Capture3.png
Arrow.jpg
Capture4.png


Steps

  1. Aggregate the average concentration by day in each data sheet from 2013-2018
Cap5.png
  2. Combine the sheets of different stations and years into one (2013-2018)
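
The two steps above can be sketched in pandas as a per-sheet daily average followed by a concatenation. The `Concentration` column, the `station` label and the sample values are assumptions for illustration; only `DatetimeBegin` is named in this report.

```python
import pandas as pd


def daily_average(sheet: pd.DataFrame, station: str) -> pd.DataFrame:
    """Average the concentration per calendar day for one station/year sheet."""
    timestamps = pd.to_datetime(sheet["DatetimeBegin"])
    daily = (
        sheet.assign(day=timestamps.dt.date)
        .groupby("day", as_index=False)["Concentration"]
        .mean()
    )
    # Tag each row so the station is still known after combining sheets.
    daily["station"] = station
    return daily


# Hypothetical sheets for two stations and years (values are made up).
sheet_a = pd.DataFrame({
    "DatetimeBegin": ["2013-01-01 00:00", "2013-01-01 12:00", "2013-01-02 00:00"],
    "Concentration": [10.0, 30.0, 50.0],
})
sheet_b = pd.DataFrame({
    "DatetimeBegin": ["2014-01-01 00:00", "2014-01-01 06:00"],
    "Concentration": [4.0, 8.0],
})

# Step 2: stack the daily sheets into one table.
combined = pd.concat(
    [daily_average(sheet_a, "station_a"), daily_average(sheet_b, "station_b")],
    ignore_index=True,
)
```

Keeping a station column in the combined table is a design choice that lets later charts split or average across stations without going back to the separate sheets.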


  • Aggregate the 2018 dataset into one sheet at “hour” level of detail:
Capture6.png
Arrow.jpg
Capture7.png

Steps

  1. Convert the 2018 timestamps to local time and take the midpoint of each interval
   * While exploring the dataset, I found that the datetimes are in UTC, so I converted them to local time.
   * The interval between the "DatetimeBegin" and "DatetimeEnd" columns is always 1 hour, so I took the midpoint, 0.5 hours later than "DatetimeBegin", as the time of the record.
Cap8.png


Cap9.png
  2. Aggregate the 5 sheets
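
The time transformation in step 1 could look roughly like the following in pandas. It is a sketch under assumptions: the local time zone is taken to be `Europe/Sofia`, the `record_time` column name is hypothetical, and the sample rows are made up.

```python
import pandas as pd


def to_local_midpoint(df: pd.DataFrame, tz: str = "Europe/Sofia") -> pd.DataFrame:
    """Convert UTC interval starts to local time and shift to the interval midpoint."""
    # Interpret the raw timestamps as UTC.
    begin_utc = pd.to_datetime(df["DatetimeBegin"], utc=True)
    # Convert to the assumed local time zone (handles daylight saving time).
    begin_local = begin_utc.dt.tz_convert(tz)
    # Each interval lasts 1 hour, so the midpoint is 30 minutes after the start.
    return df.assign(record_time=begin_local + pd.Timedelta(minutes=30))


# Hypothetical hourly records: one in winter, one in summer.
hourly = pd.DataFrame({
    "DatetimeBegin": ["2018-01-15 00:00:00", "2018-07-15 00:00:00"],
    "Concentration": [41.0, 18.0],
})
result = to_local_midpoint(hourly)
```

Using a named time zone rather than a fixed offset matters here because the local offset from UTC is 2 hours in winter and 3 hours in summer.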

Visualization

Home Page

Capture10.png
{| class="wikitable"
|-
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 20%;" | Button
! style="font-weight: bold;background: #536a87;color:#fbfcfd;width: 70%" | Description
! style="font-weight: bold;background: #536a87;color:#fbfcfd;" | Icon
|-
| '''Official Air Quality Station Timeseries'''
|
* Time-series line chart of average concentration across 5 stations.
* Map of the locations of the 5 stations and their average concentration across years (2013-2017)
* Dataset: EEA data
|
Capture11.png
|-
| '''Official Air Quality Heatmap'''
|
* Time-series heatmap of:
# Concentration by weekday across years (2013-2018)
# Concentration by hour of day, animated by month (2018)
# Concentration by hour of day, overview (2018)
* Dataset: EEA data
|
Capture12.png
|-
| '''Sensor Distribution Map'''
|
* Time-series line chart of average concentration across 5 stations.
* Map of the locations of the 5 stations and their average concentration across years (2013-2017)
* Dataset: EEA data
|
Capture13.png
|-
| '''Factors Analysis'''
|
* Time-series line chart of average concentration across 5 stations.
* Map of the locations of the 5 stations and their average concentration across years (2013-2017)
* Dataset: EEA data
|
Capture14.png
|}


Official Air Quality

2013-2017

2018

Anomalies

  • Some data are missing, for example January to October 2017, possibly because of device breakdowns.

Citizen Science Air Quality

Sensor geographical Distribution

  • Across cities: Sensors are evenly spread out across cities, except in Sofia city and Plovdiv, which have a denser sensor distribution.
  • Inside Sofia city: The sensors are concentrated in the middle of the city and are not evenly distributed across the entire city.


Sensing data statistics

TODO: choose filter to navigate to 2017


Factors influencing air quality

  • Scatter plot

 We can deduce the following information from the scatter plot:
  • PM2.5 and PM10 are higher when pressure is around 1000k and 170k.
  • PM2.5 and PM10 are positively correlated. It is possible to have high PM10 and low PM2.5, but when PM10 is low, PM2.5 must also be low.
  • PM2.5 and PM10 are higher when the temperature is around 0 and slowly decrease as the temperature increases, reaching close to 0 when the temperature is near 40.
  • The upper bound of PM2.5 and PM10 increases as humidity increases, but there is no linear relationship between humidity and PM2.5 or PM10.
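
Two of these observations could also be checked numerically rather than by eye. The sketch below computes the Pearson correlation between PM2.5 and PM10 and an approximate upper bound of PM10 per humidity band. The column names `pm25`, `pm10` and `humidity`, the 20-unit band width, the use of the 95th percentile as the "upper bound", and the sample values are all assumptions for illustration.

```python
import pandas as pd


def summarize_relationships(df: pd.DataFrame, bin_width: int = 20):
    """Return the PM2.5/PM10 correlation and a per-humidity-band upper bound of PM10."""
    # Pearson correlation coefficient between the two particle measures.
    pearson = df["pm25"].corr(df["pm10"])
    # Group humidity into fixed-width bands labeled by their lower edge.
    humidity_band = (df["humidity"] // bin_width) * bin_width
    # Use the 95th percentile as a robust stand-in for the upper bound.
    upper_bound = df.groupby(humidity_band)["pm10"].quantile(0.95)
    return pearson, upper_bound


# Hypothetical readings (values are made up for the example).
readings = pd.DataFrame({
    "humidity": [10, 15, 30, 35, 50, 55, 70, 75],
    "pm10": [5.0, 10.0, 8.0, 20.0, 6.0, 35.0, 9.0, 60.0],
    "pm25": [3.0, 6.0, 5.0, 12.0, 4.0, 20.0, 5.0, 35.0],
})
pearson, upper_bound = summarize_relationships(readings)
```

A rising upper bound alongside low values in every band matches the pattern described above: humidity raises the ceiling of the readings without producing a linear trend.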