Difference between revisions of "IS428 AY2018-19T1 Chen Yuge"
Yg.chen.2015 (talk | contribs) (Created page with " == Problem & Motivation == As one of the most polluted countries in Europe, Bulgaria is facing a high level of pollution. It is ranked eighth in the European Environment Agen...") |
Yg.chen.2015 (talk | contribs) |
== Problem & Motivation == | == Problem & Motivation == | ||
Bulgaria is one of the most polluted countries in Europe. It ranks eighth for premature deaths caused by PM2.5 in the European Environment Agency’s 2017 report on air quality in Europe[1]. The goal of this report is to assess the pollution conditions in Sofia, the capital of Bulgaria. <br> | Bulgaria is one of the most polluted countries in Europe. It ranks eighth for premature deaths caused by PM2.5 in the European Environment Agency’s 2017 report on air quality in Europe[1]. The goal of this report is to assess the pollution conditions in Sofia, the capital of Bulgaria. <br> | ||
+ | <br> | ||
In this report, I use three air quality indicators to analyze the air quality in Sofia: the official air quality concentration, PM2.5 and PM10. PM2.5 is a pollutant stemming from fuel combustion, heating, transportation, waste incineration, agriculture and other anthropogenic sources. Because it can penetrate deep into the lungs, studies associate it with cancer, reporting a 36% increase in lung cancer per 10 μg/m³[2]. Worldwide, exposure to PM2.5 contributed to 4.1 million deaths from heart disease and stroke, lung cancer, chronic lung disease, and respiratory infections. PM10 is particulate matter 10 micrometers or less in diameter; its particles are slightly larger than PM2.5 but pose the same level of danger. | In this report, I use three air quality indicators to analyze the air quality in Sofia: the official air quality concentration, PM2.5 and PM10. PM2.5 is a pollutant stemming from fuel combustion, heating, transportation, waste incineration, agriculture and other anthropogenic sources. Because it can penetrate deep into the lungs, studies associate it with cancer, reporting a 36% increase in lung cancer per 10 μg/m³[2]. Worldwide, exposure to PM2.5 contributed to 4.1 million deaths from heart disease and stroke, lung cancer, chronic lung disease, and respiratory infections. PM10 is particulate matter 10 micrometers or less in diameter; its particles are slightly larger than PM2.5 but pose the same level of danger. | ||
− | By using the above-mentioned indicators, the goal is to visualize the air quality situation as well as the factors affecting the air quality. The factors | + | <br><br> |
− | + | Using these indicators, the goal is to visualize the air quality situation and the factors affecting it. Factors such as meteorology, topography and human behavior are analyzed in this report. | ||
− | |||
− | |||
− | |||
− | |||
<br> | <br> | ||
<br> | <br> | ||
== Data Analysis & Transformation == | == Data Analysis & Transformation == | ||
===Data Analysis and Cleaning === | ===Data Analysis and Cleaning === | ||
− | ==== | + | ====Air Tube data==== |
− | + | <br> | |
A quick plot of the raw data revealed outliers (or misreadings) in the Air Tube dataset: | A quick plot of the raw data revealed outliers (or misreadings) in the Air Tube dataset: | ||
− | + | Temperature: remove readings below -20 or above 50<br> | ||
− | [[File:Cap.png|thumb]] | + | {| class="wikitable" |
+ | |- | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;width: 20%;" | Column Name | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;width: 40%" | Action | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;" | Screenshot | ||
+ | |- | ||
+ | | <center>'''Geocoding''' <br/> | ||
+ | || Transform geohash to latitude and longitude | ||
+ | || | ||
+ | [[File:Capture22.png|800px|thumb|center]] | ||
+ | |- | ||
+ | | <center>'''Temperature''' <br/> | ||
+ | || Remove temperatures below -20 or above 50 | ||
+ | || | ||
+ | [[File:Cap.png|800px|center]] | ||
+ | |- | ||
+ | | <center>'''Pressure''' <br/> | ||
+ | || Remove pressure readings equal to 0 | ||
+ | || | ||
+ | [[File:Cap2.png|600px|center]] | ||
+ | |} | ||
+ | |||
+ | === Aggregation === | ||
+ | ====EEA data==== | ||
+ | * Aggregate data from different years (2013-2018) into one sheet with “day” details:<br> | ||
+ | [[File:Capture3.png|600px|center]] | ||
+ | [[File:Arrow.jpg|70px|center]] | ||
+ | [[File:Capture4.png|600px|center]]<br> | ||
+ | |||
+ | Steps | ||
+ | # Compute the average concentration by day in each data sheet from 2013-2018 | ||
+ | [[File:Cap5.png|600px|center]] | ||
+ | # Combine sheets of different stations and years into 1 (2013-2018) | ||
+ | |||
+ | |||
+ | *Aggregate 2018 dataset into one sheet with “hour” details: | ||
+ | [[File:Capture6.png|600px|center]] | ||
+ | [[File:Arrow.jpg|70px|center]] | ||
+ | [[File:Capture7.png|600px|center]] | ||
+ | |||
+ | Steps | ||
+ | # Convert the 2018 timestamps to local time and take the midpoint of each interval | ||
+ | * While exploring the dataset, I found that the datetime values are in UTC, so I converted them to local time. | ||
+ | * The interval between the "DatetimeBegin" and "DatetimeEnd" columns is always 1 hour, so I use the midpoint, 0.5 hour later than "DatetimeBegin", as the time of the record. | ||
+ | [[File:Cap8.png|800px|center]]<br> | ||
+ | |||
+ | [[File:Cap9.png|800px|center]] | ||
+ | # Aggregate 5 sheets | ||
+ | |||
+ | == Visualization == | ||
+ | Tableau link: not available. The workbook could not be uploaded to Tableau Online because the data volume exceeds the limit. | ||
+ | === Home Page === | ||
+ | [[File:Capture10.png|800px|center]] | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;width: 20%;" | Button | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;width: 70%" | Description | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;" | icon | ||
+ | |- | ||
+ | | <center>'''Official Air Quality Station Timeseries''' <br/> | ||
+ | || | ||
+ | * Time series line chart of average concentration across 5 stations. | ||
+ | * Map of the 5 station locations with their average concentration across years (2013-2017) | ||
+ | * Dataset: EEA data | ||
+ | || | ||
+ | [[File:Capture11.png|120px|center]] | ||
+ | |- | ||
+ | | <center>'''Official Air Quality Heatmap''' <br/> | ||
+ | || | ||
+ | * Time series heatmaps of: | ||
+ | # Concentration by weekday across years (2013-2018) | ||
+ | # Concentration over 24 hours, animated by month (2018) | ||
+ | # Concentration over 24 hours, overview (2018) | ||
+ | * Dataset: EEA data | ||
+ | || | ||
+ | [[File:Capture12.png|120px|center]] | ||
+ | |- | ||
+ | | <center>'''Sensor Distribution Map''' <br/> | ||
+ | || | ||
+ | * Time series line chart of average concentration across 5 stations. | ||
+ | * Map of the 5 station locations with their average concentration across years (2013-2017) | ||
+ | * Dataset: EEA data | ||
+ | || | ||
+ | [[File:Capture13.png|120px|center]] | ||
+ | |- | ||
+ | | <center>'''Factors Analysis''' <br/> | ||
+ | || | ||
+ | * Time series line chart of average concentration across 5 stations. | ||
+ | * Map of the 5 station locations with their average concentration across years (2013-2017) | ||
+ | * Dataset: EEA data | ||
+ | || | ||
+ | [[File:Capture14.png|120px|thumb|center]] | ||
+ | |} | ||
+ | |||
+ | === Official Air Quality === | ||
+ | <br> | ||
+ | |||
+ | ==== Official Air Quality Timeseries 2013-2018 ==== | ||
+ | [[File:Cap15.png|800px|center]]<br> | ||
+ | [[File:Cap16.png|500px|center]] | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;width: 20%;" | Feature | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;width: 70%" | Description | ||
+ | |- | ||
+ | | <center>Flat layout of concentration across years<br/> | ||
+ | || | ||
+ | Placing the 6 line charts side by side lets the user compare years more easily. | ||
+ | |- | ||
+ | | <center>Highlight across 6 line charts<br/> | ||
+ | || | ||
+ | Clicking one line highlights all lines for the same station. This gives the user a more intuitive view of the change across years at the same location. | ||
+ | |- | ||
+ | | <center>Animation by month on map of stations <br/> | ||
+ | || | ||
+ | The orange dots mark the station locations; the darker the color, the higher the concentration at that location.<br> | ||
+ | The animation lets the user see how station locations and concentration values change over time in different years, giving an intuitive overview of how air quality has changed in recent years. | ||
+ | |}<br> | ||
+ | <br> | ||
+ | |||
+ | ==== Official Air Quality heatmap 2013-2018 ==== | ||
+ | [[File:Cap17.png|800px|center]] | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;width: 20%;" | Feature | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;width: 70%" | Description | ||
+ | |- | ||
+ | | <center>Weekdays breakdown of concentration<br/> | ||
+ | || | ||
+ | Considering the day of the week as a potential influence on air pollution, I drew a heatmap of the concentration distribution by weekday across years (2013-2017) to find patterns. | ||
+ | |- | ||
+ | | <center>Hourly breakdown of concentration<br/> | ||
+ | || | ||
+ | As another potential influencing factor, I drew the concentration time series broken down by hour of day (24 hours). | ||
+ | |- | ||
+ | | <center>Heatmap animation by month<br/> | ||
+ | || | ||
+ | The heatmap animation shows the transition between months more clearly. | ||
+ | |} | ||
+ | Trends: | ||
+ | # Concentration is higher at night and in the morning and lower at noon. This may be due to traffic pollution during peak hours. | ||
+ | # The concentration is almost identical across weekdays, which suggests that the day of the week is not a major influence on air quality. | ||
+ | # In general, concentration has decreased from 2013 to the present, which suggests that air pollution is improving. | ||
+ | <br> | ||
+ | |||
+ | ====Anomalies ==== | ||
+ | * Some data is missing, for example January to October 2017, possibly because of device breakdowns.<br> | ||
+ | * The concentration in January and February is significantly higher than in other months in every year. | ||
+ | |||
+ | ===Citizen Science Air Quality=== | ||
+ | |||
+ | ====Sensor geographical Distribution==== | ||
+ | * Across cities: sensors are evenly spread across cities, except in Sofia and Plovdiv, which have a denser sensor distribution. <br> | ||
+ | |||
+ | [[File:Cap18.png|400px|center]] | ||
+ | * Inside Sofia: the sensors are concentrated in the city center and are not evenly distributed across the entire city. | ||
+ | [[File:Cap19.png|400px|center]] | ||
+ | <br> | ||
+ | |||
+ | ====Sensing data statistics==== | ||
+ | [[File:Cap20.png|800px|center]] | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;width: 20%;" | Feature | ||
+ | ! style="font-weight: bold;background: #84def4;color:#fbfcfd;width: 70%" | Description | ||
+ | |- | ||
+ | | <center>Animation of the weekly distribution of PM2.5 & PM10 values on the map<br/> | ||
+ | || | ||
+ | The animation intuitively shows how the data distribution changes over time. | ||
+ | |- | ||
+ | | <center>Boxplot of PM2.5 & PM10 value across weeks<br/> | ||
+ | || | ||
+ | The box plot shows not only the average but also the spread of values, which helps the user identify outliers and understand the data distribution better. | ||
+ | |} | ||
+ | Observation: | ||
+ | # Weeks 1, 2 and 4 (the 1st, 2nd and 4th weeks of January 2018) have very high concentrations, whereas the concentrations in week 3 and the remaining weeks are low. | ||
+ | # The high-concentration points are located in the northern part of Sofia. | ||
+ | # The week 5 boxplot contains a high concentration outlier, which suggests that something abnormal happened in that week. | ||
+ | |||
+ | ====Factors influencing air quality==== | ||
+ | * Scatter plot | ||
+ | [[File:Cap21.png|1000px|center]] | ||
+ | We can deduce the following information from the scatter plot: | ||
+ | #PM2.5 and PM10 are higher when pressure is around 1000k and 170k. | ||
+ | #PM2.5 and PM10 are positively correlated. It is possible to have high PM10 with low PM2.5, but when PM10 is low, PM2.5 must also be low. | ||
+ | #PM2.5 and PM10 are highest when the temperature is around 0 and slowly decrease as the temperature increases, approaching 0 when the temperature is close to 40. | ||
+ | #The upper bound of PM2.5 and PM10 increases as humidity increases, but there is no linear relationship between humidity and PM2.5 or PM10.<br> | ||
+ | <br> | ||
+ | |||
+ | =Reference= | ||
+ | [1] https://zerowasteeurope.eu/2018/01/bulgaria-air-pollution/<br> | ||
+ | [2] https://en.wikipedia.org/wiki/Particulates | ||
+ | <br> | ||
+ | |||
Latest revision as of 22:58, 11 November 2018
Problem & Motivation
Bulgaria is one of the most polluted countries in Europe. It ranks eighth for premature deaths caused by PM2.5 in the European Environment Agency’s 2017 report on air quality in Europe[1]. The goal of this report is to assess the pollution conditions in Sofia, the capital of Bulgaria.
In this report, I use three air quality indicators to analyze the air quality in Sofia: the official air quality concentration, PM2.5 and PM10. PM2.5 is a pollutant stemming from fuel combustion, heating, transportation, waste incineration, agriculture and other anthropogenic sources. Because it can penetrate deep into the lungs, studies associate it with cancer, reporting a 36% increase in lung cancer per 10 μg/m³[2]. Worldwide, exposure to PM2.5 contributed to 4.1 million deaths from heart disease and stroke, lung cancer, chronic lung disease, and respiratory infections. PM10 is particulate matter 10 micrometers or less in diameter; its particles are slightly larger than PM2.5 but pose the same level of danger.
Using these indicators, the goal is to visualize the air quality situation and the factors affecting it. Factors such as meteorology, topography and human behavior are analyzed in this report.
Data Analysis & Transformation
Data Analysis and Cleaning
Air Tube data
A quick plot of the raw data revealed outliers (or misreadings) in the Air Tube dataset:
Temperature: remove readings below -20 or above 50
Column Name | Action | Screenshot |
---|---|---|
Geocoding | Transform geohash to latitude and longitude | Capture22.png |
Temperature | Remove temperatures below -20 or above 50 | Cap.png |
Pressure | Remove pressure readings equal to 0 | Cap2.png |
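The cleaning rules above can be sketched in plain Python. This is only an illustration, not the workflow used in the report (which was done in Tableau): the record field names `geohash`, `temperature` and `pressure` are assumptions, and the geohash decoder follows the standard public geohash algorithm.

```python
# Sketch of the cleaning rules; the record field names are assumed.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def decode_geohash(geohash):
    """Return (latitude, longitude) at the centre of a geohash cell."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    is_lon = True  # bits alternate between axes, starting with longitude
    for char in geohash:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            rng = lon_range if is_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the interval
            else:
                rng[1] = mid  # keep the lower half of the interval
            is_lon = not is_lon
    return (sum(lat_range) / 2, sum(lon_range) / 2)


def clean(records):
    """Drop misreadings and attach coordinates to the remaining records."""
    cleaned = []
    for rec in records:
        if not -20 <= rec["temperature"] <= 50:
            continue  # temperature outside the plausible range
        if rec["pressure"] == 0:
            continue  # a pressure of 0 is treated as a misreading
        lat, lon = decode_geohash(rec["geohash"])
        cleaned.append({**rec, "latitude": lat, "longitude": lon})
    return cleaned
```

Filtering before decoding avoids spending time on records that are discarded anyway.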
Aggregation
EEA data
- Aggregate data from different years (2013-2018) into one sheet with “day” details:
Steps
- Compute the average concentration by day in each data sheet from 2013-2018
- Combine the sheets of different stations and years into one (2013-2018)
- Aggregate 2018 dataset into one sheet with “hour” details:
Steps
- Convert the 2018 timestamps to local time and take the midpoint of each interval
* While exploring the dataset, I found that the datetime values are in UTC, so I converted them to local time.
* The interval between the "DatetimeBegin" and "DatetimeEnd" columns is always 1 hour, so I use the midpoint, 0.5 hour later than "DatetimeBegin", as the time of the record.
- Combine the 5 sheets
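The two aggregation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the `(station, time, value)` row layout are hypothetical, and `SOFIA_WINTER` is a fixed UTC+2 offset that is only valid outside daylight saving time.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: fixed winter offset for Sofia (UTC+2); it ignores daylight saving.
SOFIA_WINTER = timezone(timedelta(hours=2))


def record_time(begin_utc, end_utc, local_tz=SOFIA_WINTER):
    """Midpoint of a measurement interval, expressed in local time."""
    midpoint = begin_utc + (end_utc - begin_utc) / 2
    return midpoint.astimezone(local_tz)


def daily_mean(rows):
    """Average concentration per (station, date) from (station, time, value) rows."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for station, when, value in rows:
        bucket = totals[(station, when.date())]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

For data that spans the whole year, passing `zoneinfo.ZoneInfo("Europe/Sofia")` (Python 3.9+) as `local_tz` would account for daylight saving time.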
Visualization
Tableau link: not available. The workbook could not be uploaded to Tableau Online because the data volume exceeds the limit.
Home Page
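Button | Description | Icon |
---|---|---|
Official Air Quality Station Timeseries | Time series line chart of average concentration across 5 stations; map of the 5 station locations with their average concentration across years (2013-2017); dataset: EEA data | Capture11.png |
Official Air Quality Heatmap | Time series heatmaps of (1) concentration by weekday across years (2013-2018), (2) concentration over 24 hours, animated by month (2018), (3) concentration over 24 hours, overview (2018); dataset: EEA data | Capture12.png |
Sensor Distribution Map | Time series line chart of average concentration across 5 stations; map of the 5 station locations with their average concentration across years (2013-2017); dataset: EEA data | Capture13.png |
Factors Analysis | Time series line chart of average concentration across 5 stations; map of the 5 station locations with their average concentration across years (2013-2017); dataset: EEA data | Capture14.png |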
Official Air Quality
Official Air Quality Timeseries 2013-2018
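Feature | Description |
---|---|
Flat layout of concentration across years | Placing the 6 line charts side by side lets the user compare years more easily. |
Highlight across 6 line charts | Clicking one line highlights all lines for the same station. This gives the user a more intuitive view of the change across years at the same location. |
Animation by month on map of stations | The orange dots mark the station locations; the darker the color, the higher the concentration at that location. The animation lets the user see how station locations and concentration values change over time in different years, giving an intuitive overview of how air quality has changed in recent years. |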
Official Air Quality heatmap 2013-2018
Feature | Description |
---|---|
Weekdays breakdown of concentration | Considering the day of the week as a potential influence on air pollution, I drew a heatmap of the concentration distribution by weekday across years (2013-2017) to find patterns. |
Hourly breakdown of concentration | As another potential influencing factor, I drew the concentration time series broken down by hour of day (24 hours). |
Heatmap animation by month | The heatmap animation shows the transition between months more clearly. |
Trends:
- Concentration is higher at night and in the morning and lower at noon. This may be due to traffic pollution during peak hours.
- The concentration is almost identical across weekdays, which suggests that the day of the week is not a major influence on air quality.
- In general, concentration has decreased from 2013 to the present, which suggests that air pollution is improving.
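The weekday and hourly breakdowns behind these trends can be reproduced outside Tableau with a simple group-and-average. The sketch below is an assumption-laden illustration: `mean_by` and the `(timestamp, concentration)` row layout are hypothetical names, not part of the original workbook.

```python
from collections import defaultdict
from datetime import datetime


def mean_by(rows, key):
    """Average concentration grouped by key(timestamp).

    rows is an iterable of (timestamp, concentration) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for when, value in rows:
        group = key(when)
        sums[group] += value
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


def by_weekday(rows):
    """Mean concentration per weekday (0 = Monday ... 6 = Sunday)."""
    return mean_by(rows, lambda when: when.weekday())


def by_hour(rows):
    """Mean concentration per hour of day (0-23)."""
    return mean_by(rows, lambda when: when.hour)
```

Comparing the spread of the `by_weekday` values with the spread of the `by_hour` values gives a quick numeric check of the first two trends.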
Anomalies
- Some data is missing, for example January to October 2017, possibly because of device breakdowns.
- The concentration in January and February is significantly higher than in other months in every year.
Citizen Science Air Quality
Sensor geographical Distribution
- Across cities: sensors are evenly spread across cities, except in Sofia and Plovdiv, which have a denser sensor distribution.
- Inside Sofia: the sensors are concentrated in the city center and are not evenly distributed across the entire city.
Sensing data statistics
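Feature | Description |
---|---|
Animation of the weekly distribution of PM2.5 & PM10 values on the map | The animation intuitively shows how the data distribution changes over time. |
Boxplot of PM2.5 & PM10 value across weeks | The box plot shows not only the average but also the spread of values, which helps the user identify outliers and understand the data distribution better. |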
Observation:
- Weeks 1, 2 and 4 (the 1st, 2nd and 4th weeks of January 2018) have very high concentrations, whereas the concentrations in week 3 and the remaining weeks are low.
- The high-concentration points are located in the northern part of Sofia.
- The week 5 boxplot contains a high concentration outlier, which suggests that something abnormal happened in that week.
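To make the idea of a boxplot outlier concrete, the sketch below flags values outside the whiskers for each week. It assumes the common 1.5 × IQR whisker rule; the report does not state which rule its boxplot uses, and the function names and the `{week: [values]}` layout are hypothetical.

```python
from statistics import quantiles


def boxplot_outliers(values, whisker=1.5):
    """Return the values that fall outside the whiskers of a boxplot.

    Assumes whiskers at `whisker` times the interquartile range (IQR)
    beyond the first and third quartiles.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]


def weekly_outliers(readings_by_week):
    """Map each week to its outliers, keeping only weeks that have any."""
    flagged = {}
    for week, values in readings_by_week.items():
        outliers = boxplot_outliers(values)
        if outliers:
            flagged[week] = outliers
    return flagged
```

A week that shows up in the result deserves a closer look at the raw records before drawing conclusions.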
Factors influencing air quality
- Scatter plot
We can deduce the following information from the scatter plot:
- PM2.5 and PM10 are higher when pressure is around 1000k and 170k.
- PM2.5 and PM10 are positively correlated. It is possible to have high PM10 with low PM2.5, but when PM10 is low, PM2.5 must also be low.
- PM2.5 and PM10 are highest when the temperature is around 0 and slowly decrease as the temperature increases, approaching 0 when the temperature is close to 40.
- The upper bound of PM2.5 and PM10 increases as humidity increases, but there is no linear relationship between humidity and PM2.5 or PM10.
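The correlation claims above are read off the scatter plot; a Pearson coefficient is one way to put a number on them. The sketch below is a generic implementation, not a computation from the report, and the sample lists in the usage note are made-up values for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

For example, `pearson([5, 10, 20], [8, 15, 33])` returns a value close to 1, which is what a strong positive relationship such as the one between PM2.5 and PM10 would look like. A value near 0 for humidity would be consistent with the observation that the relationship is not linear.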
Reference
[1] https://zerowasteeurope.eu/2018/01/bulgaria-air-pollution/
[2] https://en.wikipedia.org/wiki/Particulates