Difference between revisions of "ISSS608 2018-19 T1 Assign Hou Xuelin"

From Visual Analytics and Applications
 
(15 intermediate revisions by the same user not shown)
Line 8: Line 8:
 
Air quality in Druzhba has improved over the last two years, whereas Nadezhda showed an increase in 2017. <br>
 
 
PM10 concentrations in the remaining areas are gradually declining.<br>
 
 +
Within a typical day, PM10 averages around 30 and drops to around 20 between 10am and 5pm, when most people are out at work.
  
[[Image:Task1-4.png|600px]]
+
[[Image:Official-1.jpg|1000px]]
  
 
The PM10 trend in Sofia is highly periodic, and the peaks always fall in winter (Jan/Dec).<br>
 
 
This may be due to domestic heating in winter.
 
  
[[Image:Task1-1.png|600px]]
+
[[Image:Task1-1.png|500px]][[Image:Task1-2.png|500px]]
  
[[Image:Task1-2.png|600px]]
+
=== Anomalies of Official Data ===
 +
* Only Nov/Dec data are recorded for 2017; data for the remaining months are missing, so the 2017 figures may not be representative of the full year.
 +
* The sampling frequency `AveragingTime` is inconsistent across the data: it takes the values day, hour and var. This may introduce bias when the data are aggregated.
  
A typical PM10 trend within day remains average around 30, and declines to around 20 between 10am - 5pm, when most of people are out for working.
+
== Task2 Exploration of Citizen Science Data ==
  
[[Image:Task1-3.png|600px]]
+
=== Sensor Data Quality ===
 +
==== Coverage ====
 +
Sensor density is highest in the urban area around the city centre. <br>
 +
Sensors densely cover the south of Sofia, while coverage in the north of the city is low. <br>
  
=== anomalies of official data ===
+
[[Image:Censor-coverage-chart.jpg|600px|Number of Sensors Around Sofia City]]
What anomalies do you find in the official air quality dataset? How do these affect your analysis of potential problems to the environment?
 
  
== Task2 Exploration of Sensor Data ==
+
==== Operation ====
 +
In total, 1265 sensors were deployed around Sofia from Sep 2017 to Sep 2018.<br>
 +
The average number of working sensors is 453.2, and the median is 513.<br>
  
=== exploratory of sensor data ===
+
[[Image:Calendar-chart.jpg|600px]]
Characterize the sensors’ coverage, performance and operation. Are they well distributed over the entire city? Are they all working properly at all times?
 
  
=== anomalies of sensor data ===
+
==== Performance ====
Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 4 images and 600 words.
+
Sensor measurements are not consistently reliable; several abnormal readings are observed:
  
=== air pollution correlation ===
+
* P1 and P2 values are capped at 2000 and 1000 respectively, which may be the maximum the sensor can measure or a measurement error; the data alone cannot tell which.
Now turn your attention to the air pollution measurements themselves. Which part of the city shows relatively higher readings than others? Are these differences time dependent? Limit your response to no more than 6 images and 800 words.
+
* Pressure should range from 90000 to 100000 Pa, yet negative values are observed.
 +
* Temperature should range from -10 to 50 degrees Celsius; extreme values (e.g. -5573, 435) are unreasonable.
 +
* Humidity is a percentage and should range from 0 to 100; anomalies such as -999 and 898 are also observed.
  
 +
{| class="wikitable"
 +
! quantile
 +
! P1
 +
! P2
 +
! pressure
 +
! temperature
 +
! humidity
 +
|-
 +
| 0%
 +
| 0
 +
| 0
 +
| -20148
 +
| -5573
 +
| -999
 +
|-
 +
| 10%
 +
| 4
 +
| 2
 +
| 0
 +
| 0
 +
| 28
 +
|-
 +
| 20%
 +
| 7
 +
| 4
 +
| 93178
 +
| 4
 +
| 40
 +
|-
 +
| 30%
 +
| 9
 +
| 6
 +
| 94075
 +
| 8
 +
| 49
 +
|-
 +
| 40%
 +
| 11
 +
| 7
 +
| 94552
 +
| 12
 +
| 57
 +
|-
 +
| 50%
 +
| 14
 +
| 9
 +
| 94936
 +
| 15
 +
| 63
 +
|-
 +
| 60%
 +
| 18
 +
| 11
 +
| 95360
 +
| 18
 +
| 69
 +
|-
 +
| 70%
 +
| 23
 +
| 15
 +
| 96242
 +
| 21
 +
| 74
 +
|-
 +
| 80%
 +
| 34
 +
| 20
 +
| 99027
 +
| 24
 +
| 80
 +
|-
 +
| 90%
 +
| 62
 +
| 33
 +
| 100140
 +
| 27
 +
| 88
 +
|-
 +
| 100%
 +
| 2000
 +
| 1000
 +
| 254165
 +
| 435
 +
| 898
 +
|}
 +
 +
I also noticed that not all measurement types are available for every sensor. I therefore divided the sensors into 3 types:
 +
 +
* measures only particle concentrations (P1, P2) => particle-measurement only
 +
* measures only temperature, pressure & humidity => TPH-measurement only
 +
* measures all 5 indicators => All-measurement
 +
 +
[[Image:Censor-type-chart.jpg|600px]]
 +
 +
=== Air Quality Condition ===
 +
 +
According to [https://airtube.info/stats.php?country=BG&city=Sofia AirTube], the P1 and P2 readings refer to PM10 and PM2.5 in µg/m³. <br>
 +
I calculated the PM10 cutoffs for each band based on the [https://www.airnow.gov/index.cfm?action=airnow.calculator AQI calculator] & [https://www.airnow.gov/index.cfm?action=aqibasics.aqi AQI definition].
 +
 +
{| class="wikitable"
 +
! style="font-weight:bold; background-color:#ecf4ff; color:#3166ff;" | PM10
 +
! style="font-weight:bold; background-color:#ecf4ff; color:#3166ff;" | AQI
 +
! style="font-weight:bold; background-color:#ecf4ff; color:#3166ff;" | Label
 +
|-
 +
| style="background-color:#009901; color:#000000;" | 0 to 54
 +
| style="background-color:#009901; color:#000000;" | 0 to 50
 +
| style="background-color:#009901; color:#000000;" | Good
 +
|-
 +
| style="background-color:#f8ff00;" | 55 to 154
 +
| style="background-color:#f8ff00;" | 51 to 100
 +
| style="background-color:#f8ff00;" | Moderate
 +
|-
 +
| style="background-color:#ffcb2f; color:#ffffff;" | 155 to 254
 +
| style="background-color:#ffcb2f; color:#ffffff;" | 101 to 150
 +
| style="background-color:#ffcb2f; color:#ffffff;" | Unhealthy for Sensitive Groups
 +
|-
 +
| style="background-color:#cb0000; color:#ffffff;" | 255 to 354
 +
| style="background-color:#cb0000; color:#ffffff;" | 151 to 200
 +
| style="background-color:#cb0000; color:#ffffff;" | Unhealthy
 +
|-
 +
| style="background-color:#340096; color:#ffffff;" | 355 to 425
 +
| style="background-color:#340096; color:#ffffff;" | 201 to 300
 +
| style="background-color:#340096; color:#ffffff;" | Very Unhealthy
 +
|-
 +
| style="background-color:#680100; color:#ffffff;" | > 425
 +
| style="background-color:#680100; color:#ffffff;" | 301 to 500
 +
| style="background-color:#680100; color:#ffffff;" | Hazardous
 +
|}
 +
 +
==== Seasonal Effect ====
 +
 +
Air quality in summer is generally much better than in winter, which is likely attributable to domestic heating by residents in winter.<br>
 +
In summer, only a few locations in the southwest and south suffer from poor air quality; this may be due to measurement error, because the readings of adjacent sensors are fairly good. <br>
 +
In winter, however, air quality is worse across the entire city, and the southwest and south are worse still.
 +
 +
[[Image:Censor-season-chart.jpg|1000px]]
 +
 +
==== Hourly Effect ====
 +
 +
Air quality in the daytime (7am - 4pm) is better than at night (7pm - 6am). <br>
 +
A strong hypothesis is that this is driven by emissions from daily commuting to and from work and by domestic heating in the evening.
 +
 +
[[Image:Censor-hour-chart.jpg|600px]]
  
 
== Task3 Factors Affect Sofia Air Pollution ==
 
 +
I explored the correlation between PM10 concentration (P1) and the climate indicators (temperature, pressure & humidity).<br>
 +
There is a slight positive correlation between P1 and humidity and a slight negative correlation between P1 and temperature. <br>
 +
Again, this points to domestic heating, the use of which is much higher when the temperature is low. <br>
 +
[[Image:Task3-correlation.jpg|400px]]
  
Urban air pollution is a complex issue. There are many factors affecting the air quality of a city. Some of the possible causes are:
+
Finally, I confirmed the relationship with a trend chart. Notice that the P1 concentration spikes when the temperature is low, and also when humidity stays at a high level. <br>
 +
[[Image:Task3-trend.jpg|800px]]
  
Local energy sources. For example, according to Unmask My City, a global initiative by doctors, nurses, public health practitioners, and allied health professionals dedicated to improving air quality and reducing emissions in our cities, Bulgaria’s main sources of PM10, and fine particle pollution PM2.5 (particles 2.5 microns or smaller) are household burning of fossil fuels or biomass, and transport.
+
== Interactive Web Visualization ==
Local meteorology such as temperature, pressure, rainfall, humidity, wind etc
+
Link to the [https://public.tableau.com/views/task1_62/ExploratoryAnalysisofAirPollutioninSofia?:embed=y&:display_count=yes Tableau visualization]<br>
Local topography
+
[[Image:External-vis-link.jpg|800px]]
Complex interactions between local topography and meteorological characteristics.
 
Transboundary pollution for example the haze that intruded into Singapore from our neighbours.
 
In this third task, you are required to reveal the relationships between the factors mentioned above and the air quality measure detected in Task 1 and Task 2. Limit your response to no more than 5 images and 600 words.
 
  
 
== Data Source ==
 
  
 +
Four major data sets in zipped file format are provided for this assignment:
  
 +
* Official air quality measurements (5 stations in the city) (EEA Data.zip), as per EU guidelines on air quality monitoring; see the data description [https://drive.google.com/file/d/1v5yCL-LdriDwa65qXPbFL7b0tydylDlb/view HERE…]
 +
* Citizen science air quality measurements (Air Tube.zip), incl. temperature, humidity and pressure (many stations) and topography (gridded data).
 +
* Meteorological measurements (1 station)(METEO-data.zip): Temperature; Humidity; Wind speed; Pressure; Rainfall; Visibility
 +
* Topography data (TOPO-DATA)
  
== Methodology ==
+
They can be downloaded by clicking on this [https://storage.cloud.google.com/global-datathon-2018/sofia-air/air-sofia.zip link].
 
 
 
 
  
 
== Application Libraries & Packages ==
 
{|class="wikitable"  
+
{| class="wikitable"
 +
! style="font-weight:bold; background-color:#c0c0c0;" | Package Name
 +
! style="font-weight:bold; background-color:#c0c0c0;" | Descriptions
 +
|-
 +
| style="font-style:italic; font-size:12px;" | xlsx
 +
| style="font-size:12px;" | R package for Excel file manipulation.
 +
|-
 +
| style="font-style:italic; font-size:12px;" | dplyr
 +
| style="font-size:12px;" | dataframe general manipulation
 
|-
 
|-
! Package Name !! Descriptions
+
| style="font-style:italic; font-size:12px;" | ggplot2
 +
| style="font-size:12px;" | general charting
 
|-
 
|-
| ''xlsx''  || R package for Excel file manipulation.
+
| style="font-style:italic; font-size:12px;" | leaflet
 +
| style="font-size:12px;" | geo-spatial chart
 
|-
 
|-
 +
| style="font-style:italic; font-size:12px;" | googleVis
 +
| style="font-size:12px;" | calendar chart
 
|}
 
|}
  
 
== References ==
 
 +
# [https://airtube.info/stats.php?country=BG&city=Sofia AirTube Official Website]
 +
# [https://www.airnow.gov/index.cfm?action=airnow.calculator AQI calculator]
 +
# [https://www.airnow.gov/index.cfm?action=aqibasics.aqi Air Now - AQI definition]
 +
# [https://rstudio.github.io/leaflet/ Leaflet for R Github]
 +
# [https://cran.r-project.org/web/packages/googleVis/vignettes/googleVis_examples.html googleVis for R examples]
 +
# [http://bnr.bg/en/post/101017639/european-audit-office-sofia-has-no-plan-for-solving-air-pollution-from-heating European Court of Auditors: Sofia has no plan for solving air pollution from domestic heating]

Latest revision as of 15:22, 17 November 2018


Xuelin banner.jpg

Task1 Exploration of Official Data

Situation of Air Quality

The annual average PM10 concentration was around 45 from 2013 to 2018.
Air quality in Druzhba has improved over the last two years, whereas Nadezhda showed an increase in 2017.
PM10 concentrations in the remaining areas are gradually declining.
Within a typical day, PM10 averages around 30 and drops to around 20 between 10am and 5pm, when most people are out at work.

Official-1.jpg

The PM10 trend in Sofia is highly periodic, and the peaks always fall in winter (Jan/Dec).
This may be due to domestic heating in winter.

Task1-1.png Task1-2.png

Anomalies of Official Data

  • Only Nov/Dec data are recorded for 2017; data for the remaining months are missing, so the 2017 figures may not be representative of the full year.
  • The sampling frequency `AveragingTime` is inconsistent across the data: it takes the values day, hour and var. This may introduce bias when the data are aggregated.
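
One way to limit the aggregation bias mentioned above is to aggregate each averaging period on its own instead of mixing them. The following is a minimal sketch in Python (not the R packages used for this assignment); the record keys `averaging_time` and `concentration` and the sample values are hypothetical and are not taken from the EEA files.

```python
from collections import defaultdict
from statistics import mean


def mean_by_averaging_time(records):
    """Average the concentration separately for each averaging period."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["averaging_time"]].append(rec["concentration"])
    return {period: mean(values) for period, values in groups.items()}


# Hypothetical sample rows, for illustration only (not real EEA readings).
sample = [
    {"averaging_time": "hour", "concentration": 30.0},
    {"averaging_time": "hour", "concentration": 50.0},
    {"averaging_time": "day", "concentration": 42.0},
]

print(mean_by_averaging_time(sample))
```

Keeping the hourly and daily series apart means a station with many hourly rows does not outweigh a station that reports one daily value.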

Task2 Exploration of Citizen Science Data

Sensor Data Quality

Coverage

Sensor density is highest in the urban area around the city centre.
Sensors densely cover the south of Sofia, while coverage in the north of the city is low.

Number of Sensors Around Sofia City

Operation

In total, 1265 sensors were deployed around Sofia from Sep 2017 to Sep 2018.
The average number of working sensors is 453.2, and the median is 513.

Calendar-chart.jpg
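
The counts of working sensors could be derived along the following lines. This is a sketch in Python rather than the R workflow of the assignment, and it assumes that a sensor counts as "working" on a day if it reports at least one reading that day; the page does not state the exact definition. The `(date, sensor_id)` tuples are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def daily_active_sensor_counts(readings):
    """Map each date to the number of distinct sensors that reported on it."""
    sensors_by_day = defaultdict(set)
    for day, sensor_id in readings:
        sensors_by_day[day].add(sensor_id)
    return {day: len(ids) for day, ids in sensors_by_day.items()}


# Hypothetical (date, sensor_id) pairs, one per reading.
sample = [
    ("2018-01-01", "a"), ("2018-01-01", "b"), ("2018-01-01", "a"),
    ("2018-01-02", "a"),
    ("2018-01-03", "a"), ("2018-01-03", "b"), ("2018-01-03", "c"),
]

counts = daily_active_sensor_counts(sample)
daily = list(counts.values())
print(mean(daily), median(daily))
```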

Performance

Sensor measurements are not consistently reliable; several abnormal readings are observed:

  • P1 and P2 values are capped at 2000 and 1000 respectively, which may be the maximum the sensor can measure or a measurement error; the data alone cannot tell which.
  • Pressure should range from 90000 to 100000 Pa, yet negative values are observed.
  • Temperature should range from -10 to 50 degrees Celsius; extreme values (e.g. -5573, 435) are unreasonable.
  • Humidity is a percentage and should range from 0 to 100; anomalies such as -999 and 898 are also observed.
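
{| class="wikitable"
! quantile !! P1 !! P2 !! pressure !! temperature !! humidity
|-
| 0% || 0 || 0 || -20148 || -5573 || -999
|-
| 10% || 4 || 2 || 0 || 0 || 28
|-
| 20% || 7 || 4 || 93178 || 4 || 40
|-
| 30% || 9 || 6 || 94075 || 8 || 49
|-
| 40% || 11 || 7 || 94552 || 12 || 57
|-
| 50% || 14 || 9 || 94936 || 15 || 63
|-
| 60% || 18 || 11 || 95360 || 18 || 69
|-
| 70% || 23 || 15 || 96242 || 21 || 74
|-
| 80% || 34 || 20 || 99027 || 24 || 80
|-
| 90% || 62 || 33 || 100140 || 27 || 88
|-
| 100% || 2000 || 1000 || 254165 || 435 || 898
|}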
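
The plausibility ranges listed above can be turned into a simple screening step. The sketch below is in Python (not the R packages used here) and is only an illustration: the dictionary layout and the function name are hypothetical, and pressure is assumed to be in Pa, which matches the magnitude of the quantiles.

```python
# Plausible ranges quoted in the text; pressure is assumed to be in Pa.
VALID_RANGES = {
    "pressure": (90000, 100000),
    "temperature": (-10, 50),
    "humidity": (0, 100),
}


def out_of_range_fields(reading):
    """Return the names of fields whose values fall outside VALID_RANGES."""
    bad = []
    for field, (low, high) in VALID_RANGES.items():
        value = reading.get(field)
        # Missing fields are skipped, since not every sensor reports them.
        if value is not None and not (low <= value <= high):
            bad.append(field)
    return bad


# Median and extreme values taken from the quantile table.
typical = {"pressure": 94936, "temperature": 15, "humidity": 63}
extreme = {"pressure": -20148, "temperature": 435, "humidity": 898}

print(out_of_range_fields(typical))
print(out_of_range_fields(extreme))
```

Flagging rather than silently dropping readings keeps it visible how much of the data is affected before any aggregation.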

I also noticed that not all measurement types are available for every sensor. I therefore divided the sensors into 3 types:

  • measures only particle concentrations (P1, P2) => particle-measurement only
  • measures only temperature, pressure & humidity => TPH-measurement only
  • measures all 5 indicators => All-measurement

Censor-type-chart.jpg
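
The three sensor types above can be assigned from the set of measurements each sensor reports. This is a minimal Python sketch, not the method actually used for the chart; the field names and the fallback label "other" (for combinations the page does not describe) are assumptions.

```python
PARTICLE = {"P1", "P2"}
TPH = {"temperature", "pressure", "humidity"}


def sensor_type(available):
    """Classify a sensor by the set of measurements it reports."""
    available = set(available)
    if PARTICLE | TPH <= available:
        return "All-measurement"
    if available and available <= PARTICLE:
        return "particle-measurement only"
    if available and available <= TPH:
        return "TPH-measurement only"
    # Hypothetical fallback for mixed or empty sets, not part of the original.
    return "other"


print(sensor_type(["P1", "P2"]))
print(sensor_type(["temperature", "pressure", "humidity"]))
print(sensor_type(["P1", "P2", "temperature", "pressure", "humidity"]))
```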

Air Quality Condition

According to AirTube, the P1 and P2 readings refer to PM10 and PM2.5 in µg/m³.
I calculated the PM10 cutoffs for each band based on the AQI calculator & AQI definition.

{| class="wikitable"
! PM10 !! AQI !! Label
|-
| 0 to 54 || 0 to 50 || Good
|-
| 55 to 154 || 51 to 100 || Moderate
|-
| 155 to 254 || 101 to 150 || Unhealthy for Sensitive Groups
|-
| 255 to 354 || 151 to 200 || Unhealthy
|-
| 355 to 425 || 201 to 300 || Very Unhealthy
|-
| > 425 || 301 to 500 || Hazardous
|}
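
The band table above maps directly onto a lookup function. The Python sketch below uses the upper bounds exactly as listed in the table; the function name is hypothetical, and sending fractional readings between two bands (e.g. 54.5) to the higher band is an assumption, since the table only lists whole numbers.

```python
# Upper PM10 bound of each band, as listed in the table above.
PM10_BANDS = [
    (54, "Good"),
    (154, "Moderate"),
    (254, "Unhealthy for Sensitive Groups"),
    (354, "Unhealthy"),
    (425, "Very Unhealthy"),
]


def pm10_label(pm10):
    """Return the air quality label for a PM10 reading in µg/m³."""
    if pm10 < 0:
        raise ValueError("PM10 concentration cannot be negative")
    for upper, label in PM10_BANDS:
        if pm10 <= upper:
            return label
    return "Hazardous"


# Median, 90% and maximum P1 values from the quantile table.
for value in (14, 62, 2000):
    print(value, pm10_label(value))
```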

Seasonal Effect

Air quality in summer is generally much better than in winter, which is likely attributable to domestic heating by residents in winter.
In summer, only a few locations in the southwest and south suffer from poor air quality; this may be due to measurement error, because the readings of adjacent sensors are fairly good.
In winter, however, air quality is worse across the entire city, and the southwest and south are worse still.

Censor-season-chart.jpg

Hourly Effect

Air quality in the daytime (7am - 4pm) is better than at night (7pm - 6am).
A strong hypothesis is that this is driven by emissions from daily commuting to and from work and by domestic heating in the evening.

Censor-hour-chart.jpg
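
The day versus night comparison can be reproduced by bucketing readings by hour. The following Python sketch assumes that "7am - 4pm" covers hours 7 to 16 inclusive and "7pm - 6am" covers hours 19 to 23 and 0 to 6, leaving hours 17 and 18 unassigned; the `(hour, P1)` pairs are hypothetical values, not measurements.

```python
from statistics import mean


def period_of_hour(hour):
    """Return 'day', 'night', or None for hours outside both windows."""
    if 7 <= hour <= 16:
        return "day"
    if hour >= 19 or hour <= 6:
        return "night"
    return None


def mean_by_period(hourly_readings):
    """Average the readings separately for the day and night windows."""
    groups = {"day": [], "night": []}
    for hour, value in hourly_readings:
        period = period_of_hour(hour)
        if period is not None:
            groups[period].append(value)
    return {p: mean(vals) for p, vals in groups.items() if vals}


# Hypothetical (hour, P1) pairs for illustration.
sample = [(8, 20.0), (12, 30.0), (17, 99.0), (22, 40.0), (3, 60.0)]
print(mean_by_period(sample))
```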

Task3 Factors Affect Sofia Air Pollution

I explored the correlation between PM10 concentration (P1) and the climate indicators (temperature, pressure & humidity).
There is a slight positive correlation between P1 and humidity and a slight negative correlation between P1 and temperature.
Again, this points to domestic heating, the use of which is much higher when the temperature is low.
Task3-correlation.jpg
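
The page does not say which correlation coefficient was used; the sketch below assumes the Pearson coefficient and computes it with the Python standard library only. The temperature and P1 series are hypothetical values chosen to show a negative relationship, not data from the sensors.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical series: P1 falls as temperature rises.
temperature = [-5, 0, 5, 10, 15]
p1 = [80, 60, 50, 30, 20]
print(round(pearson(temperature, p1), 3))
```

A coefficient near zero would match the "slight" correlations reported above, while the hypothetical series here is deliberately strongly negative to make the sign obvious.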

Finally, I confirmed the relationship with a trend chart. Notice that the P1 concentration spikes when the temperature is low, and also when humidity stays at a high level.
Task3-trend.jpg

Interactive Web Visualization

Link to the Tableau visualization
External-vis-link.jpg

Data Source

Four major data sets in zipped file format are provided for this assignment:

  • Official air quality measurements (5 stations in the city) (EEA Data.zip), as per EU guidelines on air quality monitoring; see the data description HERE…
  • Citizen science air quality measurements (Air Tube.zip), incl. temperature, humidity and pressure (many stations) and topography (gridded data).
  • Meteorological measurements (1 station)(METEO-data.zip): Temperature; Humidity; Wind speed; Pressure; Rainfall; Visibility
  • Topography data (TOPO-DATA)

They can be downloaded by clicking on this link.

Application Libraries & Packages

{| class="wikitable"
! Package Name !! Descriptions
|-
| ''xlsx'' || R package for Excel file manipulation.
|-
| ''dplyr'' || dataframe general manipulation
|-
| ''ggplot2'' || general charting
|-
| ''leaflet'' || geo-spatial chart
|-
| ''googleVis'' || calendar chart
|}

References

  1. AirTube Official Website
  2. AQI calculator
  3. Air Now - AQI definition
  4. Leaflet for R Github
  5. googleVis for R examples
  6. European Court of Auditors: Sofia has no plan for solving air pollution from domestic heating