Difference between revisions of "ISSS608 Assign Pu Yiran-Data Preparation"
Latest revision as of 23:57, 16 November 2018
Datasets Overview
Official Air Quality Datasets (EEA)
In the urban area of Sofia city, there are 6 air quality monitoring stations: Nadezhda (BG0040A), Hipodruma (BG0050A), Druzhba (BG0052A), Orlov Most (BG0054A), IAOS/Pavlovo (BG0073A) and Mladost (BG0079A). Each station monitors the concentration of the air pollutant PM10.
The dataset consists of PM10 concentrations measured daily or hourly over 6 years (2013-2018) at these 6 stations. Additional information about each monitoring station, such as its geo-location, is also provided.
Citizen Science Air Quality Data (AirTube)
In this dataset, the concentrations of the pollutants PM10 and PM2.5, together with humidity, pressure and temperature, were measured hourly by 538 sensors located across Bulgaria in 2017 and 2018. Each sensor's geo-location is encoded in geohash format.
In the given data, PM10 and PM2.5 are named P1 and P2 respectively.
Meteorological Measurements and Topographic Data
- Meteorological measurements were recorded daily at Sofia Airport (latitude = 42.6537, longitude = 23.3829, altitude = 595 metres) from 2012 to 2018. They include temperature, humidity, wind speed, surface pressure, precipitation volume and visibility.
- The topographic data gives the longitude, latitude and elevation of 196 geo-points in Sofia.
Data Preparation
EEA Dataset: Discover and fix inconsistent time intervals of measurement
For 2013 and 2014, the concentration of PM10 is given as a daily average. In 2015, 2016 and 2017, however, the data for certain days is given as hourly averages, and in 2018 all data is given as hourly averages.
To analyse both the past and the most recent patterns of PM10 in Sofia city, the 2013-2017 data was aggregated to daily averages and merged in R. The 2018 data was kept as hourly averages.
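The aggregation step can be illustrated with a minimal sketch. It is written in Python rather than the R used for the actual work, and the record layout (station code, timestamp string, PM10 value), the timestamp format and the sample values are all assumptions made for illustration, not the EEA file structure.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_average(records):
    """Collapse (station, timestamp, pm10) readings to one mean per station and day.

    The timestamp format "%Y-%m-%d %H:%M:%S" is an assumption for this sketch.
    """
    buckets = defaultdict(list)
    for station, timestamp, pm10 in records:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date().isoformat()
        buckets[(station, day)].append(pm10)
    # One averaged value per (station, day) key, in sorted order.
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Hypothetical readings: two hourly values on one day, one value on the next.
sample = [
    ("BG0040A", "2017-01-01 00:00:00", 40.0),
    ("BG0040A", "2017-01-01 01:00:00", 60.0),
    ("BG0040A", "2017-01-02 00:00:00", 30.0),
]
daily = daily_average(sample)
```

A day that already holds a single daily value passes through unchanged, so mixed daily and hourly records can be handled in one pass.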
EEA Dataset: Group 24 hours into 4 time periods
Because hourly data is available for 2018, we can examine how air quality changes over the course of a day. To make this analysis more realistic, the 24 hours of a day are split into 4 time periods (before dawn, morning, afternoon and evening) using the formula shown in the figure Dataprep002.png.
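A grouping of this kind might look like the sketch below. The text does not state the hour boundaries of each period, so the cut-offs used here (four equal six-hour blocks starting at midnight) are purely an assumption, and `time_period` is a hypothetical helper name.

```python
def time_period(hour):
    """Map an hour of day (0-23) to one of four periods.

    Boundaries are assumed, not taken from the original formula:
    0-5 before dawn, 6-11 morning, 12-17 afternoon, 18-23 evening.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if hour < 6:
        return "before dawn"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


# Label every hour of a day with its period.
labels = [time_period(h) for h in range(24)]
```

Changing the period definitions only requires adjusting the three thresholds.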
AirTube Dataset: Decode geohash into Long-Lat format
Since Tableau cannot read the geohash format, each geohash is decoded into its corresponding longitude and latitude in R, using the 'geohash' package.
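To show what the decoding involves, here is a minimal sketch of the standard geohash algorithm in plain Python. It is not the R 'geohash' package used for the actual work, and `decode_geohash` is a hypothetical name. Each base-32 character contributes 5 bits, which alternately halve the longitude and latitude ranges; the centre of the final cell is returned.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def decode_geohash(geohash):
    """Return the (latitude, longitude) of the centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash.lower():
        value = BASE32.index(char)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid  # keep the upper half of the range
                else:
                    lon_hi = mid  # keep the lower half of the range
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


# "ezs42" is a commonly cited example geohash, not a value from the AirTube data.
lat, lon = decode_geohash("ezs42")
```

Longer geohashes describe smaller cells, so the precision of the decoded coordinates depends on the length of the strings in the data.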