Difference between revisions of "IS428 AY2018-19T1 Zhuo Yunying"
Yyzhuo.2016 (talk | contribs) |
(20 intermediate revisions by the same user not shown) | |||
Line 2: | Line 2: | ||
<p>Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide an estimated 7 million people died prematurely because of pollution; in the European Union (EU) 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has also been classified as the leading environmental cause of cancer. | <p>Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide an estimated 7 million people died prematurely because of pollution; in the European Union (EU) 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has also been classified as the leading environmental cause of cancer. | ||
− | Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than what the EU and the World Health Organization (WHO) have set to protect health. Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria is also leading on the top polluted countries with 77 μg/m3on the daily mean concentration (EU limit value is 50 μg/m3).According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10). </p> | + | Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than what the EU and the World Health Organization (WHO) have set to protect health. Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria is also leading on the top polluted countries with 77 μg/m3on the daily mean concentration (EU limit value is 50 μg/m3). According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10). </p> |
<p>This assignment aims to study the following: </p> | <p>This assignment aims to study the following: </p> | ||
Line 10: | Line 10: | ||
== Dataset Analysis & Transformation Process == | == Dataset Analysis & Transformation Process == | ||
===Task 1: Spatio-temporal Analysis of Official Air Quality=== | ===Task 1: Spatio-temporal Analysis of Official Air Quality=== | ||
===1. Combine all time-series data (e.g. BG_5_9572_2017_timeseries) into one single excel spreadsheet=== | ===1. Combine all time-series data (e.g. BG_5_9572_2017_timeseries) into one single excel spreadsheet=== | ||
− | + | *Create a new "Station" Column (indicating station code i.e. 9421) and "Year" Column based on the year for each of the time-series data | |
− | + | *Use Excel to combine the remaining time-series files on all the common columns (e.g. Countrycode, Namespace, AirQualityNetwork, etc.) |
− | + | *Analysis of the combined time-series data from 2013 to 2018 shows a drastic difference in the level of aggregation across the years. As shown in the table below, 2016 has a combination of hourly and daily air quality readings, while 2017 has a mixture of "Hour" and "Var" readings because, on certain days, readings were not taken continuously at one-hour intervals. Since the "Day" readings in 2016 cannot be disaggregated into "Hour" readings, the analysis for 2013 to 2016 is based on the "Day" averaging time. The analysis for 2017 and 2018 is based on the "Hour" averaging time; for 2017, readings recorded on a "Var" basis on any given day are averaged. Hourly analysis is also more accurate for 2017 and 2018 because data are missing for some months (only Nov and Dec are available in 2017, and only Jan to Sep in 2018). |
− | + | [[File:Task1DataTransformation a.png|300px|center]] | |
− | + | *Due to the standardization in averaging time, the values under "Concentration", "DatetimeBegin" and "DatetimeEnd" have been adjusted accordingly while other column values remain unchanged. | |
− | + | *There are quite a number of duplicated readings in the dataset. These duplicated readings are removed during the transformation process to avoid unequal weightage. | |
− | + | *Due to the high variation in raw data and small dataset, excel is used for the transformation process. | |
− | + | ||
− | + | ===2. Merge metadata file with combined time-series data === | |
− | + | *According to the source of the scraped data (http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm), "the join between time-series files and the metadata file should be made using the Countrycode (or Namespace) and SamplingPoint". Thus, the metadata file and the time-series data are merged via the VLOOKUP function in Excel based on SamplingPoint (since Countrycode is "BG" throughout both data files). |
− | [[File: | + | *Upon further inspection, these two datasets have a number of common columns with same values. These columns include "Countrycode, Namespace, AirQualityNetwork, AirQualityStation, AirQualityStationEoICode, SamplingProcess, AirPollutantCode, AirPollutant" and thus repeated columns are removed. |
− | === | + | ===Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements=== |
− | + | *Geohash represents station locations. However, Tableau is not able to interpret geohash as geographic data. Before the Air Tube data is imported into Tableau for analysis, each geohash needs to be decoded into geographical coordinates. Due to the sheer size of the Air Tube datasets (data_bg_2017.xlsx and data_bg_2018.xlsx), the R packages ("devtools", "tidyverse" and "geohash") indicated by Prof Kam are used to transform the geohashed raw data into a new csv file containing the coordinates of the locations. |
− | + | *Next, 2017 and 2018 datasets that have been transformed will be combined based on the R programming code below. | |
− | + | [[File:Task2Transformation.png|500px|center]] | |
− | + | *The combined dataset comprises readings across the whole of Bulgaria, which is beyond the scope of our concern (i.e. Sofia). To keep the comparison consistent, an inclusion filter is used to keep only the longitudes and latitudes that fall within Sofia city. Because it is difficult to compare against the real geographical area of Sofia, an existing map of Sofia City (from Open Street Map) was used as the reference when filtering with the Tableau filter function. The following shows the map of Sofia and the filtering function. |
− | + | [[File:Sofia Map.png|500px|center]] | |
− | + | [[File:Tableau Filter.png|300px|center]] | |
− | * | ||
− | |||
− | |||
− | |||
− | |||
− | * | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | [[File: | ||
− | |||
− | |||
− | |||
− | + | ===Task 3: Find out the relationship of the air quality analysis with other factors=== | |
+ | *The scraped meteorological data is in crosstab format, which is not suitable for analysis in Tableau. Hence, the dataset is pivoted into columnar format. | ||
+ | *The data includes average, minimum and maximum readings for different types of measurements, presented in separate columns. This format is not suitable for Tableau if filters need to be introduced in the dashboard. Hence, the dataset is transformed as shown below. | ||
+ | [[File:Task3Transformation.png|600px|center]] | ||
− | == | + | == Analysis == |
=== Task 1: Spatio-temporal Analysis of Official Air Quality === | === Task 1: Spatio-temporal Analysis of Official Air Quality === | ||
{| class="wikitable" style="background-color:#FFFFFF;" width="100%" | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
− | |'''1.1 | + | |'''<big>1.1 Characterize Recent and Past Situation of Air Quality in Sofia City</big> |
''' | ''' | ||
− | |||
− | <p>The above | + | [[File:Task 1 Characterize PastandRecentSituation2013FinalFinal.png|800px|center]] |
− | <p> <b>Trends:</b> | + | [[File:Task 1 Characterize PastandRecentSituation2014FinalFinal.png|800px|center]] |
− | * | + | [[File:Task 1 Characterize PastandRecentSituation2015FinalFinal.png|800px|center]] |
− | * | + | [[File:Task 1 Characterize PastandRecentSituation2016FinalFinal.png|800px|center]] |
+ | |||
+ | <p>The figures above show the PAST average daily concentration heat maps for 2013 to 2016, as daily readings are not computed for 2017 and 2018. The calendar heat map lets users see the trend in average daily concentration from Jan to Dec at a glance. The calendar map also includes filters for sampling location, air quality range and year. With reference to the EU air quality classification found through research, the air quality ranges are 0-20: Good, 20-40: Moderate, 40-50: Normal, 50-75: Unhealthy, and 75 and above: Very Unhealthy. The heat map’s colour intensity is scaled so that dark red represents the unhealthy and very unhealthy ranges, and vice versa.</p> | ||
+ | <p> <b>Trends Analysis:</b> | ||
+ | * From 2013 to 2016, the PM10 concentration exceeds the EU limit (50 µg/m3) at the beginning and at the end of each year. Specifically, the air quality reached the unhealthy and very unhealthy ranges on the majority of days in January, February, November and December across all four years. In the middle of the year (i.e. Quarter 2 and Quarter 3), the air quality is much better and readings are generally below the EU limit value of 50 µg/m3. | ||
+ | * There isn’t any clear trend in the change in daily concentration of PM10 across the days in a month throughout 2013 to 2016. | ||
|} | |} | ||
{| class="wikitable" style="background-color:#FFFFFF;" width="100%" | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
|'''1.2 A Typical Day in Sofia City | |'''1.2 A Typical Day in Sofia City | ||
''' | ''' | ||
− | [[File: | + | [[File:Task 1 Characterize PastandRecentSituation20172018Final.png|800px|center]] |
− | <p> | + | <p>This calendar heat map shows the hourly concentration of PM10 throughout a typical day across different months from Nov 2017 to Sep 2018. The rightmost column “Average” shows the overall average of the hourly concentration readings. Filters on sampling location and air quality range can be used to narrow down the scope of comparison. </p> |
+ | <p> <b>Trends Analysis:</b> | ||
+ | * For the most RECENT average hourly concentration of PM 10 calendar map from Nov 2017 to Sep 2018, a typical day in Sofia city does not experience drastic change in air quality on average over the months. However, the average hourly concentration across these 11 months are generally higher in the range of 30+ µg/m3 before 9am and after 6pm which are after working hours. The average hourly concentration is lower in the range of 20+ µg/m3 from 9am to 6pm. The air quality is considered good throughout the day on average. | ||
+ | *However, the air quality readings are exceptionally high throughout the day (24 hours) in both November 2017 and January 2018. The air quality has reached unhealthy and very unhealthy range. This signifies that there might be external factors that influence the air quality to be extremely poor in those months. | ||
+ | |||
+ | |} | ||
+ | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
+ | |'''1.3 Further Analysis on Sampling Points | ||
+ | ''' | ||
+ | [[File:Task1 3.png|800px|center]] | ||
− | <p> <b>Trends:</b> | + | <p>This dashboard shows the distribution of sampling points in Sofia city as well as the concentration readings, distance to kerb, altitude and distance to building for these sampling points.</p> |
− | * | + | <p> <b>Trends Analysis:</b> |
− | * | + | *All the sampling points follow the same cyclical pattern for the time-series analysis of average daily concentration of PM10 pollutants. The cyclical pattern repeated across the years from 2013 to 2018. |
+ | *Across the years, the mean concentration of PM10 pollutants is highest for Orlov Most, followed by Nadezhda. Orlov Most is near residential or industrial buildings, while Nadezhda is not as near to buildings as Orlov Most. | ||
+ | *The variation in concentration is highest for Hipodruma (as seen by the high range and outliers). There might be association with other factors since the distance to kerb, altitude and distance to building is similar to most of other sampling points. | ||
− | |||
|} | |} | ||
+ | |||
+ | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
+ | |'''1.4 What anomalies do you find in the official air quality dataset? | ||
+ | ''' | ||
+ | *Inconsistent measurement of air quality data: The inconsistency lies in the averaging time used, as hourly, var and daily concentration values are all present. | ||
+ | *Incomplete data for robust time-series analysis: There are a few months (Jan 2017 to Oct 2017) that do not have any air quality data. There is a need to find out why. | ||
+ | *Inconsistent coverage for sampling points: Mladost (60881) has only one year of air quality readings available, while Orlov Most (9484) only has readings from 2013 to 2015. | ||
+ | |} | ||
+ | |||
{| class="wikitable" style="background-color:#FFFFFF;" width="100%" | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
− | |'''1. | + | |'''1.5 How do these affect your analysis of potential problems to the environment? |
''' | ''' | ||
− | * | + | *Need to consider the appropriate measurement time method when comparing the air quality concentration for time-series analysis |
− | + | *The data points for sampling points might be too small (only 4 points with consistent readings) and thus affect the accuracy of analysis. | |
+ | *The poor air quality might be related to (1) the surrounding regions and facilities around the sampling points, (2) cyclical patterns across the months in different years and (3) Consumption behaviours of residents living in Sofia city | ||
|} | |} | ||
+ | |||
=== Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements === | === Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements === | ||
Line 118: | Line 102: | ||
|'''2.1 Sensor Coverage, Performance and Operation | |'''2.1 Sensor Coverage, Performance and Operation | ||
''' | ''' | ||
− | * 2.1.1 | + | * 2.1.1 Are they well distributed over the entire city? |
− | [[File: | + | [[File:SensorCoverageFinal.png|800px|center]] |
− | The | + | The heat map density illustrates the distribution of sensors across Sofia city. The higher the intensity, the more number of sensors found in the specific area. According to this heat map, it shows that the sensors mostly concentrate in the central area of Sofia city while the number of sensors decreases drastically beyond the central area. This could be due to the fact that the city area has a higher percentage of residential areas and transportation network. Thus, citizens place more sensors in the central area of Sofia city. |
− | * 2.1.2 | + | |
− | [[File: | + | * 2.1.2 Are they all working properly at all times? |
− | + | [[File:Sensor's Operation Analysis.png|800px|center]] | |
− | + | Looking at the number of records captured by the sensors from Sept 17 to Sept 18, the number of records increases steadily from September 17, but from May 18 onwards it drops drastically from 200 to below 80 records. Looking at the number of records by month, the number of recordings is highest in Jan, Feb, Mar, Oct, Nov and Dec, while for the other months it falls rapidly from around 60-159k to 17-38k. Considering that the lifetime of a sensor is more than one year, the inconsistent number of records shows that the sensors are not working properly throughout the months. |
− | + | ||
− | [[File: | + | * 2.1.3 Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? |
− | + | [[File:Time-series Analysis Comparison.png|800px|center]] | |
− | + | Plotting the time series of the average pollutant concentration and standard deviation for both the Official Air Quality and the Citizen Science Air Quality measurements shows that the reading range of P1 is much higher than that of PM10 in the official air quality records (by 100 µg/m3). The standard deviation of P1 is also much higher than that of PM10 in the official records. The sensor readings might therefore not be fully accurate, as sensor quality may differ significantly depending on the citizens' purchasing power or needs. This inconsistency in the sensors used results in higher variation in the readings, which may cause the readings to be inaccurate and inconsistent over time. |
+ | |||
|} | |} | ||
{| class="wikitable" style="background-color:#FFFFFF;" width="100%" | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
|'''2.2 Air Pollution | |'''2.2 Air Pollution | ||
''' | ''' | ||
− | + | Which part of the city shows relatively higher readings than others? Are these differences time dependent? | |
− | + | [[File:Oct Reading.png|400px|center]] | |
− | [[File: | + | [[File:Nov Reading.png|400px|center]] |
− | [[File: | + | [[File:Jan Reading.png|400px|center]] |
− | [[File: | + | [[File:March 18.png|400px|center]] |
− | + | [[File:June Reading.png|400px|center]] | |
− | [[File: | ||
− | [[File: | ||
− | |||
+ | The central region has higher readings than other regions. However, the density plots of P1 and P2 concentration over time show that this difference is time dependent. At the beginning, from September 17 to October 17, the difference is relatively small. From November 17 onwards, the difference increases sharply, as seen from the increase in density in the central region versus the rest. The difference is sharpest in January 18. Subsequently, the difference decreases, and in June 18 there is negligible difference between the central and non-central regions. | ||
|} | |} | ||
Line 149: | Line 132: | ||
{| class="wikitable" style="background-color:#FFFFFF;" width="100%" | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
|'''Factor 1: Local Energy Sources ''' | |'''Factor 1: Local Energy Sources ''' | ||
− | [[File: | + | [[File:Task3 ThermalPlant and Meteorology Station.png|800px|center]] |
− | + | According to the EU’s air quality report, the production of electricity by burning coal in thermal power plants and other industrial processes is a major contributor to unhealthy air. Research shows that two power plants (namely Sofia Power Plant and Sofia Iztok Power Plant) are located near the city centre in Sofia. This suggests that the high degree of pollution in the central area might be due to the presence of these power plants. Especially in areas with high population and during winter periods, the increased need to burn coal will have an impact on Sofia’s air quality as a whole. Therefore, local energy sources play an important role in influencing the air quality. |
|} | |} | ||
{| class="wikitable" style="background-color:#FFFFFF;" width="100%" | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
|'''Factor 2: Local Meteorology ''' | |'''Factor 2: Local Meteorology ''' | ||
− | + | The Meteorology data is collected based on the coordinates at the meteorology station near Sofia’s airport. As the location is centralized in Sofia’s map, we will consider the readings as the average readings for Sofia’s city as a whole. | |
− | [[File: | + | As compared to Citizen Science Air Quality readings, official air quality readings are more consistent with similar locations of sampling points and consistent readings at specific time duration. PM10 pollutant concentration from Official Air Quality will be analysed together with local meteorology data instead. |
− | + | [[File:WindSpeedFinal.png|800px|center]] | |
+ | As seen in the graph above, in January, October and November, the average concentration in each month is high whereas the average wind speed is lower. Conversely, for the other months (February to December), the average concentration in each month is low whereas the average wind speed is high. Wind speed is one factor that can influence air quality, as it determines how fast pollutants disperse to other cities or areas. A low wind speed traps the pollutants within Sofia city, while a high wind speed allows the pollutants in Sofia city to spread quickly, thus lowering the pollutant concentration. | ||
+ | |||
+ | [[File:TempFinal.png|800px|center]] | ||
+ | As seen in the graph above, from Apr to Oct, the temperature is higher than average while for average concentration of the pollutants, the readings are lower than average. This shows that temperature and pollutant concentration have an inverse relationship clearly. This could be due to the fact that energy consumption is higher during winter period while that for summer period is much lower. During summer period, Sofia’s temperature has a highest value in the 20 degree’s range, indicating that air-conditioning is unlikely to be heavily utilized. However, during winter, the low temperature would signify a need for high energy consumption. | ||
+ | |||
|} | |} | ||
{| class="wikitable" style="background-color:#FFFFFF;" width="100%" | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
|'''Factor 3: Local Topography ''' | |'''Factor 3: Local Topography ''' | ||
− | [[File: | + | [[File:TopographicalDataFinal.png|800px|center]] |
− | + | The topography plot shows the altitude distribution in Sofia city. The high altitude areas are at the bottom-left corner of the map, while the low altitude areas cover the rest of the map. Altitude can play a role in the transport of pollutants within Sofia City. During the summer period, wind blows from the low to the high altitude area, while during the winter period it blows from the high to the low altitude area. This explains why the pollution is more serious in Quarter 1 and Quarter 4. As the main pollutant-producing area is concentrated in the city centre, the pollutants are able to spread to the high altitude area during summer. During winter, however, the pollutants are trapped within the low altitude area in the city centre. The wind flow from the high altitude area might also carry other pollutants from outside into Sofia city. |
+ | |||
|} | |} | ||
{| class="wikitable" style="background-color:#FFFFFF;" width="100%" | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
|'''Factor 4: Complex interactions between local topography and meteorological characteristics ''' | |'''Factor 4: Complex interactions between local topography and meteorological characteristics ''' | ||
− | [[File: | + | [[File:Correlation MatrixFinal.png|800px|center]] |
− | The | + | A correlation matrix plot can show the relationships among the meteorological measures. Within the meteorological characteristics, factors such as Dew Point Temperature and Temperature are positively correlated. Another example is Wind Speed and Temperature. The multicollinearity that exists within the meteorological characteristics makes it difficult to pinpoint the exact variable that affects air quality. In addition, the altitude measure from the local topography data will also play a part in influencing the meteorological characteristics. However, the meteorological dataset is insufficient to study the interactions with local topography in detail. |
+ | |||
|} | |} | ||
{| class="wikitable" style="background-color:#FFFFFF;" width="100%" | {| class="wikitable" style="background-color:#FFFFFF;" width="100%" | ||
|'''Factor 5: Transboundary Pollution ''' | |'''Factor 5: Transboundary Pollution ''' | ||
− | + | [[File:Map-on-the-Bulgarian-coal-resources-energy-infrastructure-and-largest-power-plants.png|800px|center]] | |
− | [[File: | + | Transboundary pollution might be one cause of the poor air quality, considering the other thermal power plants operating in other cities in Bulgaria. Additional datasets should be studied to analyze the wind flow directions within Bulgaria and between Bulgaria and its immediate neighbours. |
|} | |} | ||
+ | |||
+ | == Interactive Visualization == | ||
+ | |||
+ | <p>The interactive visualization can be accessed here: https://public.tableau.com/profile/yunyingkaelyn#!/vizhome/Spatio-temporalAnalysisofAirQualityinSofiaCity/FinalStory </p> | ||
+ | |||
== References == | == References == | ||
− | <p>Understanding | + | <p>Understanding the current issues of poor air quality in Bulgaria https://www.eea.europa.eu/publications/air-quality-in-europe-2018/at_download/file </p> |
− | <p>Map on the Bulgarian Coal Resources and Energy Infrastructure https://www.researchgate.net/figure/Map-on-the-Bulgarian-coal-resources-energy-infrastructure-and-largest-power-plants_fig1_257941554</p> | + | <p>Map on the Bulgarian Coal Resources and Energy Infrastructure https://www.researchgate.net/figure/Map-on-the-Bulgarian-coal-resources-energy-infrastructure-and-largest-power-plants_fig1_257941554 </p> |
− | + | <p>Coordinates of Sofia's Thermal Plants http://www.wikiwand.com/en/List_of_power_stations_in_Bulgaria#/Thermal </p> |
− | |||
== Comments == | == Comments == |
Latest revision as of 06:24, 13 November 2018
Contents
- 1 Problem & Motivation
- 2 Dataset Analysis & Transformation Process
- 2.1 Task 1: Spatio-temporal Analysis of Official Air Quality
- 2.2 1. Combine all time-series data (e.g. BG_5_9572_2017_timeseries) into one single excel spreadsheet
- 2.3 2. Merge metadata file with combined time-series data
- 2.4 Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements
- 2.5 Task 3: Find out the relationship of the air quality analysis with other factors
- 3 Analysis
- 4 Interactive Visualization
- 5 References
- 6 Comments
Problem & Motivation
Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide, an estimated 7 million people died prematurely because of pollution; in the European Union (EU), 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has also been classified as the leading environmental cause of cancer. Air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than the limits the EU and the World Health Organization (WHO) have set to protect health. Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria also leads the most polluted countries, with a daily mean concentration of 77 μg/m3 (the EU limit value is 50 μg/m3). According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10).
This assignment aims to study the following:
- Task 1: Spatio-temporal Analysis of Official Air Quality
- Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements
- Task 3: Find out the relationship of the above analysis with other factors (Local energy sources, Local meteorology, Local topography, Complex interactions between local topography and meteorological characteristics and Transboundary pollution)
Dataset Analysis & Transformation Process
Task 1: Spatio-temporal Analysis of Official Air Quality
1. Combine all time-series data (e.g. BG_5_9572_2017_timeseries) into one single excel spreadsheet
- Create a new "Station" column (indicating the station code, e.g. 9421) and a "Year" column based on the year of each time-series file.
- Use Excel to combine the remaining time-series files on all the common columns (e.g. Countrycode, Namespace, AirQualityNetwork, etc.).
- Analysis of the combined time-series data from 2013 to 2018 shows a drastic difference in the level of aggregation across the years. As shown in the table below, 2016 has a combination of hourly and daily air quality readings, while 2017 has a mixture of "Hour" and "Var" readings because, on certain days, readings were not taken continuously at one-hour intervals. Since the "Day" readings in 2016 cannot be disaggregated into "Hour" readings, the analysis for 2013 to 2016 is based on the "Day" averaging time. The analysis for 2017 and 2018 is based on the "Hour" averaging time; for 2017, readings recorded on a "Var" basis on any given day are averaged. Hourly analysis is also more accurate for 2017 and 2018 because data are missing for some months (only Nov and Dec are available in 2017, and only Jan to Sep in 2018).
- Because the averaging time was standardized, the values under "Concentration", "DatetimeBegin" and "DatetimeEnd" have been adjusted accordingly, while other column values remain unchanged.
- The dataset contains quite a number of duplicated readings. These are removed during the transformation process to avoid unequal weighting.
- Because the raw data vary widely and the dataset is small, Excel is used for the transformation process.
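The steps above were carried out in Excel. As a rough illustration only, the same idea (tagging each file with Station and Year, removing duplicated readings, and averaging hourly readings per day) can be sketched in Python. The filename layout `BG_<pollutant>_<station>_<year>_timeseries`, the timestamp format and the sample values are assumptions made for this sketch, and the real files contain many more columns.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def tag_rows(filename, csv_text):
    """Read one time-series file and add Station and Year taken from its name."""
    # Assumed name layout: BG_<pollutant>_<station>_<year>_timeseries
    parts = filename.split("_")
    station, year = parts[2], parts[3]
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["Station"] = station
        row["Year"] = year
        rows.append(row)
    return rows


def drop_duplicates(rows):
    """Keep only the first occurrence of each fully identical reading."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def daily_mean(rows):
    """Average Concentration per (station, calendar day)."""
    buckets = defaultdict(list)
    for row in rows:
        day = row["DatetimeBegin"][:10]  # assumes a "YYYY-MM-DD ..." timestamp
        buckets[(row["Station"], day)].append(float(row["Concentration"]))
    return {key: mean(values) for key, values in buckets.items()}


# Tiny made-up sample with one duplicated hourly reading.
sample = (
    "Concentration,DatetimeBegin\n"
    "30.0,2017-11-01 00:00:00\n"
    "30.0,2017-11-01 00:00:00\n"
    "50.0,2017-11-01 01:00:00\n"
)
combined = drop_duplicates(tag_rows("BG_5_9572_2017_timeseries", sample))
daily = daily_mean(combined)
print(daily)
```

Removing duplicates before averaging matters here: a repeated reading would otherwise count twice in the daily mean, which is the unequal weighting the transformation is meant to avoid.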
2. Merge metadata file with combined time-series data
- According to the source of the scraped data (http://discomap.eea.europa.eu/map/fme/AirQualityExport.htm), "the join between time-series files and the metadata file should be made using the Countrycode (or Namespace) and SamplingPoint". Thus, the metadata file and the time-series data are merged via the VLOOKUP function in Excel based on SamplingPoint (since Countrycode is "BG" throughout both data files).
- Upon further inspection, these two datasets have a number of common columns with the same values. These columns include "Countrycode, Namespace, AirQualityNetwork, AirQualityStation, AirQualityStationEoICode, SamplingProcess, AirPollutantCode, AirPollutant", and thus the repeated columns are removed.
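The merge itself was done with Excel's lookup function. The sketch below shows what an equivalent key-based join could look like in Python: each reading looks up its metadata row by SamplingPoint, and columns already present in the reading are not copied again. The sampling point identifiers, the "Altitude" column and all values are made up for illustration.

```python
def merge_on_sampling_point(readings, metadata):
    """Left-join metadata onto readings by SamplingPoint, similar to a lookup."""
    lookup = {m["SamplingPoint"]: m for m in metadata}
    merged = []
    for reading in readings:
        row = dict(reading)
        meta = lookup.get(reading["SamplingPoint"], {})
        for column, value in meta.items():
            # Skip columns the reading already has, so nothing is repeated.
            if column not in row:
                row[column] = value
        merged.append(row)
    return merged


# Hypothetical rows; the real files have many more columns.
readings = [
    {"SamplingPoint": "SP_A", "Countrycode": "BG", "Concentration": "41.5"},
    {"SamplingPoint": "SP_X", "Countrycode": "BG", "Concentration": "12.0"},
]
metadata = [
    {"SamplingPoint": "SP_A", "Countrycode": "BG", "Altitude": "500"},
]
merged = merge_on_sampling_point(readings, metadata)
print(merged)
```

A reading whose SamplingPoint has no metadata row is kept without the extra columns, which makes missing matches easy to spot afterwards.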
Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements
- Geohash represents station locations. However, Tableau is not able to interpret geohash as geographic data. Before the Air Tube data is imported to Tableau for analysis, the geohash needs to be decoded into geographical coordinates. Due to the sheer size of the Air Tube datasets (data_bg_2017.xlsx and data_bg_2018.xlsx), the R packages ("devtools", "tidyverse" and "geohash") indicated by Prof Kam are used to transform the geohashed raw data into a new csv file containing the coordinates of the locations.
- Next, the transformed 2017 and 2018 datasets are combined using the R programming code below.
- The combined dataset comprises readings across the whole of Bulgaria, which is beyond the scope of our concern (i.e. Sofia). To keep the comparison consistent, an inclusion filter is used to keep only the longitudes and latitudes that fall within Sofia city. Because it is difficult to compare against the real geographical area of Sofia, an existing map of Sofia City (from Open Street Map) was used as the reference for the Tableau filter. The following shows the map of Sofia and the filtering function.
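The decoding above was done with R packages and the Sofia filter with Tableau. To make the two steps concrete, here is a minimal Python sketch of standard geohash decoding followed by a simple bounding-box filter. The bounding-box values are rough assumptions for illustration and are not the boundary used in the analysis.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def decode_geohash(geohash):
    """Return the (latitude, longitude) at the centre of a geohash cell."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    is_longitude = True  # bits alternate, starting with longitude
    for char in geohash.lower():
        value = BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon_range if is_longitude else lat_range
            middle = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = middle  # keep the upper half
            else:
                rng[1] = middle  # keep the lower half
            is_longitude = not is_longitude
    return (sum(lat_range) / 2, sum(lon_range) / 2)


# Assumed, approximate bounding box around Sofia city (illustrative only).
SOFIA_BOX = {"lat_min": 42.6, "lat_max": 42.8, "lon_min": 23.2, "lon_max": 23.5}


def in_sofia(latitude, longitude, box=SOFIA_BOX):
    """Inclusion filter: True if the point lies inside the bounding box."""
    return (box["lat_min"] <= latitude <= box["lat_max"]
            and box["lon_min"] <= longitude <= box["lon_max"])
```

A rectangle is cruder than filtering against a map outline, so points near the city edge may be classified differently from the Tableau result.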
Task 3: Find out the relationship of the air quality analysis with other factors
- The scraped meteorological data is in crosstab format, which is not suitable for analysis in Tableau. Hence, the dataset is pivoted into columnar format.
- The data includes average, minimum and maximum readings for different types of measurements, each presented in a separate column. This format is not suitable for Tableau if filters need to be introduced in the dashboard. Hence, the dataset is transformed as shown below.
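As an illustration of the pivoting step, the sketch below turns one wide row per day into one long row per measurement and statistic. The column names (`Temperature_Avg`, `WindSpeed_Avg` and so on) and the values are hypothetical; the real file uses its own headers.

```python
# Hypothetical wide (crosstab) rows: one column per measurement/statistic pair.
wide_rows = [
    {"Date": "2017-11-01", "Temperature_Avg": 8.0, "Temperature_Max": 12.0,
     "Temperature_Min": 4.0, "WindSpeed_Avg": 10.0},
    {"Date": "2017-11-02", "Temperature_Avg": 6.0, "Temperature_Max": 9.0,
     "Temperature_Min": 3.0, "WindSpeed_Avg": 14.0},
]


def unpivot(rows, id_columns):
    """Turn each measurement column into its own (Measure, Statistic, Value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            measure, statistic = column.rsplit("_", 1)
            long_row = {key: row[key] for key in id_columns}
            long_row.update({"Measure": measure, "Statistic": statistic, "Value": value})
            long_rows.append(long_row)
    return long_rows


long_rows = unpivot(wide_rows, id_columns=["Date"])
```

With the data in this shape, a single filter on `Measure` or `Statistic` can drive a dashboard, which is not possible when each combination sits in its own column.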
Analysis
Task 1: Spatio-temporal Analysis of Official Air Quality
1.1 Characterize Recent and Past Situation of Air Quality in Sofia City
The figures above show the past average daily concentration as calendar heat maps for 2013 to 2016; daily readings are not computed for 2017 and 2018. The calendar heat map lets users see the trend in average daily concentration from Jan to Dec at a glance. It also includes filters for sampling location, air quality range and year. With reference to the EU air quality classification, the air quality ranges are 0-20: Good, 20-40: Moderate, 40-50: Normal, 50-75: Unhealthy, and 75 and above: Very Unhealthy. The heat map's colour intensity is scaled so that dark red represents the unhealthy and very unhealthy ranges, while lighter colours represent better air quality.
Trends Analysis:
1.2 A Typical Day in Sofia City
This calendar heat map shows the hourly concentration of PM10 throughout a typical day across different months from Nov 2017 to Sep 2018. The rightmost column “Average” shows the overall average of the hourly concentration readings. Filters on sampling location and air quality range can be used to narrow down the scope of comparison.
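The air quality ranges behind this filter (0-20 Good, 20-40 Moderate, 40-50 Normal, 50-75 Unhealthy, 75 and above Very Unhealthy, as listed in section 1.1) can be expressed as a small lookup. The sketch below assumes that a value exactly on a boundary belongs to the higher range, which the text does not specify.

```python
from bisect import bisect_right

# Upper bounds of the first four ranges, in µg/m3.
BOUNDS = [20, 40, 50, 75]
LABELS = ["Good", "Moderate", "Normal", "Unhealthy", "Very Unhealthy"]


def classify(concentration):
    """Map a concentration reading to its air quality range label."""
    if concentration < 0:
        raise ValueError("concentration cannot be negative")
    # Assumption: a reading on a boundary (e.g. exactly 20) falls in the higher range.
    return LABELS[bisect_right(BOUNDS, concentration)]
```

For example, `classify(77)` falls in the "Very Unhealthy" range and `classify(35.5)` in "Moderate".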
Trends Analysis:
1.3 Further Analysis on Sampling Points
This dashboard shows the distribution of sampling points in Sofia city, together with the concentration readings, distance to kerb, altitude and distance to building for these sampling points.
Trends Analysis:
1.4 What anomalies do you find in the official air quality dataset?
1.5 How do these affect your analysis of potential problems to the environment?
Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements
2.1 Sensor Coverage, Performance and Operation
The density heat map illustrates the distribution of sensors across Sofia city. The higher the intensity, the more sensors there are in that area. The heat map shows that the sensors are mostly concentrated in the central area of Sofia city, while the number of sensors decreases drastically beyond it. This could be because the city centre has a higher share of residential areas and a denser transportation network, so citizens place more sensors there.
Looking at the number of records captured by the sensors from Sept 17 to Sept 18, the number of records increases steadily from September 17, but from May 18 onwards it drops drastically from 200 to below 80 records. Across months, the number of recordings is highest in Jan, Feb, Mar, Oct, Nov and Dec, while in the other months it falls rapidly from around 60-159k to 17-38k. Considering that a sensor's lifetime is more than one year, the inconsistent number of records suggests that the sensors were not working properly throughout the period.
Plotting the time series of average pollutant concentration and standard deviation for both the Official Air Quality and Citizen Science Air Quality measurements shows that the range of P1 readings is much higher than that of PM10 in the official records (by 100 µg/m3). The standard deviation of P1 is also much higher than that of the official PM10. Therefore, the sensor readings might not be fully accurate, as sensor quality may differ significantly depending on the citizens' purchasing power or needs. The inconsistency in the sensors used results in higher variation, which may cause the readings to be inaccurate and inconsistent over time.
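The comparison of average and standard deviation between the two sources can be reproduced with a few lines of Python. The readings below are made-up numbers chosen only to show the calculation; they are not taken from either dataset.

```python
from statistics import mean, stdev

# Hypothetical readings in µg/m3 (not real data).
official_pm10 = [30.0, 35.0, 40.0, 45.0, 50.0]
citizen_p1 = [10.0, 60.0, 25.0, 140.0, 90.0]


def summarize(values):
    """Return the mean and sample standard deviation of a list of readings."""
    return {"mean": mean(values), "std": stdev(values)}


official_summary = summarize(official_pm10)
citizen_summary = summarize(citizen_p1)

# A larger standard deviation means the readings are more spread out.
citizen_is_more_variable = citizen_summary["std"] > official_summary["std"]
```

A higher standard deviation on the citizen side is consistent with noisier sensors, but it could also reflect wider spatial coverage, so it is a hint rather than proof of inaccuracy.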
2.2 Air Pollution
Which part of the city shows relatively higher readings than others? Are these differences time dependent? The central region has higher readings than other regions. However, the density plot of P1 and P2 concentrations over time shows that the difference is time dependent. From September 17 to October 17, the difference is relatively small. From November 17 onwards, it increases sharply as the density in the central region rises relative to the rest, and it is sharpest in January 18. Subsequently, the difference decreases, and by June 18 there is a negligible difference between the central and non-central regions.
Task 3: Factors Affecting Air Quality in Sofia City
Factor 1: Local Energy Sources
According to the EU’s air quality report, electricity production by burning coal in thermal power plants, together with other industrial processes, is a major contributor to unhealthy air. Research shows that two power plants (namely Sofia Power Plant and Sofia Iztok Power Plant) are located near the city centre of Sofia. This suggests that the high degree of pollution in the central area might be due to the presence of these power plants. In densely populated areas and during winter in particular, the increased need to burn coal will affect Sofia’s air quality as a whole. Therefore, local energy sources play an important role in influencing air quality.
Factor 2: Local Meteorology
The meteorological data is collected at the meteorology station near Sofia’s airport. As the location is central on Sofia’s map, the readings are treated as averages for Sofia city as a whole. Compared with the Citizen Science Air Quality readings, the official air quality readings are more consistent, with stable sampling point locations and readings at fixed time intervals. The PM10 concentration from the Official Air Quality data is therefore analysed together with the local meteorology data.
As seen in the graph above, in January, October and November the average monthly concentration is high whereas the average wind speed is lower. Conversely, in the other months from February to December, the average monthly concentration is low whereas the average wind speed is high. Wind speed can influence air quality because it determines how fast pollutants disperse to other cities or areas. A low wind speed traps pollutants within Sofia city, while a high wind speed lets them spread quickly, lowering the pollutant concentration.
As seen in the graph above, from Apr to Oct the temperature is higher than average while the average pollutant concentration is lower than average. This indicates an inverse relationship between temperature and pollutant concentration. This could be because energy consumption is higher during winter and much lower during summer. During summer, Sofia’s temperature peaks in the 20-degree range, so air-conditioning is unlikely to be heavily used. During winter, however, the low temperature signals a need for high energy consumption.
Factor 3: Local Topography
The topography plot shows the altitude distribution in Sofia city. The high-altitude areas are in the bottom-left corner of the map, while low-altitude areas cover the rest of the map. Altitude can play a role in the transport of pollutants within Sofia City. During summer, wind blows from low-altitude to high-altitude areas, while during winter it blows from high-altitude to low-altitude areas. This may explain why pollution is more serious in quarter 1 and quarter 4. As the main pollutant-producing area is concentrated in the city centre, the pollutants are able to spread to high-altitude areas during summer. During winter, however, the pollutants are trapped within the low-altitude area in the city centre. The wind flowing from high-altitude areas might also carry pollutants from outside into Sofia city.
Factor 4: Complex interactions between local topography and meteorological characteristics
A correlation matrix plot can show the relationships among the meteorological measures. Among the meteorological characteristics, factors such as Dew Point Temperature and Temperature are positively correlated. Another example is Wind Speed and Temperature. The multicollinearity among the meteorological characteristics makes it difficult to pinpoint exactly which variable affects air quality. In addition, the altitude measure from the local topography data also plays a part in influencing the meteorological characteristics. However, the meteorological dataset is insufficient to study the interactions with local topography in detail.
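For reference, a correlation matrix of this kind is built from pairwise Pearson correlation coefficients. The sketch below computes one in plain Python; the measure names follow the text, while the values are invented for illustration and do not come from the meteorological dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of named series."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical monthly values (not real data).
measures = {
    "Temperature": [0.0, 5.0, 10.0, 15.0, 20.0],
    "DewPointTemperature": [-2.0, 2.0, 6.0, 10.0, 14.0],
    "WindSpeed": [12.0, 15.0, 11.0, 16.0, 14.0],
}
matrix = correlation_matrix(measures)
```

Coefficients close to 1 or -1 between two meteorological measures are the sign of multicollinearity discussed above: the two measures carry largely the same information, so their separate effects on air quality cannot be told apart.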
Factor 5: Transboundary Pollution
Transboundary pollution might be one cause of the poor air quality, considering the other thermal power plants operating in other cities in Bulgaria. Additional datasets should be studied to analyze the wind flow directions within Bulgaria and between Bulgaria and its immediate neighbours.
Interactive Visualization
The interactive visualization can be accessed here: https://public.tableau.com/profile/yunyingkaelyn#!/vizhome/Spatio-temporalAnalysisofAirQualityinSofiaCity/FinalStory
References
Understanding the current issues of poor air quality in Bulgaria https://www.eea.europa.eu/publications/air-quality-in-europe-2018/at_download/file
Map on the Bulgarian Coal Resources and Energy Infrastructure https://www.researchgate.net/figure/Map-on-the-Bulgarian-coal-resources-energy-infrastructure-and-largest-power-plants_fig1_257941554
Coordinates of Sofia's Thermal Plants http://www.wikiwand.com/en/List_of_power_stations_in_Bulgaria#/Thermal