IS428 AY2018-19T1 Chrysta Yuen Jia Lin

From Visual Analytics for Business Intelligence

Problem and Motivation

Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide, an estimated 7 million people died prematurely because of pollution; in the European Union (EU) 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has also been classified as the leading environmental cause of cancer.

In particular, air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than the limits that the EU and the World Health Organization (WHO) have set to protect health. Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria also tops the list of most polluted countries, with a daily mean concentration of 77 μg/m3 (the EU limit value is 50 μg/m3).

According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10).

With the huge amount of data collected, there is a need to build an interactive data visualization tool to assist the WHO and the government officials in Bulgaria to identify the areas with highly polluted air that is unfit for breathing.

Dataset Analysis & Transformation Process

Before analyzing the data, it must be prepared so that it can be interpreted correctly. The Sofia Air data provided in the assignment consists of 4 different zip files, each of which needs to be processed in its own way. This section elaborates on the dataset analysis and transformation process for each dataset, which prepares the data for import into and analysis in Tableau.

EEA Data

Problem 1: The raw dataset (EEA Data) is split across numerous csv files (named bg_x_xxx_year), as seen in Figure 1.

Figure 1

Solution 1: To load the dataset into Tableau, use the union function (Figure 2) to combine all the different csv files.

To integrate the metadata, inner join the metadata with the union-ed bg data on the variable AirQualityEoiCode. This attaches the station metadata to every bg_data record.
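The union and inner join were done in Tableau, but the same idea can be sketched in Python for readers who want to check the result outside Tableau. This is only an illustration: the station codes, values, and every column name other than AirQualityEoiCode are hypothetical, and the in-memory files stand in for the real csv files.

```python
import csv
import io

def union_csv(files):
    """Stack the rows of several CSV file objects that share the same header."""
    rows = []
    for f in files:
        rows.extend(csv.DictReader(f))
    return rows

def inner_join(left, right, key):
    """Keep only left rows whose key also appears in right, merging the columns."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

# Hypothetical stand-ins for two yearly bg files and the metadata file.
bg_2017 = io.StringIO("AirQualityEoiCode,Concentration\nBG0040A,41.0\n")
bg_2018 = io.StringIO("AirQualityEoiCode,Concentration\nBG0040A,35.5\nBG9999X,12.0\n")
metadata = io.StringIO("AirQualityEoiCode,Latitude,Longitude\nBG0040A,42.67,23.27\n")

measurements = union_csv([bg_2017, bg_2018])
joined = inner_join(measurements, list(csv.DictReader(metadata)), "AirQualityEoiCode")
```

Because the join is an inner join, the record for the made-up station BG9999X is dropped: it has no matching metadata row.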

Figure 2

Problem 2: The raw dataset (EEA Data) includes stations that have only a limited number of years of data.

As seen in Figure 3, the problematic data is highlighted with the purple border.

Figure 3

Solution 2: To prevent this data from skewing the rest of the dataset, it will be omitted.

As seen from Figure 3, the affected data files are those of Station 60881 and Station 9484. Both data files will be excluded from the visualization.
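The exclusion itself was done by inspecting Figure 3, but a similar check could be automated. The sketch below assumes each record has been reduced to a (station, year) pair and uses a hypothetical threshold of at least 2 distinct years; the years shown and the third station (9421) are made up for illustration.

```python
from collections import defaultdict

def stations_with_few_years(records, min_years):
    """Return the stations that report data for fewer than min_years distinct years."""
    years_by_station = defaultdict(set)
    for station, year in records:
        years_by_station[station].add(year)
    return {s for s, years in years_by_station.items() if len(years) < min_years}

# Illustrative (station, year) pairs; not taken from the real dataset.
records = [
    ("9421", 2013), ("9421", 2014), ("9421", 2015),
    ("60881", 2018),
    ("9484", 2013),
]

sparse = stations_with_few_years(records, min_years=2)
kept = [r for r in records if r[0] not in sparse]
```

With this toy input, only the records of the hypothetical station 9421 remain in kept.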

Air Tube Data

Problem 1: AirTube's data does not give the exact location directly, as the location is recorded in geohash format.

Problem 2a.jpg

Solution 1: Determine the location of the data points by using the geohash package in the R environment to convert the geohash format into longitude and latitude, as seen in Figure 5.
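To show what the conversion does, here is a from-scratch Python sketch of standard geohash decoding. It is not the R geohash package used above, only an illustration of the underlying algorithm; the function name decode_geohash is hypothetical, and the sample geohash "ezs42" is a common textbook example rather than a value from the AirTube data.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def decode_geohash(geohash):
    """Return the (latitude, longitude) at the centre of a geohash cell."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    for char in geohash.lower():
        value = BASE32.index(char)
        for shift in range(4, -1, -1):  # each character carries 5 bits
            bit = (value >> shift) & 1
            rng = lon_range if is_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # a 1 bit keeps the upper half of the range
            else:
                rng[1] = mid  # a 0 bit keeps the lower half of the range
            is_lon = not is_lon
    return (sum(lat_range) / 2, sum(lon_range) / 2)

lat, lon = decode_geohash("ezs42")
```

The decoded point is the centre of the cell, so longer geohashes give more precise coordinates.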

Figure 6

Task 1: Spatio-temporal Analysis of Official Air Quality

A typical day in Sofia city can be seen in Figure 8, where the days of the week range from Sunday to Saturday. The concentration level is divided into 5 different concentration bins (Figure 7):

Figure 7
Figure 8

A typical day in Sofia city is generally rated "Fair", where a "Fair" grade is determined by a concentration level between 30 and 45 μg/m3. However, a typical day in Sofia city from November to February is generally rated "Very Poor", where a "Very Poor" grade is determined by a concentration level higher than 60 μg/m3. In particular, the high pollution level during December may be attributed to Bulgarian Christmas traditions: a fire is built in the hearth, with enough wood to burn all night and into Christmas Day, to help with the new birth of the sun. With this tradition, pollution during the Christmas season is likely to be higher than usual, since many Bulgarian households burning wood throughout the night would contribute substantially to the concentration level. Although the burning recurs every year, the pollution concentration level has decreased over the years. This may be attributed to modernization, where Bulgarian families' traditions evolve to substitute the lighting of wood with the lighting of candles. As lit candles produce less air pollution than burnt wood, there is a general decrease in Bulgaria's pollution concentration level over the years (Figure 8).
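The grading can be expressed as a simple lookup. In the sketch below, only the "Fair" (30 to 45) and "Very Poor" (above 60) boundaries come from the text; the remaining edges and the names "Very Good", "Good" and "Poor" are assumptions made to fill in five equal-width bins, and may not match Figure 7.

```python
import bisect

# Upper edges of the first four bins. Only 30-45 ("Fair") and >60 ("Very Poor")
# are stated in the text; the other edges and labels are assumed.
BIN_EDGES = [15, 30, 45, 60]
BIN_LABELS = ["Very Good", "Good", "Fair", "Poor", "Very Poor"]

def grade(concentration):
    """Map a concentration level to one of the five grades."""
    return BIN_LABELS[bisect.bisect_left(BIN_EDGES, concentration)]

typical_day = grade(40)
winter_day = grade(75)
```

Using bisect_left places a value that sits exactly on an edge in the lower bin, so 60 is still "Poor" and only values above 60 are "Very Poor".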

Visualizing the data also reveals the anomalies in the data set, as seen in Figure 9: spikes in the pollution concentration level recur at the end of December and in the middle of January. This supports the previous discussion about the Bulgarian tradition of burning wood during Christmas.

Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

Figure 10
Figure 11

As seen from Figures 10 and 11, sensor coverage focuses mainly on the central area of Sofia City for both 2017 and 2018. This leaves the outer rims of Sofia City, namely the North East and South East, largely uncovered. While Figure 11 shows that sensor coverage increased in 2018, the line graph also shows an increase in the number of inaccurate values in 2018: the line graph for 2017 reflects no inaccuracies, while the line graph for 2018 shows 5 inaccurate values.

Figure 12

Figure 12 reflects the stability of the sensors by plotting the total number of records against time (hourly). This helps to determine if the sensors were inaccurate or malfunctioned at any point in time. The time series above shows the number of measurements over time and displays an obvious increase in the number of citizen science sensors from September 2017 to August 2018. There are certain days where measurements are missing, as seen from the massive downward spikes. These sudden drops in measurements seem to occur around the end and start of the month (e.g. MAR 29, MAY 1, JUL 4). Taking a closer look at Figure 12, there is an obvious dip in four instances: 31st January, 1st April, 1st May, and 4th to 12th July. While the dips do not fall to zero, the values are too small to be meaningfully compared with the remaining data.
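The dips were identified visually in Figure 12, but a rough automated check is possible. The sketch below counts records per hour and flags hours whose count falls below a fraction of the median; the 0.5 fraction and the sample timestamps are assumptions chosen only for illustration.

```python
from collections import Counter
from datetime import datetime
from statistics import median

def hourly_counts(timestamps):
    """Count the number of records that fall in each hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)

def find_dips(counts, fraction=0.5):
    """Return the hours whose record count is below fraction * median count."""
    typical = median(counts.values())
    return sorted(hour for hour, n in counts.items() if n < fraction * typical)

# Made-up timestamps: 10 records at 09:00, 2 at 10:00, 10 at 11:00.
timestamps = (
    [datetime(2018, 5, 1, 9, m) for m in range(10)]
    + [datetime(2018, 5, 1, 10, m) for m in range(2)]
    + [datetime(2018, 5, 1, 11, m) for m in range(10)]
)

counts = hourly_counts(timestamps)
dips = find_dips(counts)
```

Hours with no records at all never appear in counts, so a fully missing hour would have to be detected separately, for example by comparing against the full expected range of hours.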


Through the readings captured, Figure 13 reflects the unexpected behaviors of pressure, humidity and temperature.

Pressure

Humidity

Temperature

Figure 14

Figure 14 shows the pollution concentration level during a regular month (excluding the irregular months with high pollution concentration levels: January, February, November and December). As seen from Figure 14, non-working hours (12am to 8am, and 5pm to 11.59pm) seem to have a higher pollution concentration level. This might be attributed to an increase in human activities during non-working hours.
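The comparison in Figure 14 can be approximated by averaging readings separately for working and non-working hours. The sketch below assumes working hours run from 08:00 to 16:59, which is one reading of the ranges given above, and the (hour, concentration) readings are made up for illustration.

```python
from statistics import mean

def is_working_hour(hour):
    """Assumed working hours: 08:00 up to, but not including, 17:00."""
    return 8 <= hour < 17

def mean_by_period(readings):
    """Average concentration for working and non-working hours."""
    groups = {"working": [], "non-working": []}
    for hour, value in readings:
        groups["working" if is_working_hour(hour) else "non-working"].append(value)
    return {name: mean(values) for name, values in groups.items() if values}

# Illustrative (hour of day, concentration) pairs; not real measurements.
readings = [(2, 40.0), (7, 44.0), (10, 25.0), (14, 27.0), (20, 42.0)]
averages = mean_by_period(readings)
```

Before applying this to the real data, the irregular months (January, February, November and December) would need to be filtered out, as was done for Figure 14.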

Task 3

Context
Urban air pollution is a complex issue. There are many factors affecting the air quality of a city. Some of the possible causes are:

Local energy sources/ Transboundary Pollution

Based on research, Sofia City suffers from heavy air pollution due to household burning of fossil fuels as well as the energy sources used. Currently, Bulgaria relies heavily on coal-fired power plants and thermal plants to power the city. The usage of such plants is highly detrimental to the environment. To give a sense of scale, according to the Environmental Protection Agency's (EPA) National Emissions Inventory, US coal power plants emitted 45,676 pounds of mercury in 2014. Additionally, waste from countries like Italy and the United Kingdom is imported to Bulgaria for burning. The Devnya cement plant used for the burning of waste stockpiles uncovered bales of waste, polluting the air in Bulgaria.

Local meteorology


Local topography

Conclusion

In conclusion, based on the data visualisation, we can deduce that air quality in Sofia City is relatively poor. In particular, the air pollution concentration level spikes during the festive months of November, December, January and February.

Reference

https://zerowasteeurope.eu/2018/01/bulgaria-air-pollution/
https://www.reuters.com/article/us-bulgaria-coal/bulgaria-joins-poland-in-appeal-against-eu-pollution-crackdown-idUSKBN1EZ20I
https://www.ucsusa.org/clean-energy/coal-and-other-fossil-fuels/coal-air-pollution#.W-hA3Xozb-Y

Feedback