Difference between revisions of "IS428 AY2018-19T1 Chrysta Yuen Jia Lin"

From Visual Analytics for Business Intelligence
 
 
Before analyzing the data, it must be prepared so that it can be interpreted. The Sofia Air data provided in the assignment consists of 4 zip files, each requiring its own processing steps. This section describes the analysis and transformation process for each dataset, which prepares the data for import and analysis in Tableau.
  
==<div style="background: #000000; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:18px; font-family:Helvetica"><font color= #ffffff>EEA Data</font></div>==
+
==<div style="background: #000000; padding: 20px; line-height: 0.3em; text-indent: 15px; font-size:18px; font-family:Helvetica"><font color= #ffffff>EEA Data</font></div>==
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
+
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
 +
 
 +
'''Problem 1: The raw dataset (EEA Data) is split across numerous CSV files (bg_x_xxx_year), as seen in Figure 1.'''
 +
[[File:Problem1b.jpg|thumb|center|Figure 1]]
 +
 
 +
'''Solution 1: To load the dataset into Tableau, use the union function (Figure 2) to combine all the CSV files.'''
 +
 
 +
To integrate the metadata, inner join the metadata with the unioned bg data on the variable AirQualityEoiCode. This attaches the station metadata to each bg_data row.
 +
[[File:Problem 1a.jpg|thumb|center|Figure 2]]
 +
 
 +
'''Problem 2: The raw dataset (EEA Data) includes stations with only a limited number of years of data.'''
 +
 
 +
As seen in Figure 3, the problematic data is highlighted with the purple border.
 +
[[File:Problem 1.jpg|thumb|center|Figure 3]]
 +
 
 +
'''Solution 2: To prevent this data from affecting the rest of the dataset, it is omitted.'''
 +
 
 +
As seen in Figure 3, the affected data files are those of Station 60881 and Station 9484.
 +
Both data files are excluded from the visualization.
 +
 
 +
==<div style="background: #000000; padding: 20px; line-height: 0.3em; text-indent: 15px; font-size:18px; font-family:Helvetica"><font color= #ffffff>Air Tube Data</font></div>==
 +
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
 +
 
 +
'''Problem 1: AirTube's data does not give the exact location directly, as locations are provided in geohash format.'''
 +
[[File:Problem 2a.jpg|thumb|center]]
 +
 
 +
'''Solution 1: Determine the location of the data points by using the geohash package in the R environment to convert each geohash into longitude and latitude.'''
 +
The conversion is shown in Figure 5.
 +
[[File:Problem 2b.jpg|thumb|center|Figure 6]]
  
 
==<div style="background: #000000; padding: 15px; line-height: 0.6em; text-indent: 15px; font-size:18px; font-family:Helvetica"><font color= #ffffff>Task 1: Spatio-temporal Analysis of Official Air Quality</font></div>==
 
==<div style="background: #000000; padding: 15px; line-height: 0.6em; text-indent: 15px; font-size:18px; font-family:Helvetica"><font color= #ffffff>Task 1: Spatio-temporal Analysis of Official Air Quality</font></div>==
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
{| class="wikitable"
+
A typical day in Sofia city can be seen in Figure 8, where the days of the week range from Sunday to Saturday.
|-
+
The concentration level is divided into 5 concentration bins (Figure 7):
! '''What does a typical day look like for Sofia city?'''
 
|-
 
A typical day in Sofia city can be seen from the image in Figure 1.  
 
The concentration level is divided into 5 different concentration bins:
 
[[File:Concentration Level Grade.jpg|300px|center|Figure 1]]
 
  
 +
[[File:Concentration Level Grade.jpg|300px|thumb|center|Figure 7]]
 +
[[File:Calendar HeatMap.jpg|700px|thumb|center|Figure 8]]
  
[[File:Calendar HeatMap.jpg|700px|center|Figure 2]]
+
A typical day in Sofia city is generally rated “Fair”, where a "Fair" grade corresponds to a concentration level between 30 and 45 μg/m3.
 +
However, a typical day in Sofia city from November to February is generally rated “Very Poor”, where a "Very Poor" grade corresponds to a concentration level higher than 60 μg/m3. In particular, the high pollution level in December can be attributed to Bulgarian Christmas traditions: a fire is built in the hearth with enough wood to burn all night and into Christmas Day, to help with the new birth of the sun. With most Bulgarian households burning wood throughout the night during the festive season, pollution during the Christmas period is naturally higher than usual. Although this burning recurs every year, the pollution concentration level has decreased over the years. This can be attributed to modernization, as Bulgarian families increasingly substitute candles for wood fires. As lit candles produce less air pollution than burning wood, there is a general decrease in Bulgaria's pollution concentration level over the years (Figure 8).
  
A typical day in Sofia city from March to October is generally rated “Fair”, where a Fair grade corresponds to a concentration level between 30 and 45 μg/m3.
+
Visualizing the data also reveals anomalies in the dataset, as seen in Figure 9.
However, a typical day in Sofia city from November to February is generally rated “Very Poor”, where a Very Poor grade corresponds to a concentration level higher than 60 μg/m3.
+
<!--[[File:Problem 2c.jpg|700px|center|Figure 9]]:-->
|-
+
As seen in Figure 9, the spikes in pollution concentration recur at the end of December and in the middle of January. This supports the earlier discussion of the Bulgarian tradition of burning wood at Christmas.
! '''Do you see any trends of possible interest in this investigation?'''
 
|-
 
|
 
|-
 
! '''What anomalies do you find in the official air quality dataset?'''
 
|-
 
|
 
|-
 
! '''How do these affect your analysis of potential problems to the environment?'''
 
|-
 
|
 
|}
 
  
 
==<div style="background: #000000; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:18px; font-family:Helvetica"><font color= #ffffff>Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements</font></div>==
 
==<div style="background: #000000; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:18px; font-family:Helvetica"><font color= #ffffff>Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements</font></div>==
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
{| class="wikitable"
 
|-
 
! '''Characterize the sensors’ coverage, performance and operation. Are they well distributed over the entire city?'''
 
|-
 
|
 
|-
 
! '''Are they all working properly at all times?'''
 
|-
 
|
 
|-
 
! '''Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?'''
 
|-
 
|
 
|-
 
! '''Which part of the city shows relatively higher readings than others?'''
 
|-
 
|
 
|-
 
! '''Are these differences time dependent?'''
 
|-
 
|
 
|}
 
  
 +
[[File:Visual 1.jpg|700px|thumb|center|Figure 10]]
 +
[[File:Visual 1 2018.jpg|700px|thumb|center|Figure 11]]
 +
 +
As seen in Figures 10 and 11, sensor coverage is concentrated in the central area of Sofia City in both 2017 and 2018. This leaves the outer rims of Sofia City, namely the North East and South East, neglected.
 +
While Figure 11 shows that sensor coverage increased in 2018, the line graph also shows an increase in the number of inaccurate values in 2018. The line graph for 2017 shows no inaccurate values, while the line graph for 2018 shows an increase of 5 inaccurate values.
 +
 +
[[File:Visual 2.jpg|700px|center|Figure 12]]
 +
 +
Figure 12 reflects the stability of the sensors by plotting the total number of records against time (hourly). This helps to determine whether the sensors were inaccurate or malfunctioning at any point in time.
 +
The time series shows the number of measurements over time and displays an obvious increase in the number of citizen science sensors from September 2017 to August 2018. On certain days measurements are missing, as seen in the sharp downward spikes. These sudden drops in measurements seem to occur around the end and start of a month (e.g. MAR 29, MAY 1, JUL 4).
 +
A closer look at Figure 12 shows obvious dips in four instances: 31st January, 1st April, 1st May, and 4th to 12th July. While the record count in these dips is not zero, it is too small to be compared meaningfully with the remaining data.
 +
 +
 +
 +
Through the readings captured, Figure 13 reflects the unexpected behaviors of pressure, humidity and temperature.
 +
 +
<!-- '''Pressure''' :-->
 +
 +
<!-- '''Humidity''' :-->
 +
 +
<!-- '''Temperature''' :-->
 +
 +
[[File:Concentration level (light period).jpg|700px|thumb|center|Figure 14]]
 +
 +
Figure 14 shows the pollution concentration level during a regular month (excluding the irregular months with high pollution concentration: January, February, November and December).
 +
As seen in Figure 14, non-working hours (12am to 8am, and 5pm to 11.59pm) seem to have a higher pollution concentration level. This might be attributed to an increase in human activity during non-working hours.
  
 
==<div style="background: #000000; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:18px; font-family:Helvetica"><font color= #ffffff>Task 3</font></div>==
 
==<div style="background: #000000; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:18px; font-family:Helvetica"><font color= #ffffff>Task 3</font></div>==
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
 
<div style="font-family:Open Sans, Arial, sans-serif;font-size:12px">
{| class="wikitable"
 
|-
 
! '''Context'''
 
|-
 
|Urban air pollution is a complex issue.  There are many factors affecting the air quality of a city.  Some of the possible causes are:
 
  
* Local energy sources.  For example, according to [http://unmaskmycity.org/project/sofia/ Unmask My City], a global initiative by doctors, nurses, public health practitioners, and allied health professionals dedicated to improving air quality and reducing emissions in our cities, Bulgaria’s main sources of PM10, and fine particle pollution PM2.5 (particles 2.5 microns or smaller) are household burning of fossil fuels or biomass, and transport. 
+
'''Context'''
* Local meteorology such as temperature, pressure, rainfall, humidity, wind etc
 
* Local topography
 
* Complex interactions between local topography and meteorological characteristics.
 
* Transboundary pollution for example the haze that intruded into Singapore from our neighbours.
 
|-
 
! '''Reveal the relationships between the factors mentioned above and the air quality measure detected in Task 1 and Task 2.'''
 
|-
 
|
 
|-
 
|
 
|}
 
  
 +
Urban air pollution is a complex issue.  There are many factors affecting the air quality of a city.  Some of the possible causes are:
 +
 +
'''Local energy sources/ Transboundary Pollution'''
 +
Based on research, Sofia City suffers heavy air pollution due to household burning of fossil fuels as well as the energy sources used. Currently, Bulgaria relies on a large number of coal-fired power plants and thermal plants to power the city. Such plants are highly detrimental to the environment. For comparison, according to the Environmental Protection Agency’s (EPA) National Emissions Inventory, US coal power plants emitted 45,676 pounds of mercury in 2014. Additionally, waste from countries like Italy and the United Kingdom is imported to Bulgaria for burning. The Devnya cement plant, which burns this waste, stockpiles uncovered bales of waste, polluting the air in Bulgaria.
 +
 +
<!-- ''' Local meteorology ''':-->
 +
 +
 +
<!-- ''' Local topography ''':-->
 +
 +
=Conclusion=
 +
 +
In conclusion, based on the data visualisation, we can deduce that air quality in Sofia City is relatively poor. In particular, the air pollution concentration level spikes during the festive months of January, February, November and December.
 +
Apart from these spikes, Sofia City is also heavily polluted by emissions from coal-fired plants and thermal plants. Environmental factors such as wind speed and rainfall also affect the level of air pollution. With a higher wind speed, the particles that pollute the air are carried out of Sofia City, which helps reduce the level of air pollution. Conversely, a lower wind speed leads to a relatively higher air pollution concentration level because the air is still.
 +
To improve the air pollution issue in Sofia City, the government authorities will need to take steps to cut back on polluting factories and industries. Additionally, more rules and policies should be passed by the government to push Sofia City towards the goal of being a greener city.
  
 
=Reference=
 
=Reference=
 +
 +
https://zerowasteeurope.eu/2018/01/bulgaria-air-pollution/
 +
https://www.reuters.com/article/us-bulgaria-coal/bulgaria-joins-poland-in-appeal-against-eu-pollution-crackdown-idUSKBN1EZ20I
 +
https://www.ucsusa.org/clean-energy/coal-and-other-fossil-fuels/coal-air-pollution#.W-hA3Xozb-Y
 +
 +
=Feedback=

Latest revision as of 23:06, 11 November 2018

Problem and Motivation

Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide, an estimated 7 million people died prematurely because of pollution; in the European Union (EU) 400,000 people suffer a premature death. The Organisation for Economic Cooperation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. In addition, air pollution has also been classified as the leading environmental cause of cancer.

In particular, air quality in Bulgaria is a big concern: measurements show that citizens all over the country breathe in air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than the limits the EU and the World Health Organization (WHO) have set to protect health. Bulgaria had the highest PM2.5 concentrations of all EU-28 member states in urban areas over a three-year average. For PM10, Bulgaria also tops the list of polluted countries, with a daily mean concentration of 77 μg/m3 (the EU limit value is 50 μg/m3).

According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10).

With the huge amount of data collected, there is a need for an interactive data visualization tool that helps the WHO and government officials in Bulgaria identify the areas where the air is highly polluted and unfit for breathing.

Dataset Analysis & Transformation Process

Before analyzing the data, it must be prepared so that it can be interpreted. The Sofia Air data provided in the assignment consists of 4 zip files, each requiring its own processing steps. This section describes the analysis and transformation process for each dataset, which prepares the data for import and analysis in Tableau.

EEA Data

Problem 1: The raw dataset (EEA Data) is split across numerous CSV files (bg_x_xxx_year), as seen in Figure 1.

Figure 1

Solution 1: To load the dataset into Tableau, use the union function (Figure 2) to combine all the CSV files.

To integrate the metadata, inner join the metadata with the unioned bg data on the variable AirQualityEoiCode. This attaches the station metadata to each bg_data row.

Figure 2
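The union and inner join described above were done in Tableau. As a rough illustration of what the two steps do, here is a minimal Python sketch using only the standard library. The in-memory files, the station codes and every column other than AirQualityEoiCode are made up for the example.

```python
import csv
import io


def union_csv(files):
    """Stack the rows of several CSV sources that share the same header."""
    rows = []
    for f in files:
        rows.extend(csv.DictReader(f))
    return rows


def inner_join(rows, metadata, key="AirQualityEoiCode"):
    """Keep only rows whose key also appears in the metadata and merge both."""
    lookup = {m[key]: m for m in metadata}
    return [{**lookup[r[key]], **r} for r in rows if r[key] in lookup]


# Hypothetical stand-ins for two yearly bg files and the metadata file.
bg_2017 = io.StringIO("AirQualityEoiCode,Concentration\nSTATION_A,41.0\n")
bg_2018 = io.StringIO(
    "AirQualityEoiCode,Concentration\nSTATION_A,38.5\nSTATION_B,52.0\n"
)
meta = io.StringIO("AirQualityEoiCode,Latitude,Longitude\nSTATION_A,42.70,23.32\n")

measurements = union_csv([bg_2017, bg_2018])
joined = inner_join(measurements, list(csv.DictReader(meta)))
print(len(measurements), len(joined))
```

Because the join is an inner join, rows for a station that has no metadata entry (STATION_B in this made-up data) are dropped.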

Problem 2: The raw dataset (EEA Data) includes stations with only a limited number of years of data.

As seen in Figure 3, the problematic data is highlighted with the purple border.

Figure 3

Solution 2: To prevent this data from affecting the rest of the dataset, it is omitted.

As seen in Figure 3, the affected data files are those of Station 60881 and Station 9484. Both data files are excluded from the visualization.
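The exclusion above was decided by inspecting Figure 3. A programmatic version of the same check could look like the Python sketch below, which keeps only stations that report data for a minimum number of distinct years. The threshold and the sample records are assumptions for illustration and do not come from the assignment data.

```python
from collections import defaultdict


def stations_with_enough_years(records, min_years):
    """Return the stations that have data for at least `min_years` distinct years."""
    years = defaultdict(set)
    for station, year in records:
        years[station].add(year)
    return {station for station, seen in years.items() if len(seen) >= min_years}


# Hypothetical (station, year) pairs; station "B" only reports a single year.
records = [("A", 2013), ("A", 2014), ("A", 2015), ("B", 2015)]
kept = stations_with_enough_years(records, min_years=3)
print(kept)
```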

Air Tube Data

Problem 1: AirTube's data does not give the exact location directly, as locations are provided in geohash format.

Problem 2a.jpg

Solution 1: Determine the location of the data points by using the geohash package in the R environment to convert each geohash into longitude and latitude. The conversion is shown in Figure 5.

Figure 6
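The conversion itself was done with the geohash package in R. To show what such a conversion involves, the following is a small Python sketch of the standard geohash decoding algorithm. It is not the R package's code, and the function name is chosen for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def decode_geohash(geohash):
    """Return (latitude, longitude) of the centre of the geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    for char in geohash.lower():
        value = BASE32.index(char)
        for shift in range(4, -1, -1):  # each character carries 5 bits
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


# "ezs42" is a commonly used textbook geohash, not a value from the AirTube data.
lat, lon = decode_geohash("ezs42")
print(round(lat, 3), round(lon, 3))
```

Each extra character narrows the cell, so longer geohashes give more precise coordinates.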

Task 1: Spatio-temporal Analysis of Official Air Quality

A typical day in Sofia city can be seen in Figure 8, where the days of the week range from Sunday to Saturday. The concentration level is divided into 5 concentration bins (Figure 7):

Figure 7
Figure 8

A typical day in Sofia city is generally rated “Fair”, where a "Fair" grade corresponds to a concentration level between 30 and 45 μg/m3. However, a typical day in Sofia city from November to February is generally rated “Very Poor”, where a "Very Poor" grade corresponds to a concentration level higher than 60 μg/m3. In particular, the high pollution level in December can be attributed to Bulgarian Christmas traditions: a fire is built in the hearth with enough wood to burn all night and into Christmas Day, to help with the new birth of the sun. With most Bulgarian households burning wood throughout the night during the festive season, pollution during the Christmas period is naturally higher than usual. Although this burning recurs every year, the pollution concentration level has decreased over the years. This can be attributed to modernization, as Bulgarian families increasingly substitute candles for wood fires. As lit candles produce less air pollution than burning wood, there is a general decrease in Bulgaria's pollution concentration level over the years (Figure 8).
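The grading into bins can be expressed as a simple lookup. In the Python sketch below, only the "Fair" range (30 to 45) and the "Very Poor" threshold (above 60) come from the text. The remaining cut points and the other three labels are placeholders, since the full legend is only given in Figure 7.

```python
from bisect import bisect_left

# Upper bounds of the first four bins; 30, 45 and 60 follow the text,
# while 15 is an assumed placeholder.
CUT_POINTS = [15, 30, 45, 60]
# "Fair" and "Very Poor" follow the text; the other labels are assumed.
GRADES = ["Very Good", "Good", "Fair", "Poor", "Very Poor"]


def grade(concentration):
    """Map a concentration value to its grade; upper bounds are inclusive."""
    return GRADES[bisect_left(CUT_POINTS, concentration)]


print(grade(40), grade(61))
```

Using `bisect_left` makes each upper bound inclusive, so a value of exactly 60 is not yet "Very Poor", which matches "higher than 60" in the text.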

Visualizing the data also reveals anomalies in the dataset, as seen in Figure 9. The spikes in pollution concentration recur at the end of December and in the middle of January, which supports the earlier discussion of the Bulgarian tradition of burning wood at Christmas.

Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

Figure 10
Figure 11

As seen in Figures 10 and 11, sensor coverage is concentrated in the central area of Sofia City in both 2017 and 2018. This leaves the outer rims of Sofia City, namely the North East and South East, neglected. While Figure 11 shows that sensor coverage increased in 2018, the line graph also shows an increase in the number of inaccurate values in 2018. The line graph for 2017 shows no inaccurate values, while the line graph for 2018 shows an increase of 5 inaccurate values.

Figure 12

Figure 12 reflects the stability of the sensors by plotting the total number of records against time (hourly). This helps to determine whether the sensors were inaccurate or malfunctioning at any point in time. The time series shows the number of measurements over time and displays an obvious increase in the number of citizen science sensors from September 2017 to August 2018. On certain days measurements are missing, as seen in the sharp downward spikes. These sudden drops in measurements seem to occur around the end and start of a month (e.g. MAR 29, MAY 1, JUL 4). A closer look at Figure 12 shows obvious dips in four instances: 31st January, 1st April, 1st May, and 4th to 12th July. While the record count in these dips is not zero, it is too small to be compared meaningfully with the remaining data.
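The dips were identified visually in Figure 12. One way to flag them automatically is to count records per day and compare each day with the median daily count, as in the Python sketch below. The 50% threshold and the timestamps are assumptions made up for the example.

```python
from collections import Counter
from statistics import median


def low_record_days(timestamps, fraction=0.5):
    """Return days whose record count is below `fraction` of the median daily count."""
    per_day = Counter(ts[:10] for ts in timestamps)  # 'YYYY-MM-DD' prefix
    typical = median(per_day.values())
    return sorted(day for day, n in per_day.items() if n < fraction * typical)


# Synthetic ISO timestamps: two full days of hourly records and one sparse day.
timestamps = (
    ["2018-06-10T%02d:00:00" % h for h in range(24)]
    + ["2018-06-11T%02d:00:00" % h for h in range(24)]
    + ["2018-06-12T%02d:00:00" % h for h in range(3)]
)
print(low_record_days(timestamps))
```

Days with no records at all never appear in the counter, so a complete outage would have to be detected separately, for example by checking for gaps in the calendar.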


Through the readings captured, Figure 13 reflects the unexpected behaviors of pressure, humidity and temperature.



Figure 14

Figure 14 shows the pollution concentration level during a regular month (excluding the irregular months with high pollution concentration: January, February, November and December). As seen in Figure 14, non-working hours (12am to 8am, and 5pm to 11.59pm) seem to have a higher pollution concentration level. This might be attributed to an increase in human activity during non-working hours.
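The comparison between working and non-working hours can be sketched in Python as below. The split follows the hours given above (working hours from 8am up to 5pm), while the readings are made-up values used only to show the calculation.

```python
from statistics import mean


def is_working_hour(hour):
    """Working hours run from 8am up to, but not including, 5pm."""
    return 8 <= hour < 17


def mean_by_period(readings):
    """Average the (hour, concentration) readings per period of the day."""
    groups = {"working": [], "non-working": []}
    for hour, value in readings:
        period = "working" if is_working_hour(hour) else "non-working"
        groups[period].append(value)
    return {name: mean(values) for name, values in groups.items() if values}


# Hypothetical (hour of day, concentration) readings.
readings = [(2, 50.0), (9, 30.0), (12, 34.0), (20, 46.0)]
averages = mean_by_period(readings)
print(averages)
```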

Task 3

Context

Urban air pollution is a complex issue. There are many factors affecting the air quality of a city. Some of the possible causes are:

Local energy sources / Transboundary pollution

Based on research, Sofia City suffers heavy air pollution due to household burning of fossil fuels as well as the energy sources used. Currently, Bulgaria relies on a large number of coal-fired power plants and thermal plants to power the city. Such plants are highly detrimental to the environment. For comparison, according to the Environmental Protection Agency’s (EPA) National Emissions Inventory, US coal power plants emitted 45,676 pounds of mercury in 2014. Additionally, waste from countries like Italy and the United Kingdom is imported to Bulgaria for burning. The Devnya cement plant, which burns this waste, stockpiles uncovered bales of waste, polluting the air in Bulgaria.



Conclusion

In conclusion, based on the data visualisation, we can deduce that air quality in Sofia City is relatively poor. In particular, the air pollution concentration level spikes during the festive months of January, February, November and December. Apart from these spikes, Sofia City is also heavily polluted by emissions from coal-fired plants and thermal plants. Environmental factors such as wind speed and rainfall also affect the level of air pollution. With a higher wind speed, the particles that pollute the air are carried out of Sofia City, which helps reduce the level of air pollution. Conversely, a lower wind speed leads to a relatively higher air pollution concentration level because the air is still. To improve the air pollution issue in Sofia City, the government authorities will need to take steps to cut back on polluting factories and industries. Additionally, more rules and policies should be passed by the government to push Sofia City towards the goal of being a greener city.

Reference

https://zerowasteeurope.eu/2018/01/bulgaria-air-pollution/
https://www.reuters.com/article/us-bulgaria-coal/bulgaria-joins-poland-in-appeal-against-eu-pollution-crackdown-idUSKBN1EZ20I
https://www.ucsusa.org/clean-energy/coal-and-other-fossil-fuels/coal-air-pollution#.W-hA3Xozb-Y

Feedback