IS428 AY2018-19T1 Sheryl Chong Man Er

From Visual Analytics for Business Intelligence

Problem & Motivation

Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide, an estimated 7 million people died prematurely because of pollution; in the European Union (EU), 400,000 people suffer a premature death. The Organisation for Economic Co-operation and Development (OECD) predicts that by 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. Air pollution has also been classified as the leading environmental cause of cancer. Air quality in Bulgaria is a major concern. Over a three-year average, Bulgaria had the highest urban PM2.5 concentrations of all EU-28 member states. For PM10, Bulgaria also tops the list of most polluted countries, with a daily mean concentration of 77 μg/m3 (the EU limit value is 50 μg/m3). According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10). This is extremely dangerous for people living in Bulgaria, as constant exposure to air pollution places them at high risk of cancer and early death.

Through the interactive visualisation, we aim to:

  1. Discover the patterns of air pollution in Sofia City over a period of time
  2. Uncover the relationship between air quality and meteorological data such as temperature and humidity, as well as topographical data such as elevation
  3. Suggest ways to tackle air pollution


Dataset Analysis & Transformation Process

Four datasets were provided for this assignment: Official air quality measurements (EEA Data), Citizen science air quality measurements (Air Tube Data), Meteorological measurements and Topography data. This section describes how each dataset was analysed and transformed to prepare it for import into an interactive visualisation.

Official Air Quality Data (EEA)

Issue 1:
Some of the stations have missing records. Station 9484 only has records from 2013 to 2015, while station 60881 only has records for 2018.
Solution 1:

Eea merge1.png

These records are excluded before the data is merged for the visualisation. This keeps the visualisation consistent in terms of values, since every remaining station covers the same period. After removing these records, we merge the remaining files for the visualisation.
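The exclusion and merge step can be sketched as follows. This is a minimal standard-library sketch, not the author's actual script (that lives in the linked repository); it assumes each station's records are available as CSV text keyed by station ID, and the function name `merge_station_files` and the sample data are hypothetical.

```python
import csv
import io

# Stations with incomplete histories, as identified in Issue 1.
EXCLUDED_STATIONS = {"9484", "60881"}


def merge_station_files(files, excluded=EXCLUDED_STATIONS):
    """Concatenate per-station CSV files into one list of row dicts.

    `files` maps a station ID to that station's CSV text (an assumed layout,
    not the exact EEA file format). Stations in `excluded` are skipped.
    """
    merged = []
    for station_id, text in sorted(files.items()):
        if station_id in excluded:
            continue  # drop stations that only cover part of the study period
        for row in csv.DictReader(io.StringIO(text)):
            row["station_id"] = station_id  # remember which station each row came from
            merged.append(row)
    return merged


# Hypothetical sample data: station "1111" is kept, station "9484" is dropped.
files = {
    "1111": "date,concentration\n2018-01-01,40.5\n",
    "9484": "date,concentration\n2014-01-01,55.0\n",
}
rows = merge_station_files(files)
print(len(rows))  # → 1
```

Filtering before concatenation means the excluded stations never reach Tableau, so no filter has to be remembered inside the workbook.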


Issue 2:
The EEA data comes with a separate Excel file called "metadata", which holds each station's CommonName, Latitude and Longitude. This information is critical for the analysis.
Solution 2:

Eea merged2.png

Hence, we merge the metadata into the merged EEA file.
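Merging the metadata amounts to a lookup join on the station identifier. The sketch below is illustrative only: it assumes both files share a station ID column, called `station_id` here (the real EEA column name may differ), and `attach_metadata` is a hypothetical helper.

```python
def attach_metadata(rows, metadata, key="station_id"):
    """Left-join station metadata onto measurement rows.

    `key` is an assumed shared column; the real EEA files may name it differently.
    Rows whose station is missing from the metadata get None for the new fields.
    """
    lookup = {m[key]: m for m in metadata}  # one metadata record per station
    joined = []
    for row in rows:
        meta = lookup.get(row[key], {})
        joined.append({
            **row,
            "CommonName": meta.get("CommonName"),
            "Latitude": meta.get("Latitude"),
            "Longitude": meta.get("Longitude"),
        })
    return joined


# Hypothetical sample data for illustration only.
rows = [{"station_id": "1111", "concentration": "40.5"},
        {"station_id": "2222", "concentration": "38.0"}]
metadata = [{"station_id": "1111", "CommonName": "Example Station",
             "Latitude": "42.69", "Longitude": "23.32"}]
merged = attach_metadata(rows, metadata)
```

A left join (rather than an inner join) keeps measurements whose station has no metadata, so gaps show up as empty coordinates instead of silently disappearing.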

Citizen Science Air Quality Measurements (Air Tube)

Issue 3:
In the Air Tube data, the latitude and longitude are encoded together as a geohash in the geohash column of the data file. Tableau is unable to read this information directly.
Solution 3:

Geohash sheryl.png

We use a Python package called "geohash" to decode every geohash into latitude and longitude.
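To show what the decoding step does, here is a standard-library reimplementation of the geohash decoding algorithm. It is not the "geohash" package's API (its exact function names are not assumed here); `decode_geohash` is a hypothetical helper. Each character contributes 5 bits, and the bits alternately halve the longitude and latitude intervals, starting with longitude.

```python
# Base-32 alphabet used by geohash (it omits a, i, l and o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash):
    """Return the (latitude, longitude) centre of the cell encoded by `geohash`."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash.lower():
        value = _BASE32.index(char)  # raises ValueError on invalid characters
        for shift in range(4, -1, -1):  # 5 bits per character, most significant first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid  # 1 keeps the upper half
                else:
                    lon_hi = mid  # 0 keeps the lower half
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


# "ezs42" is the standard textbook example, not a Sofia sensor.
lat, lon = decode_geohash("ezs42")
print(lat, lon)  # → 42.60498046875 -5.60302734375
```

The returned point is the centre of the geohash cell, so longer geohashes (smaller cells) give more precise sensor positions.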


Issue 4:
The meteorological data is stored in a separate file. To analyse how the weather affects P1 and P2 concentrations, we need to merge the two files.
Solution 4:
The common field between the meteorology file and the Air Tube data file is the date. However, the meteorology file splits the date into 3 columns: Year, Month and Day. We combine these 3 columns into a full date using an Excel formula.

Date end.png
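The same transformation can be expressed in Python, which may be handy if the cleaning is scripted rather than done in Excel. A minimal sketch: the Year, Month and Day column names come from the text above, while the output column name `Date` and the helper `add_full_date` are hypothetical.

```python
from datetime import date


def add_full_date(rows):
    """Combine separate Year, Month and Day columns into one ISO date string."""
    for row in rows:
        # date() rejects impossible combinations such as 30 February with ValueError
        row["Date"] = date(int(row["Year"]), int(row["Month"]), int(row["Day"])).isoformat()
    return rows


rows = add_full_date([{"Year": "2017", "Month": "9", "Day": "5"}])
print(rows[0]["Date"])  # → 2017-09-05
```

Building a real date (rather than concatenating strings) validates each row and yields a zero-padded ISO format that Tableau recognises as a date.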

Dataset Import Structure & Process

With the dataset analysis and transformation phase completed, the following files are imported into Tableau for analysis:

Data.png

One more merge is needed in Tableau: the Air Tube data is joined with the meteorological data on the date of each record.

Air meta merge.png
  1. Import the air tube data as a data source.
  2. Add a new data connection to the air tube data source, pointing to the meteo data file.
  3. Perform an inner join between the air tube data and the meteo data on the date. This attaches the meteorological readings for that day to each air tube record.
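For readers who want to reproduce the Tableau join outside Tableau, the inner join on date can be sketched as below. This is an illustration under assumptions: the air tube timestamp column is called `time` and holds ISO timestamps, the meteo file has a `Date` column in ISO format, and `inner_join_on_date` is a hypothetical helper.

```python
from collections import defaultdict


def inner_join_on_date(air_rows, meteo_rows):
    """Keep only air tube rows with a weather record for the same day,
    copying that day's readings onto each row (an inner join on date)."""
    meteo_by_date = defaultdict(list)
    for m in meteo_rows:
        meteo_by_date[m["Date"]].append(m)
    joined = []
    for a in air_rows:
        day = a["time"][:10]  # assumes ISO timestamps such as "2017-09-06T10:00:00"
        for m in meteo_by_date.get(day, []):
            joined.append({**a, **m})  # on a column-name clash the meteo value wins
    return joined


# Hypothetical sample data: only the first air tube row has matching weather.
air = [{"time": "2017-09-06T10:00:00", "P1": "20"},
       {"time": "2017-09-07T10:00:00", "P1": "30"}]
meteo = [{"Date": "2017-09-06", "TempC": "18"}]
joined = inner_join_on_date(air, meteo)
print(len(joined))  # → 1
```

Because it is an inner join, air tube records from days without a meteorological reading are dropped, which is worth checking when record counts are compared later.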


Interactive Visualisation

Link to Tableau Visualisation: https://public.tableau.com/views/Assignment1_SofiaAirPollution/MainDashboard?:embed=y&:display_count=yes&publish=yes
Link to Data Cleaning Scripts: https://github.com/sherylcme/Assignment1_SofiaAirPollution

Main Dashboard

Maindashboard.png


Official Data Trends - Overview

Official1.png


Official Data Trends - Calendar View

Official2.png


Citizen Data Trend

Citizen1.png


Factor Relationship Analysis

Factor1.png


Interesting & Anomalous Observations

Task 1: Spatio-temporal Analysis of Official Air Quality

The past and most recent situation with respect to air quality measures in Sofia City

Concentration Level Trend by Year.png

The graph above shows the concentration trend over the years. From it, we can observe that:

  1. No data were collected from January 2017 to October 2017, hence the gap in the graph.
  2. The sharp peaks occur during the winter period (November to January), when the concentration level can reach 373.3 μg/m3.
  3. Between May and September, the concentration level stays below the EU daily limit value of 50 μg/m3.
  4. The average concentration level is higher at traffic stations than at background stations.
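Observations 3 and 4 rest on two simple aggregates: a mean per station type and a count of days above the EU limit. A minimal sketch, assuming each row is already a daily mean with hypothetical `station_type` and `concentration` columns; `summarise` is an illustrative helper, not part of the Tableau workbook.

```python
from collections import defaultdict

EU_DAILY_LIMIT = 50.0  # μg/m3, the EU daily limit value quoted above


def summarise(rows):
    """Return (mean concentration per station type, days above the limit per type)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    exceed = defaultdict(int)
    for row in rows:
        kind = row["station_type"]
        value = float(row["concentration"])
        totals[kind] += value
        counts[kind] += 1
        if value > EU_DAILY_LIMIT:
            exceed[kind] += 1
    means = {k: totals[k] / counts[k] for k in totals}
    return means, dict(exceed)


# Hypothetical daily means for illustration only.
rows = [{"station_type": "traffic", "concentration": "60"},
        {"station_type": "traffic", "concentration": "40"},
        {"station_type": "background", "concentration": "30"},
        {"station_type": "background", "concentration": "20"}]
means, exceed = summarise(rows)
print(means)   # → {'traffic': 50.0, 'background': 25.0}
print(exceed)  # → {'traffic': 1}
```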


What does a typical day look like in Sofia City?

Calendar View on PM10 Concentration.png

The graph above shows the calendar view of concentration level over the year. This gives an insight into the daily life of a citizen in Sofia City. From this we can observe that:

  1. There is no noticeable pattern from Monday to Sunday, which suggests that the concentration level does not differ much between weekdays and weekends.
  2. The calendar view is consistent with the concentration trend graph: pollution levels are higher in December and January.


Since we are unable to detect any pattern by day of the week, let's analyse the air pollution level by hour of the day.

Average PM10 Concentration Level By Hours.png

From the line graph, we find that:

  1. The average concentration level throughout the day is 31.89 μg/m3.
  2. At 5am, the concentration level starts to increase and peaks at 8am.
  3. At 2pm, the concentration level starts to increase again and peaks at 9pm.
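The hourly line graph averages every reading that falls in the same hour of the day, across all days. A sketch of that aggregation, assuming hypothetical `time` (ISO timestamp) and `concentration` columns; `hourly_profile` is an illustrative name.

```python
from collections import defaultdict
from datetime import datetime


def hourly_profile(rows):
    """Average concentration for each hour of the day (0-23) across all days."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        hour = datetime.fromisoformat(row["time"]).hour
        sums[hour] += float(row["concentration"])
        counts[hour] += 1
    return {h: sums[h] / counts[h] for h in sorted(sums)}


# Hypothetical readings: two 8am values on different days and one 2pm value.
rows = [{"time": "2018-01-01T08:00:00", "concentration": "40"},
        {"time": "2018-01-02T08:00:00", "concentration": "60"},
        {"time": "2018-01-01T14:00:00", "concentration": "20"}]
print(hourly_profile(rows))  # → {8: 50.0, 14: 20.0}
```

Averaging over the whole year hides seasonality, which is why a low daily-profile average can coexist with the high winter peaks noted earlier.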


Anomalies and Trends of possible interest

  1. Since data are missing from January 2017 to October 2017, we cannot determine the air pollution situation during that period, or whether air pollution improved over it.
  2. The sharp peaks occur during the winter period (November to January). This might be due to citizens burning coal to keep warm during winter, which worsens the pollution level.
  3. Between May and September, the concentration level stays below the EU daily limit value of 50. This might be because temperatures are generally higher in summer, so citizens do not need to burn as much coal to keep warm.
  4. The average concentration level is higher at traffic stations than at background stations. This is likely because traffic stations are placed near busy roads where cars emit exhaust gases, so these stations pick up higher concentration levels. Citizens who live near heavy traffic are therefore more exposed to harmful pollution levels.
  5. At 5am, the concentration level starts to increase and peaks at 8am. This might be the morning commute, when many citizens drive to work.
  6. Even though the average hourly concentration level is below 50, it is likely to be much higher from November to January.


Analysis of potential problems to the environment

  1. One possible explanation for the high peaks from November to January is the burning of coal to withstand the cold winter season. If this behaviour persists, the concentration level during this period will remain high. Besides worsening the pollution level, this may endanger citizens' health, increasing the risk of cancer and early death.


Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

Sensors' Coverage and Distribution

Distribution of Sensors sheryl.png

After plotting the sensor locations, we realised that the sensors are very widespread and some of them are outside Sofia City. Hence, we exclude those located outside Sofia City from our analysis.
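One simple way to exclude out-of-city sensors after decoding their geohashes is a bounding-box filter. This is a rough sketch: the box below is an illustrative approximation around Sofia, not the official city boundary, and a proper analysis would use the city's boundary polygon. `SOFIA_BBOX`, `inside_bbox` and `sensors_in_city` are hypothetical names.

```python
# Rough, illustrative bounding box around Sofia City (NOT the official boundary).
SOFIA_BBOX = {"lat_min": 42.55, "lat_max": 42.80, "lon_min": 23.15, "lon_max": 23.50}


def inside_bbox(lat, lon, bbox=SOFIA_BBOX):
    """True if the point lies within the box (edges included)."""
    return (bbox["lat_min"] <= lat <= bbox["lat_max"]
            and bbox["lon_min"] <= lon <= bbox["lon_max"])


def sensors_in_city(sensors, bbox=SOFIA_BBOX):
    """Keep only sensors whose decoded coordinates fall inside the bounding box."""
    return [s for s in sensors if inside_bbox(s["lat"], s["lon"], bbox)]


# Hypothetical sensors: "a" is near central Sofia, "b" is far to the east.
sensors = [{"id": "a", "lat": 42.69, "lon": 23.32},
           {"id": "b", "lat": 43.20, "lon": 27.90}]
print([s["id"] for s in sensors_in_city(sensors)])  # → ['a']
```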

Slocation.png

After excluding sensors which are outside Sofia City, we realised that:

  1. Most of the sensors are located at the centre of Sofia City
  2. There are few sensors in the north-eastern and south-eastern parts of Sofia City, so the sensors are not well distributed within the city. More sensors should be placed there to gather data on the air pollution conditions in those parts of Sofia City.


Sensors' Coverage and Distribution over Time

Distribution trend.png

Comparing September 2017 with August 2018, we can see that the number of sensors increased over time. More sensors were placed in the middle of Sofia City, and some were placed further away. From this trend, we expect more sensors to be placed further from the centre of Sofia City to collect more data.
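The growth in sensor numbers can be quantified by counting distinct sensors per month. A sketch under two assumptions: each record has an ISO `time` column, and the geohash column serves as the sensor identifier (two sensors sharing a geohash would be counted once). `sensors_per_month` and the sample geohashes are hypothetical.

```python
from collections import defaultdict


def sensors_per_month(rows):
    """Count distinct sensors (identified by geohash) reporting in each month."""
    seen = defaultdict(set)
    for row in rows:
        month = row["time"][:7]  # "YYYY-MM" from an ISO timestamp
        seen[month].add(row["geohash"])
    return {month: len(ids) for month, ids in sorted(seen.items())}


# Hypothetical records for illustration only.
rows = [{"time": "2017-09-01T10:00:00", "geohash": "sx8df1"},
        {"time": "2017-09-01T11:00:00", "geohash": "sx8df1"},
        {"time": "2017-09-02T10:00:00", "geohash": "sx8df2"},
        {"time": "2018-08-01T10:00:00", "geohash": "sx8df1"},
        {"time": "2018-08-01T10:00:00", "geohash": "sx8df2"},
        {"time": "2018-08-01T10:00:00", "geohash": "sx8df3"}]
print(sensors_per_month(rows))  # → {'2017-09': 2, '2018-08': 3}
```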


Working Conditions of Sensors

Number of Records Captured by Hours.png

To check the working condition of the sensors, we analyse the number of records captured over time to see whether any sensors broke down at some point. Based on the line graph above, there are a few dates with a drastic decrease in the number of captured records, which shows that some of the sensors stopped working during those periods. These drops happened in January, April, May and July. One possible reason for the failures from April to July is that this is the warmer part of the year, and the hot weather might have caused the sensors to stop working properly.
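Spotting these drops can be automated by comparing each hour's record count with a typical count. A minimal sketch: the "below half the median" threshold is an arbitrary illustrative choice, timestamps are assumed to be ISO strings, and `find_drops` is a hypothetical helper. Hours with no records at all never appear in the counts, so the full list of expected hours can be passed in to catch complete outages.

```python
from collections import Counter
from statistics import median


def find_drops(timestamps, all_hours=None, ratio=0.5):
    """Return hours whose record count falls below `ratio` times the median count."""
    per_hour = Counter(ts[:13] for ts in timestamps)  # "YYYY-MM-DDTHH"
    hours = all_hours if all_hours is not None else list(per_hour)
    counts = {h: per_hour.get(h, 0) for h in hours}  # missing hours count as 0
    typical = median(counts.values())
    return sorted(h for h, n in counts.items() if n < ratio * typical)


# Hypothetical data: 4 records per hour, except a single record at 02:00.
timestamps = (["2018-04-01T00:00:00"] * 4 + ["2018-04-01T01:00:00"] * 4
              + ["2018-04-01T02:00:00"] + ["2018-04-01T03:00:00"] * 4)
print(find_drops(timestamps))  # → ['2018-04-01T02']
```

A median-based baseline is less distorted by the outage hours themselves than a mean would be.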

Failure.png

Zooming into the two drastic drops in April 2018 and May 2018, we found that the malfunctioning sensors were mostly the same group. Hence, the citizens who look after this group of sensors should pay more attention to them and maintain them more frequently, so that the sensors keep collecting air pollution data.
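Whether two outages involve "mostly the same group" can be made measurable with set operations: find the sensors that went silent in each drop, then compute their overlap (Jaccard index). A sketch with hypothetical sensor IDs and helper names; the choice of Jaccard overlap is ours, not necessarily the author's method.

```python
def silent_sensors(active_before, active_during):
    """Sensors that reported before a drop but not during it."""
    return set(active_before) - set(active_during)


def jaccard(a, b):
    """Overlap of two sensor groups: 0 means disjoint, 1 means identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sensor IDs for illustration only.
baseline = {"s1", "s2", "s3", "s4"}
april = silent_sensors(baseline, {"s4"})
may = silent_sensors(baseline, {"s3", "s4"})
print(round(jaccard(april, may), 3))  # → 0.667
```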

Air Pollution Measurements

Pollution-p1-p2.png

For both p1 and p2, the centre of Sofia City shows relatively higher readings than other areas.

P1 changes.png

Interestingly, between May and June, the areas with relatively higher readings differed from those in other months.

P2 changes.png

The same pattern occurred for P2 as well.
This suggests that the spatial differences in air pollution measures might be time-dependent. It is important to study and identify the factors behind these patterns so that appropriate measures can be taken to reduce air pollution in those areas.

Task 3: Identifying Relationships

Several factors affect the air pollution concentration level. It is therefore important to understand how each factor relates to air pollution, so that we can understand the situation better and find the root causes of fluctuations.

Factors with no Relationship with Air Pollution Level

Nochange.png

As shown in the above graph, there is no visible correlation between either p1 or p2 and the following factors:

  1. Humidity
  2. Pressure
  3. Wind Speed
  4. Precipitation
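The relationships in this task are judged visually; a correlation coefficient is one way to put a number on them. Below is a plain implementation of Pearson's r (our suggestion, not a step from the original analysis; `pearson` is a hypothetical helper). Values near 0 would support "no relationship", while a clearly negative value for temperature would match the pattern described next. Pearson's r only captures linear relationships.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical values: pollution falling as temperature rises gives r close to -1.
temperature = [-5, 0, 5, 10]
p1 = [80, 60, 40, 20]
print(pearson(temperature, p1))
```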

Factors with Relationship with Air Pollution Level
Temperature

P1p2-temp.png

From the graph, temperature appears to affect both p1 and p2: when the temperature decreases, p1 and p2 levels increase. This links back to our first observation that the air pollution level is relatively higher during the winter period (November to January), when temperatures are lower.

Dew Point

P1p2-dew.png

From the graph, the dew point also appears to affect both p1 and p2: when the dew point decreases, p1 and p2 levels increase.
Visibility

P1p2-viss.png

From the graph, visibility appears to be related to p1 and p2 levels: when p1 and p2 levels increase, the visibility level increases as well.

Conclusion

Based on our visualisation, we can conclude that:

  1. Air pollution worsens during the winter period: as the temperature decreases, the pollution level increases
  2. The number of sensors has been increasing over the years, and most of the sensors are situated in the centre of Sofia City
  3. Temperature, dew point and visibility are the factors that show a correlation with p1 and p2 values


Improving the air pollution condition in Sofia requires effort from both the government and citizens.
Precautions that citizens can take:

  1. Always wear a mask, especially during the winter period (November to January)
  2. Avoid exercising during peak pollution periods
  3. Use an air purifier to "clean" the air around them

Possible measures the government could take to improve the air quality in Sofia:

  1. Lower public transport fares to entice more people to take public transport instead of driving
  2. Raise car prices to deter people from buying cars, or encourage the adoption of eco-friendly cars
  3. Provide subsidies for citizens to buy electric appliances such as electric heaters, reducing how often they burn coal


References


  1. https://airlief.com/air-pollution-during-winter/
  2. https://www.tableau.com/about/blog/2018/7/new-features-beta-heatmaps-richer-formatting-options-and-more-discoverable-content
  3. https://github.com/dbarthe/geohash/

