IS428 AY2018-19T1 Huiyeon Kimn
Problem & Motivation
Bulgaria suffers from a serious problem shared by many European countries: air pollution, known to be one of the top risk factors for health. PM2.5 and PM10, two widely found particulate pollutants, are present throughout Bulgaria at levels known to far exceed the limits set by the European Union and the WHO (World Health Organization). Over the past 3 years, Bulgaria has had the highest PM2.5 concentrations among its neighboring countries, making it one of the most polluted regions in the world.
With over 60 percent of the urban population in this beautiful country exposed to dangerous particulate matter, the health risk among Bulgarians is increasing. There is therefore an urgent need to address this concern by analyzing current trends and patterns in PM2.5 and PM10 concentrations so that effective measures can be taken.
We decided to create an interactive visualization in Tableau to analyze the data collected over the 6 years from 2013 to 2018. The platform is designed to meet the following objectives:
1. Identify typical patterns, events and anomalies in the Citizen Science air quality data using pollution concentrations and other meteorological data
2. Identify typical patterns, interesting events and trends, both past and recent, in PM10 concentration levels as reported by official data
3. Analyze and identify potential associations among variables that may correlate with air pollution.
Data Transformation and Analysis Process
We received 4 sets of data for this assignment. Each folder contains a different set of records:
• EEA data (official time series of PM10 concentrations, 2013 – 2018)
• Air Tube data (meteorological data and pollutant concentrations from various regions, 2017 – 2018)
• METEO data (basic statistical summaries such as wind, 2012 – 2018)
• TOPO data (topographical data with elevation)
Official Air Quality Data (EEA)
The given data covers 6 stations, with records spanning 2013 to 2018 depending on the station.
Issue: Different Stations with Different Time Range
Solution: The highlighted station, BG_5_9484, only has data from 2013 to 2015. Because its coverage differs so much from that of the other stations, any analysis based on this station would be neither meaningful nor correct, so it was removed from the analysis.
For the remaining data points, the time series were used as-is, since the data appeared to be valid.
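The station was removed by inspection. A minimal sketch of how the same coverage check could be scripted in pandas is shown below, assuming the EEA records sit in one long table with hypothetical columns `station`, `datetime` and `pm10`. The cutoff year (2016) and the station `STATION_A` are illustrative assumptions, and the PM10 numbers are toy values.

```python
import pandas as pd


def station_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Return the first ('min') and last ('max') timestamp recorded per station."""
    return df.groupby("station")["datetime"].agg(["min", "max"])


def drop_short_stations(df: pd.DataFrame, min_last_year: int) -> pd.DataFrame:
    """Keep only stations whose last record falls in or after `min_last_year`."""
    coverage = station_coverage(df)
    keep = coverage[coverage["max"].dt.year >= min_last_year].index
    return df[df["station"].isin(keep)]


# Toy data: BG_5_9484 stops in 2015, the hypothetical STATION_A runs to 2018.
eea = pd.DataFrame({
    "station": ["BG_5_9484", "BG_5_9484", "STATION_A", "STATION_A"],
    "datetime": pd.to_datetime(["2013-01-01", "2015-12-31", "2013-01-01", "2018-03-31"]),
    "pm10": [40.0, 55.0, 38.0, 61.0],
})

print(station_coverage(eea))
filtered = drop_short_stations(eea, min_last_year=2016)
print(sorted(filtered["station"].unique()))  # ['STATION_A']
```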
Issue: Geographical Data separated into another Excel Workbook
Solution: The Excel formula shown in the image above was used to match the latitude and longitude data to the station records, which sped up the process considerably.
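The exact formula only appears in the image. As a hedged alternative (not the method used here), the same per-row lookup can be written as a left join in pandas. The column names `station`, `latitude` and `longitude` and the coordinate values are hypothetical placeholders.

```python
import pandas as pd


def attach_coordinates(readings: pd.DataFrame, stations: pd.DataFrame) -> pd.DataFrame:
    """Left-join station coordinates onto every reading, like a row-by-row lookup.

    validate="many_to_one" raises an error if a station appears twice in the
    coordinate table, which would otherwise silently duplicate readings.
    """
    return readings.merge(
        stations[["station", "latitude", "longitude"]],
        on="station",
        how="left",
        validate="many_to_one",
    )


readings = pd.DataFrame({"station": ["STATION_A", "STATION_B", "STATION_A"],
                         "pm10": [38.0, 52.0, 61.0]})
stations = pd.DataFrame({"station": ["STATION_A", "STATION_B"],
                         "latitude": [42.70, 42.65],      # placeholder values
                         "longitude": [23.30, 23.38]})    # placeholder values

merged = attach_coordinates(readings, stations)
# Readings whose station has no coordinates end up with NaN latitude.
print(merged["latitude"].isna().sum())
```

A left join keeps every reading even when a station is missing from the coordinate table, so unmatched rows are easy to spot instead of disappearing.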
Air Tube Data
Issue: Location Data encoded in Geohash
Solution: A Python script was written to decode the Geohash strings back into latitude and longitude.
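The original script is not included. The sketch below is a minimal, dependency-free decoder for standard geohashes. It may differ from the script actually used. Each base32 character carries 5 bits, the bits alternate between longitude and latitude (longitude first), and each bit halves the current interval. The function returns the centre of the final cell.

```python
# Standard geohash base32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash: str) -> tuple[float, float]:
    """Decode a geohash string to the (latitude, longitude) centre of its cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash.lower():
        value = _BASE32.find(ch)
        if value < 0:
            raise ValueError(f"invalid geohash character: {ch!r}")
        # Each character carries 5 bits, most significant bit first.
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


print(decode_geohash("u4pruydqqvj"))  # approximately (57.64911, 10.40744)
```

With the Air Tube data in a DataFrame, a hypothetical `geohash` column could be decoded with `df["latitude"], df["longitude"] = zip(*df["geohash"].map(decode_geohash))`.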
METEO Data
The METEO data had to be transformed to answer some of the questions in Task 3. Exploratory data analysis revealed the following:
- Using the map functionality in Tableau, we found that the METEO data describes the same location as Mladost from Task 1.
- The Mladost data is recorded hourly, while the METEO data is daily, so the Mladost data had to be aggregated to daily values.
First, the EEA data was processed with a Python script.
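The script itself is not shown. One plausible way to aggregate the hourly Mladost readings to days is a daily `resample` in pandas. The sketch assumes hypothetical columns `datetime` and `pm10`, and the daily mean is an assumed aggregator. Days without any readings would come out as NaN.

```python
import pandas as pd


def hourly_to_daily(hourly: pd.DataFrame, value_col: str = "pm10") -> pd.DataFrame:
    """Average hourly readings into one row per calendar day (mean is an assumption)."""
    return (
        hourly.set_index("datetime")[value_col]
        .resample("D")          # one bin per calendar day
        .mean()
        .reset_index()
        .rename(columns={"datetime": "date"})
    )


# Toy data: two days of hourly readings with constant values per day.
hourly = pd.DataFrame({
    "datetime": pd.date_range("2018-01-01 00:00", periods=48, freq=pd.offsets.Hour()),
    "pm10": [30.0] * 24 + [50.0] * 24,
})
daily = hourly_to_daily(hourly)
print(daily)
```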
Then the date fields of the METEO data were combined into a single column using the "Merge Columns" option in Power Query in Excel.
The data is now ready for analysis.
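This step was done in Power Query. For readers working in Python instead, a rough pandas equivalent is sketched below. It assumes the METEO date is split across hypothetical `year`, `month` and `day` columns, and `wind_speed` holds toy values. The resulting `date` column could then be joined with the daily Mladost values.

```python
import pandas as pd


def add_date_column(meteo: pd.DataFrame) -> pd.DataFrame:
    """Build one `date` column from separate year/month/day columns (hypothetical names)."""
    out = meteo.copy()
    # pd.to_datetime assembles dates from a frame with year/month/day columns.
    out["date"] = pd.to_datetime(out[["year", "month", "day"]])
    return out.drop(columns=["year", "month", "day"])


meteo = pd.DataFrame({"year": [2018, 2018], "month": [1, 1], "day": [1, 2],
                      "wind_speed": [3.2, 1.1]})
meteo_dated = add_date_column(meteo)
print(meteo_dated)
```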
Interactive Visualization
The following visuals are the output of the analysis.
Task 1: Spatio-temporal Analysis of Official Air Quality
Because the EEA dataset uses different averaging times (hourly for 2018, daily for the other years), we split the data into two separate dashboards.
Dashboard 1: 2013 – 2017
Dashboard 2: 2018
What does a typical day look like for Sofia City? |
---|
The PM10 trend appears similar across the 5 air quality stations. PM10 starts out relatively high at midnight (0:00) and fluctuates through the early morning. Around 9AM there is a sudden dip in concentration at the stations (possibly an anomaly). After the dip, the concentration rises at 10AM and then decreases from 11AM to 4PM. From then on, the concentration rises again until the next morning. This is the typical daily PM10 pattern in Sofia City. |
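The dashboard was built in Tableau. The underlying "typical day" idea can be reproduced outside Tableau by averaging hourly PM10 over all days for each hour of the day, per station. The sketch assumes hypothetical columns `station`, `datetime` and `pm10` and uses toy values. The mean is an assumed aggregator, and a median would be less sensitive to one-off dips such as the one around 9AM.

```python
import pandas as pd


def typical_day(hourly: pd.DataFrame) -> pd.DataFrame:
    """Mean PM10 for each hour of the day (rows 0-23), one column per station."""
    hours = hourly["datetime"].dt.hour.rename("hour")
    return (
        hourly.groupby(["station", hours])["pm10"]
        .mean()
        .unstack("station")
    )


# Toy data: one hypothetical station, two days, two hours per day.
hourly = pd.DataFrame({
    "station": ["STATION_A"] * 4,
    "datetime": pd.to_datetime(["2018-01-01 08:00", "2018-01-01 09:00",
                                "2018-01-02 08:00", "2018-01-02 09:00"]),
    "pm10": [40.0, 20.0, 60.0, 30.0],
})
profile = typical_day(hourly)
print(profile)
```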