IS428 2018-19 T1 Assign Fu Yu

From Visual Analytics for Business Intelligence
Revision as of 17:31, 11 November 2018 by Yu.fu.2015 (talk | contribs)

Problem & Motivation

Air pollution is an important risk factor for health in Europe and worldwide. A recent review of the global burden of disease showed that it is one of the top ten risk factors for health globally. Worldwide, an estimated 7 million people died prematurely because of pollution; in the European Union (EU), the figure is 400,000. The Organisation for Economic Co-operation and Development (OECD) predicts that in 2050 outdoor air pollution will be the top cause of environmentally related deaths worldwide. Air pollution has also been classified as the leading environmental cause of cancer.

Air quality in Bulgaria is a major concern: measurements show that citizens all over the country breathe air that is considered harmful to health. For example, concentrations of PM2.5 and PM10 are much higher than the limits that the EU and the World Health Organization (WHO) have set to protect health.

Bulgaria had the highest PM2.5 concentrations in urban areas of all EU-28 member states over a three-year average. For PM10, Bulgaria also tops the list of most polluted countries, with a daily mean concentration of 77 μg/m3 (the EU limit value is 50 μg/m3).

According to the WHO, 60 percent of the urban population in Bulgaria is exposed to dangerous (unhealthy) levels of particulate matter (PM10).

Dataset Analysis & Transformation Process

Decode the geohash column in Air Tube data files

The geohash column gives the station locations. However, Tableau cannot interpret a geohash as geographic data, so each geohash must be decoded into geographic coordinates before the Air Tube data is imported into Tableau for analysis. Because the two Air Tube data files, data_bg_2017.xlsx and data_bg_2018.xlsx, are large and contain duplicate geohash records, an Excel file containing the list of unique geohashes was created first.

Step 1: Use the "pygeohash" package to decode the geohash list and write the coordinates to an Excel file

Geohash decode yu.fu.2015.PNG
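
To make the decoding step concrete, the sketch below reimplements the standard geohash decoding algorithm using only the Python standard library. It is not the author's script, which relied on the pygeohash package; the function names decode_geohash and decode_unique are hypothetical, and reading from or writing to Excel is left out.

```python
# Standard geohash base-32 alphabet (digits plus letters, omitting a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash):
    """Return the (latitude, longitude) at the centre of the geohash cell."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    is_lon_bit = True  # bits alternate between axes, starting with longitude
    for char in geohash.lower():
        value = _BASE32.index(char)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (value >> shift) & 1
            rng = lon_range if is_lon_bit else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # a 1 bit keeps the upper half of the interval
            else:
                rng[1] = mid  # a 0 bit keeps the lower half of the interval
            is_lon_bit = not is_lon_bit
    lat = (lat_range[0] + lat_range[1]) / 2
    lon = (lon_range[0] + lon_range[1]) / 2
    return lat, lon


def decode_unique(geohashes):
    """Decode each distinct geohash once (hypothetical helper)."""
    return {gh: decode_geohash(gh) for gh in set(geohashes)}
```

Decoding only the distinct geohashes mirrors the unique-list approach described above: each station location is computed once, however many measurement rows refer to it.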

Step 2: Combine the geohash list and the coordinates list into one Excel file, then fill in the coordinates, latitude and longitude columns in data_bg_2017.xlsx and data_bg_2018.xlsx using the VLOOKUP, LEFT and RIGHT functions in Excel

Coordinates latitude longitude yu.fu.2015.png
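
For readers who prefer a scripted equivalent of the VLOOKUP step, here is a minimal sketch of the same lookup in Python. It assumes the measurement rows are already loaded as dictionaries with a "geohash" key; the function name attach_coordinates, the "pm10" key and all sample values are hypothetical placeholders rather than real Air Tube data.

```python
def attach_coordinates(rows, lookup):
    """Add latitude/longitude to each row by looking up its geohash.

    rows   : list of dicts, each with a "geohash" key (assumed layout)
    lookup : dict mapping geohash -> (latitude, longitude)
    """
    enriched = []
    for row in rows:
        # Like VLOOKUP with an exact match; unmatched geohashes yield None.
        lat, lon = lookup.get(row["geohash"], (None, None))
        enriched.append({**row, "latitude": lat, "longitude": lon})
    return enriched


# Placeholder values for illustration only, not real measurements.
lookup = {"abc123": (1.0, 2.0)}
rows = [{"geohash": "abc123", "pm10": 30}, {"geohash": "zzz999", "pm10": 12}]
result = attach_coordinates(rows, lookup)
```

Rows whose geohash is missing from the lookup keep None coordinates, which makes unmatched records easy to spot before importing into Tableau.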

Dataset Import Structure & Process

Interactive Visualization

Interesting & Anomalous Observations

Task 1: Spatio-temporal Analysis of Official Air Quality

1.1 Most Recent and Past Situation of Air Quality in Sofia City

Sofia city air 2018 yu.fu.2015.png
Sofia city air 2013 yu.fu.2015.png
Sofia city air 2014 yu.fu.2015.png
Sofia city air 2015 yu.fu.2015.png
Sofia city air 2016 yu.fu.2015.png
PM10 concentration lengend yu.fu.2015.png

The graphs above display PM10 concentrations in Sofia City from January 2013 to September 2018. Data for 2017 is omitted because only November and December are available. Dark green indicates good air quality (PM10 concentration <= 50 μg/m3). Light green indicates satisfactory air quality (50 μg/m3 < PM10 concentration <= 100 μg/m3). Yellow indicates poor air quality (PM10 concentration > 100 μg/m3). Each heatmap represents the air quality at one measuring station in Sofia City.
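
The colour bands can be written as a small classification function. This is only a sketch of the thresholds described above, assuming the satisfactory band begins just above 50 μg/m3; the function name classify_pm10 is hypothetical and the actual colouring was configured in Tableau.

```python
def classify_pm10(concentration):
    """Map a PM10 concentration in μg/m3 to the heatmap's air-quality band."""
    if concentration <= 50:
        return "good"          # dark green
    if concentration <= 100:
        return "satisfactory"  # light green
    return "poor"              # yellow
```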

Trends:

  • Air quality at all four stations is usually poor at the start and the end of a year.
  • Station STA-BG0052A, which had the fewest days of good air quality in 2013, had the most days of good air quality in 2018 among the four stations.

1.2 A Typical Day in Sofia City

A typical day in sofia city yu.fu.2015.png

To investigate how PM10 concentrations vary from hour to hour within a day, one can highlight a day on the heatmap, and the hourly PM10 concentration graphs update accordingly. For example, as shown in the diagram above, on 8 January 2018 air quality at stations STA-BG0052A and STA-BG0050A is poor after 10pm, air quality at station STA-BG0073A is poor from 12pm onwards, and air quality at STA-BG0040A is poor throughout the day.

Trends:

  • There is no obvious trend showing at what time of day the PM10 concentration is higher or lower.
  • Nevertheless, PM10 concentrations do vary over the course of a day.

1.3 Anomalies in the Datasets

  • From 2013 to 2015, hourly PM10 concentration data is not available: either the data was collected only once a day at 00:00, or the hour of collection was not recorded. In 2016, data was also collected only at 00:00 on some days. A single reading per day might distort the distribution of PM10 concentrations across the year, because PM10 concentrations differ at different times of day. The reading might have been taken when the concentration happened to be unusually high or low, and so would not be representative of that day.
  • For 2017, only November and December data is available. Hence changes in air quality from 2016 to 2017, which might have offered insight into why air quality improved in 2018, could not be investigated.
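
One way to locate the single-reading days mentioned in the first anomaly is to count readings per calendar day. The sketch below assumes the measurement timestamps have already been parsed into datetime objects; the function name days_with_single_reading is hypothetical, and this check was not part of the original Tableau workflow.

```python
from collections import defaultdict
from datetime import datetime


def days_with_single_reading(timestamps):
    """Return the sorted dates that have exactly one reading."""
    per_day = defaultdict(list)
    for ts in timestamps:
        per_day[ts.date()].append(ts)  # group readings by calendar day
    return sorted(day for day, readings in per_day.items() if len(readings) == 1)
```

Days returned by this function could then be excluded from, or flagged in, any hourly analysis, since one reading cannot describe how concentrations changed during the day.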

Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements

2.1 Sensor Coverage, Performance and Operation

  • 2.1.1 Are the sensors well distributed over the entire city?

Q3:

References