Difference between revisions of "Sofia City - Data Preparation"

From Visual Analytics and Applications
==Data Preparation==
  
  
 
==Task 1: Spatio-temporal Analysis of Official Air Quality==
 
  
Characterize the past and most recent situation with respect to air quality measures in Sofia City. What does a typical day look like for Sofia City? Do you see any trends of possible interest in this investigation? What anomalies do you find in the official air quality dataset? How do these affect your analysis of potential problems to the environment?
 
 
 
==Location of Air Quality Stations==
 
 
 
The air quality stations of Sofia City are located in the following regions.
 
 
 
[[File:Task1-Map.png|frameless|600px|center]]
 
 
 
==Past and Present Air Quality Situation in Sofia City==
 
 
 
In this analysis, the air quality in Sofia City is measured by the concentration of PM10 particles. The Density Map shows that the average concentration peaks every year between November and January. Overall, however, the air quality appears to improve over time.
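
The Density Map was produced in Tableau. The sketch below is a rough Python equivalent of the aggregation it relies on, not the author's actual workflow. It groups daily PM10 values into one mean per year and month, then reports the peak month of each year. It assumes the readings are already available as (date, PM10) pairs pooled over all stations. The function names are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date
from statistics import mean


def monthly_means(daily_readings: Iterable[tuple[date, float]]) -> dict[tuple[int, int], float]:
    """Average daily PM10 values into one mean per (year, month)."""
    buckets: dict[tuple[int, int], list[float]] = defaultdict(list)
    for day, pm10 in daily_readings:
        buckets[(day.year, day.month)].append(pm10)
    return {key: mean(values) for key, values in sorted(buckets.items())}


def peak_month_per_year(means: dict[tuple[int, int], float]) -> dict[int, int]:
    """Return, for each year, the month with the highest mean concentration."""
    peaks: dict[int, int] = {}
    for (year, month), value in means.items():
        if year not in peaks or value > means[(year, peaks[year])]:
            peaks[year] = month
    return peaks
```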
 
 
 
[[File:Task1-DensityMap.png|600px|frameless|center]]
 
 
 
The Calendar Heat Map below shows the concentration of PM10 particles by month and by week over time. In general, the air quality dips to "Poor" and "Very Poor" levels during the winter months. There appears to be no difference in air quality between weekends and weekdays. This visualisation also suggests that the air quality improves over time.
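
The "Poor" and "Very Poor" labels imply that each daily mean is mapped to an air quality band before it is coloured. A minimal sketch of that step follows. The breakpoints are an assumption: they are the PM10 bands of the EEA's European Air Quality Index as published around 2018. The page does not state which thresholds the heat map actually used.

```python
from bisect import bisect_right

# ASSUMPTION: PM10 breakpoints in µg/m³, taken from the European Air Quality
# Index as published around 2018. Check them against the index version used.
BREAKPOINTS = [20, 40, 50, 100, 150]
LABELS = ["Good", "Fair", "Moderate", "Poor", "Very Poor", "Extremely Poor"]


def classify_pm10(concentration: float) -> str:
    """Map a daily mean PM10 concentration to an air quality band.

    A value exactly on a breakpoint falls into the higher band.
    """
    return LABELS[bisect_right(BREAKPOINTS, concentration)]
```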
 
 
 
[[File:Task1-CalendarHeatMap.png|600px|frameless|center]]
 
 
 
==A Typical Day in Sofia City (2018)==
 
 
 
On an average day in Sofia City, the air quality is usually classified as "Fair". According to the official air quality data, the air quality improves during the daytime, between 8am and 8pm. It deteriorates to a "Moderate" level between 8pm and 8am.
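
The typical-day profile amounts to averaging one year's hourly readings by hour of day. The sketch below assumes the 2018 hourly readings are available as (datetime, PM10) pairs. `hourly_profile` is a hypothetical name.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import datetime
from statistics import mean


def hourly_profile(readings: Iterable[tuple[datetime, float]], year: int = 2018) -> dict[int, float]:
    """Average PM10 for each hour of the day (0-23) within a single year."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    for timestamp, pm10 in readings:
        if timestamp.year == year:
            by_hour[timestamp.hour].append(pm10)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}
```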
 
 
 
[[File:Task1-AverageDay.png|600px|center]]
 
 
 
==Trends of Possible Interest==
 
 
 
===Air Quality Deteriorates during the Winter Months===
 
 
 
An interesting observation from the above analysis is that the air quality deteriorates consistently during the winter months in every year this study covers. The reasons behind this trend can be investigated further.
 
 
 
===No Difference in Air Quality between Weekends and Weekdays===
 
 
 
Based on the calendar heat map, there appears to be no difference in air quality between weekends and weekdays. This strikes me as fairly interesting. In general, we would expect more traffic on the road during weekdays, resulting in a higher concentration of pollutants in the surrounding air.
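
The calendar heat map gives a visual impression only. A simple numeric cross-check would compare the mean PM10 level on weekdays with that on weekends. The sketch below assumes daily (date, PM10) pairs. The function name is hypothetical.

```python
from collections.abc import Iterable
from datetime import date
from statistics import mean


def weekday_weekend_means(daily_readings: Iterable[tuple[date, float]]) -> tuple[float, float]:
    """Return (weekday mean, weekend mean) of daily PM10 values.

    Both groups must be non-empty, otherwise statistics.mean raises an error.
    """
    weekday: list[float] = []
    weekend: list[float] = []
    for day, pm10 in daily_readings:
        # date.weekday() is 0 for Monday ... 6 for Sunday.
        (weekend if day.weekday() >= 5 else weekday).append(pm10)
    return mean(weekday), mean(weekend)
```

Similar overall means do not rule out a weekday effect masked by seasonal swings. Repeating the comparison within each month would be a stricter check.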
 
 
 
===Air Quality Appears to Improve over the Years===
 
 
 
The official air quality readings appear to indicate that the concentration of PM10 particles in the air has been dropping over the years. The air quality pollutant index has also slowly improved to "Good" levels during the summer months. This trend is worth investigating. It could be a sign that Bulgaria's efforts (if any) to clean up the environment are taking effect, or it could reflect changes in the way air quality has been measured in recent years.
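
One option for putting a number on the apparent improvement (not used on this page) is to average the daily values per year and fit an ordinary least-squares slope. The sketch below excludes 2017 by default because of the data gap described under Missing Data. The function names are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date
from statistics import mean


def yearly_means(daily_readings: Iterable[tuple[date, float]],
                 exclude_years: tuple[int, ...] = (2017,)) -> dict[int, float]:
    """Average daily PM10 values per year, skipping incomplete years."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for day, pm10 in daily_readings:
        if day.year not in exclude_years:
            buckets[day.year].append(pm10)
    return {year: mean(values) for year, values in sorted(buckets.items())}


def linear_trend(means: dict[int, float]) -> float:
    """Ordinary least-squares slope of the yearly means, in reading units per year.

    Needs at least two distinct years.
    """
    years = list(means)
    values = list(means.values())
    x_bar, y_bar = mean(years), mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    denominator = sum((x - x_bar) ** 2 for x in years)
    return numerator / denominator
```

A negative slope would be consistent with the improvement suggested by the visualisations. With only a handful of usable years, it is an indication rather than a statistical test.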
 
 
 
==Anomalies in Official Air Quality Dataset==
 
  
===Missing Data===

Data is missing from the official air quality dataset between January and November 2017. Values cannot reasonably be imputed over such a wide date range, so it was decided to leave the gap as it is. The missing data shows up as blanks in the visualisations and analysis.
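
A gap such as the missing 2017 data can be located by scanning the sorted reading dates for jumps of more than one day. The sketch below assumes the observed dates (for one station, or pooled) are available as `date` objects. `find_gaps` is a hypothetical name. Raising `min_length` hides isolated missing days and keeps only long gaps.

```python
from collections.abc import Iterable
from datetime import date, timedelta


def find_gaps(days: Iterable[date], min_length: int = 1) -> list[tuple[date, date]]:
    """Return (first missing day, last missing day) for each run of absent dates."""
    observed = sorted(set(days))
    gaps = []
    for prev, nxt in zip(observed, observed[1:]):
        missing = (nxt - prev).days - 1
        if missing >= min_length:
            gaps.append((prev + timedelta(days=1), nxt - timedelta(days=1)))
    return gaps
```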
  
 
===Difference in Interval of Measurements===
PM10 readings are available at daily intervals from 2013 to 2016, but at hourly intervals in 2017 and 2018. This poses certain challenges for the analysis. To keep the sampling consistent, the hourly readings for 2017 and 2018 are averaged by day in Tableau.
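
The daily averaging was done in Tableau. The sketch below shows an equivalent step in plain Python, assuming hourly readings as (datetime, PM10) pairs. The function name is hypothetical. A day with only a few valid hours still produces a mean, so a minimum-coverage rule may be worth adding.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date, datetime
from statistics import mean


def daily_average(hourly_readings: Iterable[tuple[datetime, float]]) -> dict[date, float]:
    """Collapse hourly PM10 readings into one mean value per calendar day."""
    by_day: dict[date, list[float]] = defaultdict(list)
    for timestamp, pm10 in hourly_readings:
        by_day[timestamp.date()].append(pm10)
    return {day: mean(values) for day, values in sorted(by_day.items())}
```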
  
==Impact on Analysis of Potential Problems to the Environment==

Because of the missing data, 2017 cannot be included in trend analyses.

The difference in measurement intervals for PM10 does not pose an issue, as Tableau is capable of consolidating the data by day.

Latest revision as of 20:07, 18 November 2018

Sofia City - Air Quality Analysis

Data Preparation

Task 1: Spatio-temporal Analysis of Official Air Quality

Air Quality Data Distributed

The air quality data available from the European Environment Agency (EEA) comes from 6 air quality stations and is distributed over 28 CSV files. The files were checked and found to have the same columns, so they are concatenated directly.
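
The page does not say which tool performed the concatenation. The sketch below is a minimal, hypothetical version using Python's standard `csv` module. It assumes each file starts with a header row and is UTF-8 encoded; the encoding is a parameter, so adjust it to match the downloaded files. The function names and the folder name in the usage line are illustrative only.

```python
import csv
from pathlib import Path


def concat_csv(sources):
    """Concatenate CSV sources that must share an identical header row.

    sources: iterable of (name, lines) pairs, where `lines` is any iterable
    of text lines, e.g. an open file object or a list of strings.
    Returns (header, rows) with every data row from every source, in order.
    """
    header = None
    rows = []
    for name, lines in sources:
        reader = csv.reader(lines)
        file_header = next(reader, None)
        if file_header is None:
            continue  # skip completely empty files
        if header is None:
            header = file_header
        elif file_header != header:
            # Stop early instead of silently mixing files with different columns.
            raise ValueError(f"{name}: columns {file_header} differ from {header}")
        rows.extend(reader)
    return header, rows


def open_csv_dir(directory, encoding="utf-8"):
    """Yield (file name, open file) for each *.csv file in `directory`.

    Each file stays open only while concat_csv reads it.
    """
    for path in sorted(Path(directory).glob("*.csv")):
        with open(path, newline="", encoding=encoding) as handle:
            yield path.name, handle
```

Usage: `header, rows = concat_csv(open_csv_dir("eea_downloads"))`, where `eea_downloads` is a hypothetical folder holding the CSV files. The header check guards the assumption, stated above, that all files share the same columns.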

Uneven Spread of Dataset over Time

It is noted that 4 of the 6 air quality stations provide data across the full date range from 2013 to 2018. Of the remaining 2, one (BG0054A) only contains data from 2013 to 2015. This could be because the station was closed down and produced no further data from 2016 onwards. The other (BG0079A) only contains data from 1 January 2018 onwards, so it may be a new station that only started operating in 2018.
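
One quick way to surface uneven coverage like this is to record the first and last date each station reports. Stations that do not span the whole study period can then be flagged. The sketch below assumes (station code, date) pairs. The function names are hypothetical, and the test data uses a made-up `STATION_A` alongside the two codes named above.

```python
from collections.abc import Iterable
from datetime import date


def station_coverage(readings: Iterable[tuple[str, date]]) -> dict[str, tuple[date, date]]:
    """Map each station code to the (first, last) date on which it reported."""
    coverage: dict[str, tuple[date, date]] = {}
    for station, day in readings:
        first, last = coverage.get(station, (day, day))
        coverage[station] = (min(first, day), max(last, day))
    return coverage


def partial_stations(coverage: dict[str, tuple[date, date]], start: date, end: date) -> list[str]:
    """List stations whose records begin after `start` or stop before `end`."""
    return sorted(
        station for station, (first, last) in coverage.items()
        if first > start or last < end
    )
```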

Difference in Interval of Measurements

It is noted that air quality measurements up to the end of 2016 were taken on a daily basis, and from 2017 onwards on an hourly basis. This may reflect process improvements intended to provide more granular data.

Joining EEA table with Station Info

The air quality stations are identified by a unique string identifier, the EoI code. For readability, the table of monitoring station data is joined with the EEA table, so that every reading carries the corresponding station name, latitude and longitude.
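
In Tableau this is a join between the two tables. The sketch below shows the same lookup join in Python. It assumes readings are dictionaries carrying the station identifier in a hypothetical `eoi_code` field, with station metadata keyed by that identifier. Raising on unmatched codes, rather than silently dropping those readings, makes missing metadata visible.

```python
def join_station_info(readings: list[dict], stations: dict[str, dict]) -> list[dict]:
    """Attach station metadata to each reading, keyed on its station identifier.

    readings: dicts with a hypothetical 'eoi_code' field
    stations: identifier -> {'name': ..., 'latitude': ..., 'longitude': ...}
    Raises KeyError for a reading whose station has no metadata.
    """
    joined = []
    for reading in readings:
        code = reading["eoi_code"]
        if code not in stations:
            raise KeyError(f"no station metadata for {code}")
        # Station fields are added alongside the original reading fields.
        joined.append({**reading, **stations[code]})
    return joined
```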

Task 2

Task 3