ISSS608 2018-19 T1 Assign Chen Jingyi Task 1

From Visual Analytics and Applications

Observable Effects of Bulgaria Air Pollution Crisis

Overview

Data Preparation

Task 1

Task 2

Task 3

 


Basic findings

1. Differences in monthly and yearly trends of each station
Overall, the yearly average pollutant indicator does not differ significantly among stations in 2013-2016, but there is a significant drop in 2018 compared with 2016. (Data for 2017 are not considered here because too many values are missing.) The monthly trend is almost the same in every year: the highest values appear in late autumn and winter (October to January), with peaks in December and January, while the lowest values are always recorded in spring and summer (March to September).
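The comparison above rests on averaging the readings per station and year, and per station and month. A minimal sketch of that grouping is shown below, using only the Python standard library. The record layout (station, date, concentration) and all values are assumptions made up for illustration; they are not taken from the dataset.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical sample records: (station, measurement date, concentration).
# The numbers are invented for illustration, not read from the real data.
records = [
    ("Druzhba", date(2016, 1, 10), 80.0),
    ("Druzhba", date(2016, 1, 20), 100.0),
    ("Druzhba", date(2016, 7, 5), 20.0),
    ("Druzhba", date(2018, 1, 15), 60.0),
    ("Orlov Most", date(2014, 1, 12), 70.0),
    ("Orlov Most", date(2014, 7, 8), 30.0),
]


def average_by(rows, key):
    """Group concentrations by key(station, day) and return the mean per group."""
    groups = defaultdict(list)
    for station, day, value in rows:
        groups[key(station, day)].append(value)
    return {group: mean(values) for group, values in groups.items()}


# Yearly average per station, and average per calendar month across all years.
yearly = average_by(records, lambda station, day: (station, day.year))
monthly = average_by(records, lambda station, day: (station, day.month))

for (station, year), value in sorted(yearly.items()):
    print(station, year, round(value, 1))
```

Passing a different key function changes the grouping without touching the averaging logic, so the same helper covers both the yearly and the monthly view.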

A typical day in Sofia.

2. Daily records vs. hourly records
Here we dig deeper into the dataset by looking at the two record types separately.
For the “hour and var” type, data are collected only from 2015 onwards (for 'Druzhba', only from 2016 onwards), and station 'Orlov Most' does not record this type of concentration. The records are consecutive only from Nov 2017 to Sep 2018; outside that period they are scattered collections from a few months, which is odd and leaves too little data to establish any trend.
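One way to check where the records are consecutive is to reduce each reading to a (year, month) pair and group the pairs into runs of adjacent calendar months. The sketch below does this with plain Python; the `observed` list is a hypothetical example, not the actual coverage of the dataset.

```python
def consecutive_runs(months):
    """Group (year, month) pairs into runs of consecutive calendar months."""
    runs = []
    previous = None
    for year, month in sorted(set(months)):
        # Map each month to a single integer so adjacency is a difference of 1.
        index = year * 12 + (month - 1)
        if previous is not None and index == previous + 1:
            runs[-1].append((year, month))
        else:
            runs.append([(year, month)])
        previous = index
    return runs


# Hypothetical coverage: two isolated months and one unbroken four-month stretch.
observed = [(2015, 3), (2016, 7), (2017, 11), (2017, 12), (2018, 1), (2018, 2)]
runs = consecutive_runs(observed)
longest = max(runs, key=len)
print(longest[0], "to", longest[-1])
```

Runs of length one correspond to the isolated months, while the longest run marks the window in which a trend can reasonably be read.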

A typical day in Sofia.

There are also significant differences among the records of different months, which range from 8.8 to 300. In 2018 there is a peak in January, with stable and low values in the other months. All observation stations recorded their highest concentration in January 2016.

Anomalies

After deeper exploration, we can gather all the unusual trends and observations in this data set:

  • Records for ten months of 2017 are missing, which makes the data highly biased.
  • Station 'Orlov Most' stopped collecting data after 2015, and it only recorded data at midnight, which is very strange.
  • 'Hour and var' type: data are consecutive only from Nov 2017 to Sep 2018; outside that period there are only scattered collections from a few months. In addition, data are collected only from 2015 onwards (for 'Druzhba', only from 2016 onwards).
  • 'Day' type: there are no 2017 data, and for 2018 there are only records for April and May.
  • The different 'AveragingTime' types can cause inconsistencies throughout the data.
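Gaps such as the missing months listed above can be found by comparing the months that actually appear in the records with the twelve months expected for each year. The following is a minimal sketch under that assumption; the `sample` dates are hypothetical and only mimic a year in which the first ten months are absent.

```python
from datetime import date


def missing_months(dates, years):
    """Return {year: sorted list of month numbers with no record} for the given years."""
    seen = {(d.year, d.month) for d in dates}
    return {
        year: [month for month in range(1, 13) if (year, month) not in seen]
        for year in years
    }


# Hypothetical measurement dates: only November and December are covered.
sample = [date(2017, 11, 3), date(2017, 12, 9), date(2017, 12, 24)]
gaps = missing_months(sample, [2017])
print(gaps[2017])
```

Running the same check separately for each station and each record type would show which combination a gap belongs to.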

Interesting trends

Possible influences of anomalies

A typical day for Sofia city

Data visualization and application design