ISSS608 2018-19 T1 Assign Zuo Anna Task 1

From Visual Analytics and Applications
Revision as of 14:13, 18 November 2018 by Anna.zuo.2017

Sofia City: One of the most polluted cities in Europe
Sofia Z.jpg


 


Spatio-temporal Analysis of Official Air Quality

Methodology

Data Overview (Know Your Data)

Station | Time series (original records per year) | Consolidation (excluding title rows)
Station 9421 | 2013_Day(358), 2014_Day(365), 2015_Day(357), 2016_HD(464), 2017_HV(777), 2018_H(5965) | 8,280 records
Station 9484 | 2013_Day(314), 2014_Day(342), 2015_Day(264) | 917 records
Station 9572 | 2013_Day(365), 2014_Day(345), 2015_DH(347), 2016_DH(453), 2017_HV(777), 2018_DH(6097) | 8,378 records
Station 9616 | 2013_Day(344), 2014_Day(363), 2015_DH(352), 2016_DH(465), 2017_HV(776), 2018_DH(5449) | 7,744 records
Station 9642 | 2013_Day(364), 2014_Day(360), 2015_DH(358), 2016_DH(514), 2017_HV(752), 2018_DH(6051) | 8,393 records
Station 60881 | 2018_DH(6005) | 6,004 records
All stations consolidated | 2013 - 2018 | 39,715 records

* Notes
1. DH stands for Day and Hour: the pollutant concentration was recorded either daily or hourly over the relevant period. A station that records daily should have exactly one reading per day; a station that records hourly has one reading per hour, or 24 readings per day.
2. HV stands for Hour and Var (the labels used in the dataset). In the data, both labels denote hourly concentration measurements.
3. Finally, all records from the six stations above are consolidated into one table, which makes it more convenient to build interactive visualizations in Tableau by filtering on station, year and measurement method (daily or hourly).
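
The distinction drawn in the notes (one reading per day versus up to 24 readings per day) can be checked mechanically. The following is a minimal Python sketch rather than the method used on this page, which works in Excel and Tableau; the function names and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def readings_per_day(timestamps):
    """Count how many readings fall on each calendar date."""
    return Counter(ts.date() for ts in timestamps)


def classify_frequency(timestamps):
    """Label a series "Day" if no date has more than one reading,
    and "Hour" if the busiest date has between 2 and 24 readings."""
    counts = readings_per_day(timestamps)
    if not counts:
        return "empty"
    busiest = max(counts.values())
    if busiest == 1:
        return "Day"
    if busiest <= 24:
        return "Hour"
    return "unexpected"  # more than 24 readings on a single date


# Hypothetical sample data: three daily readings and one day of hourly readings.
daily = [datetime(2013, 1, d) for d in (1, 2, 3)]
hourly = [datetime(2018, 1, 1) + timedelta(hours=h) for h in range(24)]
print(classify_frequency(daily), classify_frequency(hourly))
```

A file labelled DH may mix both kinds of record, so in practice the per-date counts from `readings_per_day` are more informative than the single label.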

Unevenly spaced time series

Before analysing the unique or shared patterns in the readings from the six stations, it is important to establish the characteristics of each station's time series. The stations do not cover the same time periods, so patterns must be compared and summarised separately for each group of stations that has data for the same period.

Period Type | Group | Year | Station | Month
Past Period (daily based time series) | Group 1 | 2013-2016 | Stations 9421, 9484, 9572, 9616, 9642 | 12 months
Past Period (daily based time series) | Group 2 | 2017 | Stations 9421, 9572, 9616, 9642 | Nov & Dec
Recent Situation (hourly based time series pattern) | Group 3 | 2018 | Stations 9421, 9572, 9616, 9642, 60881 | 12 months with hourly readings

Data Preparation

Step 1. Format Data

PZ1.png

Step 2. Consolidate Data

1. Separate the multiple spreadsheets into different folders by station name.
2. Consolidate all the spreadsheets in each folder into one single Excel file.
3. Use VBA code to gather all the sheets in each consolidated file into a newly created sheet named Summary.
4. Finally, once the six Summary sheets are ready, use the same VBA code again to consolidate them into one sheet. This sheet holds the information for all stations and is loaded into Tableau, where different stations and periods are visualised using the filter function.

PZ2.png
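
The consolidation logic above (stack every sheet, keep one title row, and tag each record with its station) can be sketched in Python as follows. This is an illustration of the idea only, not the VBA code used for this page; the function name, the column names and the sample rows are all hypothetical.

```python
def consolidate(sheets_by_station):
    """Stack all sheets into one table, keeping a single title row.

    sheets_by_station maps a station name to a list of sheets; each sheet
    is a list of rows whose first row is the title row.
    """
    header = None
    combined = []
    for station, sheets in sheets_by_station.items():
        for sheet in sheets:
            if not sheet:
                continue  # skip empty sheets
            title, *rows = sheet
            if header is None:
                # Reuse the first title row seen, with a station column in front.
                header = ["Station"] + list(title)
            # Tag every data row with its station so it can be filtered later.
            combined.extend([station] + list(row) for row in rows)
    return header, combined


# Hypothetical miniature input; column names and values are made up.
sheets = {
    "9421": [[["Date", "P10"], ["2013-01-01", 80.0]],
             [["Date", "P10"], ["2014-01-01", 75.0]]],
    "9484": [[["Date", "P10"], ["2013-01-01", 95.0]]],
}
header, rows = consolidate(sheets)
print(header, len(rows))
```

The sketch assumes every sheet uses the same column order; it does not check that the title rows actually match.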

Insights & Dashboard Design

Group 1. “U” Pattern in the Time Series (2013-2016)

Characterize the past period with respect to air quality measures in Sofia City (2013-2016)

Overview

Over the four-year period, the five stations show a similar pattern: the P10 concentration rose to extremely high levels in January and December of each year, so the line for each year is shaped like the letter “U”. The annual average concentration for each station mostly lies between 40 µg/m3 and 50 µg/m3, except for Station 9484, whose averages are just above 50 µg/m3 and whose upper bound is also much higher than that of the other four stations. The encouraging trend is that the concentration decreased continuously from 2013 to 2016, as indicated by the falling annual average and the falling upper bound, which is defined as one standard deviation above the annual average.

G1Z.png
G2Z.png
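
For reference, the annual average and the upper bound (one standard deviation above that average) described in the overview can be computed as in the minimal Python sketch below. It assumes the sample standard deviation; the charts on this page may use a different setting. The function name and the readings are hypothetical.

```python
from statistics import mean, stdev


def yearly_summary(readings):
    """Return (average, upper_bound) for one station-year of readings,
    where upper_bound is one sample standard deviation above the average."""
    avg = mean(readings)
    return avg, avg + stdev(readings)


# Made-up readings in µg/m3, chosen so the arithmetic is easy to follow.
avg, upper = yearly_summary([40, 50, 60])
print(avg, upper)
```

Swapping `stdev` for `pstdev` gives the population version, which yields a slightly narrower band for the same data.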

The readings from Station 9484 for 2013 to 2015 are much higher than those of the other stations in the same period, although its average P10 concentration decreased from 52.4 µg/m3 to 41.0 µg/m3 by 2015.

Outliers

The outliers for the first four stations in December and January all peak at around 400 µg/m3, while the outliers recorded by Station 9642 reach at most around 300 µg/m3, lower than at the other stations over the period. Because the outliers are so much higher than typical readings, they strongly affect the average P10 concentration and would bias the analysis if we focused too heavily on comparing averages among these stations. In other words, a higher average concentration does not by itself prove that pollution was serious throughout that year: the air quality may not have been too bad overall, with the higher average driven mainly by a few extremely large outliers in that year.
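
The effect of a single extreme reading on the average can be illustrated with a small Python sketch. The numbers below are made up for illustration and are not taken from the station data; the median is shown only as a point of comparison and is not a measure used in the dashboards.

```python
from statistics import mean, median

# Hypothetical readings in µg/m3: four ordinary days and one extreme outlier.
readings = [30, 35, 40, 45, 400]

average = mean(readings)     # pulled upwards by the single outlier
typical = median(readings)   # unaffected by how extreme the outlier is
print(average, typical)
```

Here one outlier nearly triples the average relative to the median, which is why a high annual average alone does not show that pollution was severe on most days.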

Trend

Description Illustration
In January and November, most stations show an upward trend in P10 concentration, while in October and December, most stations show a downward trend.
G5Z.png

Group 2. Four Peak Periods in Dec 2017

Characterize the past situation with respect to air quality measures in Sofia City (2017)

Description Dashboard Visualization
1. Overview of the time series
Looking more closely at the P10 concentration in December 2017, the four available stations show a similar pattern over the month. The concentration rose and fell simultaneously at all of them, with peaks and valleys occurring at almost the same times.

Unusually, the average concentrations at the last three stations indicate more serious air pollution, whereas Station 9421 shows a relatively lower average.
G41Z.png
2. Specific Days in Calendar
The interesting finding is that the peaks are concentrated in four short periods of December 2017: the first three days of the month, the 8th to the 16th, the 26th to the 27th, and the last days of the month.
G3Z.png
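
Short peak periods such as those described above can be extracted from daily values by grouping consecutive days that exceed a chosen level. The Python sketch below is one possible approach under stated assumptions: the threshold, the function name and the sample values are hypothetical, and this page identifies the periods visually from the calendar view rather than by code.

```python
def high_periods(daily, threshold):
    """Group consecutive days whose value exceeds threshold.

    daily maps day-of-month to a value; returns a list of (first_day, last_day).
    A missing day ends the current run, just like a day below the threshold.
    """
    periods = []
    start = prev = None
    for day in sorted(daily):
        if daily[day] > threshold:
            if start is None:
                start = day                      # open a new run
            elif day != prev + 1:
                periods.append((start, prev))    # gap in the dates: close the run
                start = day
            prev = day
        elif start is not None:
            periods.append((start, prev))        # value dropped: close the run
            start = None
    if start is not None:
        periods.append((start, prev))            # run still open at the end
    return periods


# Made-up daily values in µg/m3 and an arbitrary illustrative threshold.
sample = {1: 120, 2: 130, 3: 110, 4: 40, 5: 45, 8: 150, 9: 160, 10: 30}
periods = high_periods(sample, threshold=100)
print(periods)
```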

Group 3. Daily and Hourly Patterns in 2018

Characterize most recent situation with respect to air quality measures in Sofia City (2018)

Description Dashboard Visualization
Overview

In 2018, we can analyse the hourly changes in the concentration time series in detail. Five stations are available. The common pattern for this group is that most outliers appeared during the first quarter of 2018; Station 9642 differs slightly, with outliers that are more numerous and spread over a longer period. The series for this station fluctuates considerably over the first half of 2018.





Outliers
The outliers recorded by Stations 9572, 9616 and 60881 are much higher, with readings of around 500 µg/m3, whereas the outliers recorded by Stations 9421 and 9642 are lower, at around 200 µg/m3. As in 2017, Station 9421 is unusual because its average P10 concentration is much lower than that of the other four stations. Although the readings at Station 9421 may suggest good air quality with respect to P10 in 2018, the data from the other four stations should also be considered when evaluating overall P10 air quality in Sofia.

G6Z.png
G7Z.png
Daily Pattern during the months

A line chart gives only a general overview of the pattern in the time series, whereas a calendar heatmap shows in detail, at daily resolution, on which specific days the P10 concentration is much higher than usual.

The common pattern across all five stations is that air quality was much worse in January than in the other months of 2018, and the affected days are concentrated in three short periods in January: the 1st and 2nd; the 5th to the 9th; and the 25th to the 28th, which had the most serious pollution.

G8Z.png
G9Z.png
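
The layout behind a calendar heatmap, with one row per week and one column per weekday, can be sketched in Python using the standard `calendar` module. This is an illustration of the data arrangement only, since the heatmaps on this page are built in Tableau; the function name and the sample values are hypothetical.

```python
import calendar


def calendar_grid(year, month, daily):
    """Arrange daily values into week rows (Monday first).

    daily maps day-of-month to a value. Cells outside the month, and days
    with no reading, are both returned as None.
    """
    weeks = calendar.monthcalendar(year, month)  # 0 marks a day outside the month
    return [[daily.get(day) if day else None for day in week] for week in weeks]


# Made-up values in µg/m3 for two days of January 2018.
grid = calendar_grid(2018, 1, {1: 100.0, 31: 20.0})
for week in grid:
    print(week)
```

Each cell of the grid would then be coloured by its value, so runs of polluted days appear as adjacent dark cells.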
Hourly Pattern during the days (Special Cases)

The analysis above shows that the concentration rose on several specific days in January 2018. The next step is to examine the hourly pattern on the selected days with much higher concentration.
Highest concentration: Station 9572
Based on the hourly readings, the common pattern for this station on the selected days is that the concentration rose at night (20:00 to 0:00) and in the early morning (0:00 to 4:00). We may therefore infer that this pattern could be caused by fossil fuel plants illegally discharging pollutants such as P10 at times when people are less active.
Most fluctuating: Station 9642
The pattern for this station on the selected days is serious because the concentration stays high throughout the day, or for half of the day. Such long-lasting high P10 concentration indicates that the air quality is very harmful to the residents of Sofia City.

G15Z.png
G14Z.png
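
An hourly pattern like the ones described above can be summarised by averaging the readings for each hour of the day. The Python sketch below shows one way to do this under stated assumptions: the function name and the sample readings are hypothetical, and the charts on this page were produced in Tableau rather than with this code.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_profile(readings):
    """Average the values for each hour of the day.

    readings is an iterable of (datetime, value) pairs;
    returns {hour: mean value}, ordered by hour.
    """
    by_hour = defaultdict(list)
    for timestamp, value in readings:
        by_hour[timestamp.hour].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Made-up readings in µg/m3 for two days: high at 22:00, lower at 12:00.
sample = [
    (datetime(2018, 1, 25, 12), 60.0),
    (datetime(2018, 1, 25, 22), 300.0),
    (datetime(2018, 1, 26, 12), 80.0),
    (datetime(2018, 1, 26, 22), 340.0),
]
profile = hourly_profile(sample)
print(profile)
```

Comparing the night-time hours of such a profile with the midday hours gives a simple numeric check on the night and early-morning peaks noted for Station 9572.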