ISSS608 2018-19 T1 Assign Cao Xinjie Task1
Data Overview
After joining all the data, the records cover 2013-01-01 to 2018-09-13. The air quality data come in three averaging-time types (day, hour, var), but the hour data start only in 2015. Sofia has six air quality stations (40, 50, 52, 54, 73, 79), each with its own location, altitude and other information. Stations 40A, 50A and 52A measure background air pollution; the other three stations measure traffic pollution.
Date Average Data
The following image shows the daily average air quality over five years. The x-axis covers every date in a year and the y-axis is the air quality index. Air pollution is clearly much higher in December and January than in the other months.
Two reasons may explain the sharp rise of air pollution in winter. First, Sofia burns fuel for heating, and fossil fuels contain carbon, so burning them pollutes the air. Second, cars emit more harmful particles in winter because the fuel burns less completely.
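The seasonal comparison above can also be checked outside Tableau. The following is a minimal sketch, assuming the daily data have been reduced to (date, concentration) pairs. The name `monthly_average` and the sample values are hypothetical, made up for illustration, and are not taken from the Sofia data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily records: (date, concentration).
# The values are invented for illustration, not real Sofia readings.
daily_records = [
    (date(2013, 1, 5), 80.0),
    (date(2014, 1, 7), 100.0),
    (date(2013, 7, 5), 20.0),
    (date(2014, 7, 9), 30.0),
    (date(2013, 12, 20), 90.0),
]


def monthly_average(records):
    """Average the readings per calendar month, pooling all years."""
    by_month = defaultdict(list)
    for day, value in records:
        by_month[day.month].append(value)
    # Sort by month number so the result reads January to December.
    return {month: mean(values) for month, values in sorted(by_month.items())}


monthly = monthly_average(daily_records)
for month, value in monthly.items():
    print(month, value)
```

With real data, a winter peak would show up as larger values for months 12 and 1 than for the summer months.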
Traffic Influence
To check whether the second guess is correct, I put the first three stations (40A, 50A, 52A) in one group and stations 54A and 73A in a second group. The first group's pollution type is background, which means general pollution; the second group's type is traffic pollution. Station 79A is left out because it only has data for 2018, while this chart uses data from 2013 to 2017.
The figure above shows group 2's data minus group 1's. The y-axis is the month and the x-axis is the difference in average air quality (group 2 - group 1). The chart confirms our guess: traffic air pollution rises sharply in winter. From November to February, the readings near traffic are much higher than the general ones.
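The group 2 minus group 1 calculation can be sketched as follows. This is only an illustration under assumptions: the row layout (station, month, concentration), the function name `group_difference` and all sample values are hypothetical; only the station grouping comes from the text.

```python
from collections import defaultdict
from statistics import mean

BACKGROUND = {"40A", "50A", "52A"}  # group 1 in the text
TRAFFIC = {"54A", "73A"}            # group 2 in the text (79A is excluded)

# Hypothetical rows: (station, month, concentration). Values are invented.
rows = [
    ("40A", 1, 60.0), ("50A", 1, 70.0), ("54A", 1, 95.0), ("73A", 1, 85.0),
    ("52A", 7, 20.0), ("40A", 7, 24.0), ("54A", 7, 25.0), ("73A", 7, 23.0),
]


def group_difference(rows):
    """Return {month: mean(traffic) - mean(background)} for each month."""
    groups = defaultdict(lambda: {"traffic": [], "background": []})
    for station, month, value in rows:
        if station in TRAFFIC:
            groups[month]["traffic"].append(value)
        elif station in BACKGROUND:
            groups[month]["background"].append(value)
    return {
        month: mean(g["traffic"]) - mean(g["background"])
        for month, g in sorted(groups.items())
        if g["traffic"] and g["background"]  # skip months missing a group
    }


diff = group_difference(rows)
print(diff)
```

A positive value means the traffic stations read higher than the background stations in that month.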
Time Data
Figure 3 shows the air pollution index at different times of day. The x-axis is the time of day and the y-axis is the average value for each time. The hour data start in 2016, so we filter to 2016-2018.
Figure 3 shows that air pollution is improving year by year. The most polluted times of day are concentrated in the morning and evening rush hours: 8:00-9:00 and 17:00-19:00 always have higher pollution values. Meanwhile, pollution at 0:00-6:00 and 11:00-14:00 is always below the average. This suggests that traffic emissions influence how pollution is distributed over the day.
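The comparison of each hour against the average can be sketched like this. It assumes the hour data are available as (hour, concentration) pairs. The names `hourly_profile` and `hours_above_average` and the sample values are hypothetical, and "average" is taken here to mean the mean of the hourly means, which is my assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hourly rows: (hour of day, concentration). Values are invented.
hourly_rows = [
    (3, 10.0), (3, 14.0),
    (8, 40.0), (8, 44.0),
    (12, 18.0),
    (18, 50.0), (18, 48.0),
]


def hourly_profile(rows):
    """Average concentration for each hour of the day."""
    by_hour = defaultdict(list)
    for hour, value in rows:
        by_hour[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


def hours_above_average(profile):
    """Hours whose mean is above the mean of all hourly means."""
    overall = mean(list(profile.values()))
    return [hour for hour, value in profile.items() if value > overall]


profile = hourly_profile(hourly_rows)
peaks = hours_above_average(profile)
print(profile)
print(peaks)
```

Running the same functions once per year would give one profile per year, matching the year-by-year comparison in the figure.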
Time Data by Traffic
In the figure below I again group the stations into traffic and general. In 2017 and 2018, air pollution at 20:00-1:00 increases a lot. This is an anomaly in the data: it is common sense that pollution is heavier at night than at noon, but there is no apparent reason for the sudden increase in 2017. If the high pollution in the rush hours is caused by traffic, the two groups should differ, but I cannot see a difference between them. The general air pollution group follows the same trend as the traffic stations.
Week Data
We use Tableau to derive the weekday. The figure shows that the weekday pattern of air pollution differs from year to year. In some years the working days have higher pollution, but in other years Sunday's pollution is the highest. So there is no general rule for how pollution is distributed over the week.
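The weekday was derived in Tableau, but the same check can be sketched in a few lines of code. This is an illustration under assumptions: the records are (date, concentration) pairs, the function name `worst_weekday_by_year` is hypothetical, and the sample values are invented.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical daily records: (date, concentration). Values are invented.
records = [
    (date(2016, 1, 4), 50.0),   # a Monday
    (date(2016, 1, 10), 30.0),  # a Sunday
    (date(2017, 1, 2), 20.0),   # a Monday
    (date(2017, 1, 8), 45.0),   # a Sunday
]


def worst_weekday_by_year(records):
    """For each year, return the weekday with the highest mean reading."""
    grouped = defaultdict(lambda: defaultdict(list))
    for day, value in records:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        grouped[day.year][WEEKDAYS[day.weekday()]].append(value)
    result = {}
    for year, by_day in sorted(grouped.items()):
        means = {name: mean(values) for name, values in by_day.items()}
        result[year] = max(means, key=means.get)
    return result


worst = worst_weekday_by_year(records)
print(worst)
```

If the worst weekday changes from year to year, as it does in this made-up sample, that matches the observation that the week has no stable pattern.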
Station Distribution
The map shows the distribution of the six stations, with altitude and concentration information added. However, we cannot find a connection between altitude and air pollution: pollution does not decrease at higher altitude.
VAR Time Problem
One problem in the data is the “var” averaging time. All the var data are from 2017, and they look like hour data but with one minute more than the hour-type data. The distribution of the var data follows the same overall trend as Figure 4, but air pollution at 3:00-5:00 is lower than in Figure 4. Meanwhile, pollution during 22:00-2:00 is still high. This problem may reduce the accuracy of the 2017 results.
Treemap
The two dimensions are year and station. The treemap reveals a problem with the data: there is no daily data for 2017. It also shows that air pollution in Sofia is improving year by year, but the location with the worst air quality changes over time. Over these five years, pollution at station 40A did not decrease much.