ISSS608 2018-19 T1 Assign Lim Si Ling Evelyn Task 2
Figure 1 – Coverage of data points collected by Citizens
Data Anomaly
1 – Some data points fall outside Sofia City but within Bulgaria. These points are excluded from the analysis by creating a group that separates readings taken within Sofia City from those taken outside it.
2 – Even within Sofia City, the data points are not evenly distributed: most are found in the central region, while the north-eastern region is very sparsely covered. The number of data points also increases over time.
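The grouping in the original analysis was done in Tableau. As a rough illustration of the same idea, the Python sketch below tags each reading as inside or outside the city. The bounding box coordinates and the names `SOFIA_BBOX`, `location_group` and `split_by_location` are illustrative assumptions, not the boundary actually used.

```python
# Hypothetical bounding box (decimal degrees) roughly around Sofia City.
# These limits are an assumption for illustration, not the real city boundary.
SOFIA_BBOX = {"lat_min": 42.60, "lat_max": 42.80, "lon_min": 23.20, "lon_max": 23.45}


def location_group(lat, lon, bbox=SOFIA_BBOX):
    """Label a reading by whether its coordinates fall inside the bounding box."""
    inside = (
        bbox["lat_min"] <= lat <= bbox["lat_max"]
        and bbox["lon_min"] <= lon <= bbox["lon_max"]
    )
    return "Within Sofia City" if inside else "Outside Sofia City"


def split_by_location(readings, bbox=SOFIA_BBOX):
    """Split readings (dicts with 'lat' and 'lon') into (within, outside) lists."""
    within, outside = [], []
    for reading in readings:
        group = location_group(reading["lat"], reading["lon"], bbox)
        (within if group == "Within Sofia City" else outside).append(reading)
    return within, outside
```

A rectangular box is only an approximation; a polygon of the actual city limits would give a cleaner separation near the edges.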
Figure 2 – Outliers in data collected by Citizens for Pressure, Temperature, Humidity, P1 and P2.
The charts show the median of each measurement, aggregated by each citizen's data point (i.e. geohash) for each week. Boxplots are included to make outliers easier to detect, especially for Temperature and Humidity, where measurements vary with season (i.e. date).
Observations
1 - There are obvious outliers such as Pressure=0, Temperature lower than -100, Humidity=0 and extremely high P1 and P2 concentration values.
2 - Not all measurements from the same citizen were wrongly captured. For example, for geohash sx3x9tpv3kv, Pressure, Temperature and Humidity had some outliers, but P1 and P2 concentration did not.
Approach
Groups were created to remove outliers. The median of each day is used in the later analysis, and geographical locations are grouped with hexagonal binning, to reduce the influence of any remaining outliers.
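The outlier removal and daily median steps were built in Tableau. The sketch below shows one possible equivalent in Python. It checks each measure separately, so a bad Pressure reading does not discard a valid P1 reading from the same record. The field names, the `PM_CEILING` cut-off for "extremely high" P1 and P2 values, and the function name `daily_medians` are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical ceiling for P1/P2; the report does not state the cut-off used.
PM_CEILING = 1000.0

# One plausibility check per measure, based on the obvious outliers noted above
# (Pressure = 0, Temperature below -100, Humidity = 0, extreme P1/P2).
CHECKS = {
    "pressure": lambda v: v > 0,
    "temperature": lambda v: v > -100,
    "humidity": lambda v: v > 0,
    "P1": lambda v: 0 <= v <= PM_CEILING,
    "P2": lambda v: 0 <= v <= PM_CEILING,
}


def daily_medians(readings):
    """Return {(geohash, date): {measure: median}} using only plausible values."""
    buckets = defaultdict(lambda: defaultdict(list))
    for reading in readings:
        key = (reading["geohash"], reading["date"])
        for measure, is_plausible in CHECKS.items():
            value = reading.get(measure)
            # Filter per measure, so one bad field does not drop the whole record.
            if value is not None and is_plausible(value):
                buckets[key][measure].append(value)
    return {
        key: {measure: median(values) for measure, values in per_measure.items()}
        for key, per_measure in buckets.items()
    }
```

Hexagonal binning of the locations is not covered here; it would replace the geohash in the grouping key with a hexagon identifier.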
Figure 3 – Scatterplot of P1 and P2 against various measures
Observations
- Both P1 and P2 correlate with Pressure, Temperature and Humidity in the same way.
- Pollutants are positively correlated with Pressure and Humidity.
- Interestingly, pollutant concentrations peaked between -5 and 10 degrees Celsius and are generally negatively correlated with Temperature.
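The correlations above were read off the scatterplots. To put a number on them, a Pearson correlation coefficient could be computed per pair of measures. The sketch below is a plain Python version; the function name `pearson` is an assumption, and the report does not say that this statistic was calculated.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two root sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / (spread_x * spread_y)
```

A single coefficient only captures a linear trend, so it would understate a relationship such as the Temperature one, where concentrations peak within a band rather than rising or falling steadily.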
Figure 4 – Scatterplot of P1 and P2 against various measures
Observations
1 - P1 and P2 concentrations are highly correlated.
2 - P1 concentration was generally higher than P2, and both pollutants were generally higher in the winter months from Dec 2017 to Jan 2018.
3 - The 95th percentile of P1 and P2 concentration was used to limit the effect of extreme readings, in addition to the outlier removal described earlier. It peaked on 8-9 Jan 2018, followed by 27 Jan 2018.
4 - Median P1 and P2 concentration peaked on 27 Jan 2018.
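The daily median and 95th percentile can differ in where they peak, as the observations above show. The sketch below computes both per day and finds the peak day for each. The linear interpolation used for the percentile is an assumption, since Tableau's exact method is not stated in the report, and the names `percentile`, `daily_summary` and `peak_day` are illustrative.

```python
from collections import defaultdict
from statistics import median


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list is undefined")
    position = (len(ordered) - 1) * q / 100
    lower = int(position)  # position is non-negative, so int() floors it
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def daily_summary(readings, measure):
    """Return {date: {'median': ..., 'p95': ...}} for one measure."""
    by_day = defaultdict(list)
    for reading in readings:
        by_day[reading["date"]].append(reading[measure])
    return {
        day: {"median": median(values), "p95": percentile(values, 95)}
        for day, values in by_day.items()
    }


def peak_day(summary, stat):
    """Date on which the chosen statistic ('median' or 'p95') is highest."""
    return max(summary, key=lambda day: summary[day][stat])
```

Comparing `peak_day(summary, "median")` with `peak_day(summary, "p95")` shows whether a peak reflects a city-wide rise or a few very high readings.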
Figure 5 – Time Series of the measurements and Interaction
This chart is included to uncover any potential relationship between each measurement and time. However, there is no obvious pattern in the 6 charts above, apart from Temperature, which changes with the season.
Figure 6 – The 2 maps show how median concentration of P1 and P2 changed with respect to day and location.
Observations
1 - Median P1 and P2 concentrations were generally more sensitive to time than to location, i.e. high values were detected during the winter months from Dec 2017 to Jan 2018.
2 - Median concentration of P1 was generally higher than median concentration of P2 across time and location.
3 - The North-Eastern and North-Western regions were more prone to unhealthy levels of P1 and P2 pollutants.
Figure 7 – The 3 maps show how median pressure, temperature and humidity changed with respect to day and location.
Observations
1 - North-Eastern region experienced slightly higher pressure across time.
2 - Temperature did not fluctuate much across locations and varied mainly with the season.
3 - Humidity is usually higher in the inner city.
Figure 8 – P1 and P2 concentration on 16 Jan 2018
Figure 9 – Pressure, Temperature and Humidity on 16 Jan 2018
As observed earlier, P1 and P2 concentrations are positively correlated with Pressure and Humidity.
Tableau Public link for Air Quality in Sofia City - Citizen Data
Note that because the dataset is large and the data cleansing was done in Tableau, the dashboard may be slow to load.
Banner Photo from Pexels