ISSS608 2018-19 T1 Assignment: Wu Jinglong, Task 2
Data preparation
For the 2017 and 2018 data files, geohash decoding is done separately in R to extract the geolocation (latitude and longitude) of each reading. In the end, two data files, "NBG2017.csv" and "NBG2018.csv", are generated. After exploring the data, I decided not to join the data because some of the station names could not be mapped.
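To illustrate what the decoding step does, here is a minimal Python sketch of the standard geohash decoding algorithm. It is not the original R code, and the function name `decode_geohash` is hypothetical. Each base-32 character contributes 5 bits, and the bits alternately halve the longitude and latitude ranges. The result is the centre of the final cell.

```python
# Geohash base-32 alphabet (omits the letters a, i, l and o)
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash: str) -> tuple[float, float]:
    """Return the (latitude, longitude) centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash.lower():
        value = _BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):  # 5 bits per character, most significant first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


# The well-known test geohash "ezs42" decodes to roughly (42.605, -5.603)
print(decode_geohash("ezs42"))
```

The precision of the decoded point depends on the geohash length. For example, the three-character cell `sx8` spans roughly 42.2-43.6 N and 22.5-23.9 E, which contains central Sofia.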
Performance
We observed outliers in the temperature, humidity and pressure data, which means that not all sensors work properly at all times. A Tableau filter is used to remove unrealistic values.
Filtering is based on the following rules:
- Humidity should be between 0 and 100;
- Temperature should not be lower than -50;
- Pressure should be between 50k and 140k;
- The P1/P2 value should not be more than 500.
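The same rules can be expressed as a single predicate, which could be useful if the filtering were moved out of Tableau. This is a minimal Python sketch under several assumptions:

- Each reading is a dict keyed by the dataset's column names (`temperature`, `humidity`, `pressure`, `P1`, `P2`).
- Missing fields are skipped.
- The bounds are inclusive.
- The units are %, degrees Celsius and Pa.

`is_plausible` and `RULES` are hypothetical names.

```python
from typing import Callable, Mapping, Optional

# Sanity rules from the list above; units are assumed to be
# % (humidity), degrees Celsius (temperature) and Pa (pressure).
RULES: dict[str, Callable[[float], bool]] = {
    "humidity": lambda v: 0 <= v <= 100,
    "temperature": lambda v: v >= -50,
    "pressure": lambda v: 50_000 <= v <= 140_000,
    "P1": lambda v: v <= 500,  # particulate reading (assumed PM10)
    "P2": lambda v: v <= 500,  # particulate reading (assumed PM2.5)
}


def is_plausible(reading: Mapping[str, Optional[float]]) -> bool:
    """True if every field present (and not None) in `reading` passes its rule."""
    return all(
        check(reading[field])
        for field, check in RULES.items()
        if reading.get(field) is not None
    )


readings = [
    {"temperature": 21.5, "humidity": 48.0, "pressure": 94_800.0, "P1": 35.0, "P2": 18.0},
    {"temperature": -145.0, "humidity": 48.0},  # broken temperature sensor
    {"humidity": 250.0},                        # impossible humidity
]
clean = [r for r in readings if is_plausible(r)]
print(len(clean))  # 1
```

Keeping the rules in one table makes it easy to adjust a threshold without touching the filtering logic.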
After filtering, the data set is more accurate:
Sensor coverage
Location coverage
By visualizing the Sofia topography dataset, we can identify whether a sensor is located in Sofia city:
The original sensor coverage maps for both 2017 and 2018 include sensors outside Sofia city. We therefore keep only the readings within Sofia city, based on the Sofia topography dataset.
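Here the comparison is made visually. One simple way to approximate it in code, offered only as a sketch, is to keep the readings whose decoded coordinates fall inside the bounding box of the Sofia topography grid. This assumes the topography dataset can be reduced to a list of (latitude, longitude) grid points. A bounding box is coarser than the actual city boundary, and the helper names are hypothetical.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def bounding_box(points: Iterable[tuple[float, float]]) -> Box:
    """Smallest lat/lon box containing every (lat, lon) point."""
    lats, lons = zip(*points)
    return Box(min(lats), max(lats), min(lons), max(lons))


def inside(lat: float, lon: float, box: Box) -> bool:
    return box.min_lat <= lat <= box.max_lat and box.min_lon <= lon <= box.max_lon


def keep_sofia(readings: list[dict], topo_points: list[tuple[float, float]]) -> list[dict]:
    """Keep readings whose 'lat'/'lon' fall inside the topography grid's extent."""
    box = bounding_box(topo_points)
    return [r for r in readings if inside(r["lat"], r["lon"], box)]


# Toy grid and readings (illustrative values, not taken from the dataset)
topo = [(42.60, 23.20), (42.60, 23.45), (42.75, 23.20), (42.75, 23.45)]
readings = [
    {"id": "a", "lat": 42.69, "lon": 23.32},  # inside the grid
    {"id": "b", "lat": 42.14, "lon": 24.75},  # well outside it
]
print([r["id"] for r in keep_sofia(readings, topo)])  # ['a']
```

A tighter filter could test each reading against the actual city polygon instead of the grid's rectangular extent.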
Data points after filtering:
After filtering, we plot a time heat map to check the data coverage in 2017 and 2018, and find that data is missing in October 2017, April 2018 and July 2018:
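A heat map makes gaps visible at a glance. The same check can also be done programmatically by listing the calendar days that have no readings at all. This minimal sketch assumes the timestamps have already been parsed into `datetime` objects, and `missing_days` is a hypothetical helper.

```python
from datetime import date, datetime, timedelta
from typing import Iterable


def missing_days(timestamps: Iterable[datetime], start: date, end: date) -> list[date]:
    """Days between start and end (inclusive) with no reading at all."""
    seen = {ts.date() for ts in timestamps}
    gaps = []
    day = start
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Illustrative timestamps, not taken from the dataset
stamps = [datetime(2017, 10, 1, 8, 0), datetime(2017, 10, 1, 9, 0), datetime(2017, 10, 4, 12, 0)]
gaps = missing_days(stamps, date(2017, 10, 1), date(2017, 10, 4))
print(gaps)  # [datetime.date(2017, 10, 2), datetime.date(2017, 10, 3)]
```

Grouping the returned days by (year, month) gives a month-level summary comparable to the heat map.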
Air Pollution Measurements
The density map below, which represents PM10 concentration, clearly shows that air pollution is concentrated mostly in the city area and the northern area. More new sensors were added in 2018.
Monthly view
The monthly view below, covering September 2017 to August 2018 (PM10 concentration), clearly shows that PM10 concentration starts to rise in October. It peaks in January, when readings across the whole city area are high, and during summer the readings return to a low level:
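The monthly view is essentially an average of PM10 readings per calendar month. This is a minimal Python sketch of that aggregation. It assumes the filtered readings are (timestamp, PM10) pairs. `monthly_means` is a hypothetical helper, and the numbers below are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_means(readings: list[tuple[datetime, float]]) -> dict[tuple[int, int], float]:
    """Mean value per (year, month), in chronological order."""
    buckets: dict[tuple[int, int], list[float]] = defaultdict(list)
    for ts, value in readings:
        buckets[(ts.year, ts.month)].append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Made-up PM10 values for illustration only
sample = [
    (datetime(2017, 10, 5), 40.0),
    (datetime(2018, 1, 10), 120.0),
    (datetime(2018, 1, 20), 100.0),
    (datetime(2018, 7, 15), 15.0),
]
means = monthly_means(sample)
peak = max(means, key=means.get)  # month with the highest mean
print(means)  # {(2017, 10): 40.0, (2018, 1): 110.0, (2018, 7): 15.0}
print(peak)   # (2018, 1)
```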
Hourly view
We use the most recent month of data (July-August 2018) to generate an hourly density map of PM2.5 and PM10 concentration. Readings around noon are relatively low. They start to increase after 17:00 and reach their daily peak in the early morning (4:00-5:00).
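The hourly view can be computed by restricting the data to a one-month window and averaging readings by hour of day. This minimal sketch assumes readings are (timestamp, value) pairs and that the window is half-open. `hourly_profile` is a hypothetical helper. The exact window boundaries used here are not stated, so the dates below are placeholders and the values are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_profile(readings: list[tuple[datetime, float]],
                   start: datetime, end: datetime) -> dict[int, float]:
    """Mean value per hour of day (0-23) for readings with start <= ts < end."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    for ts, value in readings:
        if start <= ts < end:
            by_hour[ts.hour].append(value)
    return {hour: mean(vals) for hour, vals in sorted(by_hour.items())}


# Made-up PM2.5 values for illustration only
sample = [
    (datetime(2018, 7, 20, 4, 30), 30.0),
    (datetime(2018, 7, 21, 4, 10), 34.0),
    (datetime(2018, 7, 20, 12, 0), 8.0),
    (datetime(2018, 7, 20, 18, 0), 16.0),
    (datetime(2018, 6, 1, 4, 0), 90.0),  # outside the window, ignored
]
# Placeholder one-month window
profile = hourly_profile(sample, datetime(2018, 7, 15), datetime(2018, 8, 15))
print(profile)  # {4: 32.0, 12: 8.0, 18: 16.0}
```

The hour with the highest mean, `max(profile, key=profile.get)`, corresponds to the daily peak.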