ISSS608 2018-19 T1 Assign-Song Chenxi Data preparation
Data preparation
Part A Raw data
1> Official air quality measurements (5 stations in the city) (EEA Data.zip), as per EU guidelines on air quality monitoring; see the data description.
2> Citizen science air quality measurements (Air Tube.zip), incl. temperature, humidity and pressure (many stations) and topography (gridded data).
3> Meteorological measurements (1 station) (METEO-data.zip): Temperature; Humidity; Wind speed; Pressure; Rainfall; Visibility
4> Topography data (TOPO-DATA)
Part B Data cleaning
1 EEA data
The EEA zip contains separate data files for each year from 2013 to 2018. I concatenated the files across all years and joined the result with the metadata as well as the “sofia_topo” data.
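The concatenate-then-join step can be sketched in base R as follows. This is only an illustration under assumptions: the toy data frames stand in for the yearly EEA files, and the column names (station, datetime, concentration, lat, lon) and all values are hypothetical, not taken from the actual data. The join with “sofia_topo” is not shown.

```r
# Toy stand-ins for two yearly EEA files (hypothetical columns and values).
eea_2013 <- data.frame(station = c("ST1", "ST2"),
                       datetime = c("2013-01-01 00:00", "2013-01-01 00:00"),
                       concentration = c(41.2, 35.7))
eea_2014 <- data.frame(station = c("ST1", "ST2"),
                       datetime = c("2014-01-01 00:00", "2014-01-01 00:00"),
                       concentration = c(38.9, 30.1))

# Stack the yearly tables; they must share the same column names.
eea_all <- do.call(rbind, list(eea_2013, eea_2014))

# Hypothetical station metadata with coordinates.
metadata <- data.frame(station = c("ST1", "ST2"),
                       lat = c(42.70, 42.66),
                       lon = c(23.32, 23.40))

# Left join: keep every measurement and attach the station attributes.
eea_joined <- merge(eea_all, metadata, by = "station", all.x = TRUE)
print(eea_joined)
```

With real files, the list passed to rbind could instead be built by applying read.csv to the file names returned by list.files.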
2 Air tube data
After joining the two years’ data, I used R code to convert each geohash to latitude and longitude.
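library(readr)  # provides write_csv()
# gh_decode() comes from a geohash package (e.g. geohash or geohashTools), which must be loaded first
geocoded <- gh_decode(data$geohash)   # decode each geohash into coordinates
joined_data <- cbind(data, geocoded)  # append the decoded columns to the original data
write_csv(joined_data, "sofia-air_air-sofia/Air Tube/data_bg_2017_geocoded.csv")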
3 lbsf_20120101-20180917_IP
Joined the IP data with the EEA data to get the concentration values.
4 Air tube
During the EDA, it was observed that the maximum value is 435 and the minimum is -5573, which is unreasonable, so these data need to be deleted.
After removing the data below -5, the distribution is closer to normal.
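A minimal sketch of this filtering step in base R is given here, assuming the readings sit in a column called value (a hypothetical name). Apart from the two reported extremes, the numbers are made up. Only the lower cutoff of -5 is applied, since that is the only threshold stated.

```r
# Toy readings; `value` is a hypothetical column name, and all numbers
# except the reported extremes (435 and -5573) are invented.
air_tube <- data.frame(value = c(12, 18, 435, -5573, 7, -3, 21))

# Inspect the range before cleaning.
print(summary(air_tube$value))

# Keep only readings at or above the -5 cutoff.
cleaned <- subset(air_tube, value >= -5)

# Inspect the range after cleaning.
print(range(cleaned$value))
```

Comparing a histogram of the values before and after the filter (for example with hist) is one way to check how the shape of the distribution changes.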
Part C Software Used in Analysis
· Tableau
· JMP Pro
· R