ISSS608 2018-19 T1 Assign-Song Chenxi Data preparation

From Visual Analytics and Applications

Air pollution in Sofia

Data preparation

Part A Raw data

1> Official air quality measurements from 5 stations in the city (EEA Data.zip), collected according to the EU guidelines on air quality monitoring; see the data description.

2> Citizen science air quality measurements (Air Tube.zip) from many stations, including temperature, humidity and pressure, plus topography (gridded data).

3> Meteorological measurements from 1 station (METEO-data.zip): temperature, humidity, wind speed, pressure, rainfall and visibility.

4> Topography data (TOPO-DATA).

Part B Data cleaning

1 EEA data

[Figure 2]

The EEA zip stores the data for each year from 2013 to 2018 in separate files. I therefore concatenated all years into a single table and joined it with the station metadata and the “sofia_topo” data.
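The concatenate-then-join step can be sketched in base R. This is a minimal sketch under stated assumptions, not the exact workflow used here. The two small data frames stand in for the yearly EEA files, and the column names (`station_id`, `datetime`, `pm10`) and station codes are hypothetical. The join with “sofia_topo” is left out because its join key is not described above.

```r
# Hypothetical stand-ins for two of the yearly EEA files (2013-2018 in the real data)
eea_2017 <- data.frame(station_id = c("STA1", "STA2"),
                       datetime   = c("2017-01-01 01:00", "2017-01-01 01:00"),
                       pm10       = c(48.2, 61.0))
eea_2018 <- data.frame(station_id = c("STA1", "STA2"),
                       datetime   = c("2018-01-01 01:00", "2018-01-01 01:00"),
                       pm10       = c(35.7, 40.1))

# Hypothetical station metadata (coordinates are illustrative only)
stations <- data.frame(station_id = c("STA1", "STA2"),
                       latitude   = c(42.67, 42.69),
                       longitude  = c(23.31, 23.33))

# Stack the yearly tables; rbind() requires matching column names
eea_all <- do.call(rbind, list(eea_2017, eea_2018))

# Left join on the station code so that no measurement is dropped
eea_joined <- merge(eea_all, stations, by = "station_id", all.x = TRUE)
```

With the real files, the list passed to `do.call(rbind, ...)` would come from reading each yearly CSV, for example `lapply(files, read.csv)`, where `files` is a hypothetical vector of file paths.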


2 Air tube data

[Figure 3]

After joining the two years’ data, I used R to convert each geohash into latitude and longitude.

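library(geohashTools)  # provides gh_decode()
library(readr)         # provides write_csv()
# Decode each geohash into latitude/longitude columns
geocoded <- gh_decode(data$geohash)
# Append the decoded coordinates to the joined Air Tube data
joined_data <- cbind(data, geocoded)
write_csv(joined_data, file = "sofia-air_air-sofia/Air Tube/data_bg_2017_geocoded.csv")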
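The sketch below decodes a geohash by hand in base R to show what gh_decode() computes. It was written for illustration and is not taken from the package. The names `decode_geohash()` and `decode_geohashes()` are hypothetical. A geohash is a base-32 string whose bits alternate between longitude and latitude. Each bit halves the current interval, and the decoded point is the centre of the final cell.

```r
# Illustrative base-R geohash decoder (not the geohashTools source).
# Each base-32 character carries 5 bits; bits alternate lon, lat, lon, ...
geohash_alphabet <- strsplit("0123456789bcdefghjkmnpqrstuvwxyz", "")[[1]]

decode_geohash <- function(gh) {
  lat <- c(-90, 90)     # current latitude interval
  lon <- c(-180, 180)   # current longitude interval
  is_lon <- TRUE        # the first bit refines longitude
  for (ch in strsplit(tolower(gh), "")[[1]]) {
    value <- match(ch, geohash_alphabet) - 1
    if (is.na(value)) stop("invalid geohash character: ", ch)
    for (bit in 4:0) {                      # most significant bit first
      is_one <- (value %/% 2^bit) %% 2 == 1
      if (is_lon) {
        mid <- mean(lon)
        if (is_one) lon[1] <- mid else lon[2] <- mid
      } else {
        mid <- mean(lat)
        if (is_one) lat[1] <- mid else lat[2] <- mid
      }
      is_lon <- !is_lon
    }
  }
  # Report the centre of the final cell
  c(latitude = mean(lat), longitude = mean(lon))
}

# Vectorised wrapper returning a data frame that can be cbind()-ed onto the data
decode_geohashes <- function(x) {
  m <- vapply(x, decode_geohash, numeric(2), USE.NAMES = FALSE)
  data.frame(latitude = m[1, ], longitude = m[2, ])
}
```

For example, `decode_geohashes("ezs42")` gives a latitude of about 42.605 and a longitude of about -5.603. Each additional character shrinks the cell, so longer geohashes give more precise coordinates.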


[Figure 4]

[Figure 5]

3 lbsf_20120101-20180917_IP

I joined the IP data with the EEA data to obtain the concentration values.
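A join of this kind can be sketched with base R `merge()`. The sketch makes assumptions that the data description does not confirm. Both tables are assumed to carry a timestamp column, here the hypothetical `datetime`. The IP records are assumed to fall between the hourly EEA readings, so both tables are truncated to the hour before joining. The column `temp_c` and all values are toy placeholders.

```r
# Toy hourly EEA concentrations and IP records (hypothetical columns)
eea <- data.frame(datetime = c("2018-01-01 00:00", "2018-01-01 01:00", "2018-01-01 02:00"),
                  pm10     = c(52.1, 47.3, 44.0))
ip  <- data.frame(datetime = c("2018-01-01 00:30", "2018-01-01 01:30"),
                  temp_c   = c(-1.0, -1.5))

# Build a common hourly key by truncating each timestamp to the hour
hour_key <- function(x) {
  t <- as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")
  format(t, "%Y-%m-%d %H:00")
}
eea$hour <- hour_key(eea$datetime)
ip$hour  <- hour_key(ip$datetime)

# Inner join: keep only the hours present in both sources
combined <- merge(ip[, c("hour", "temp_c")], eea[, c("hour", "pm10")], by = "hour")
```

An inner join drops hours that have no concentration reading. Passing `all.x = TRUE` instead would keep every IP record and fill the missing concentrations with `NA`.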

4 Air tube

[Figure 6]

[Figure 7]

During the EDA, I observed that the maximum value is 435 and the minimum is -5573, which are unreasonable, so these records need to be removed.

After removing the values under -5, the distribution is closer to a normal distribution.
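The range check and the filter can be sketched in base R as follows. The column name `value` is hypothetical because the text does not say which variable was affected. Apart from the two extremes quoted above, the readings are toy numbers. The cut-off follows the text and drops values under -5.

```r
# Toy readings: 435 and -5573 are the extremes reported above; the rest are made up
air <- data.frame(value = c(12, 18, -5573, 25, 435, -2, 30))

# EDA check: the raw range exposes the implausible minimum and maximum
print(range(air$value))

# Drop readings under -5, as described in the text
air_clean <- subset(air, value >= -5)
print(summary(air_clean$value))
```

In this sketch, the cut-off removes only the negative extreme. An upper bound would need its own condition in `subset()`.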


[Figure 8]

Part C Software Used in Analysis

· Tableau

· JMP Pro

· R