ISSS608 2018-19 T1 Assign Kateryna Mazurenko Methodology

From Visual Analytics and Applications
Revision as of 12:31, 18 November 2018 by Kmazurenko.2017 (talk | contribs)

AIR QUALITY IN BULGARIA

Data Preparation

Air Tube (air quality and meteo data measured by citizens)

1. Reverse geocode the locations using the geohash package in R, for both files (2017 and 2018 data), and save the results back as CSV.

Citizen data - geohash recode

2. The data comes as two files, one for 2017 and one for 2018, so the files were first concatenated in JMP (the function could be applied directly because both files have the same structure).

3. A missing-data pattern check was run on the concatenated file to find empty rows; 4 rows were excluded, and the data with no missing fields was saved.

Missing pattern run

4. Run distributions to check whether the values are trustworthy.

Citizen measurements distribution
Citizen measurements distribution meteo

5. The distributions show many negative values for temperature, pressure and humidity, as well as unreasonable (impossible) maximum values. I checked the range of weather conditions possible in Bulgaria and applied formulas to replace values outside that range with empty (missing) values.

I did not recode large values in the P1/P2 measures, as there is no way to be confident whether they are valid. I also treat these two measures separately in the later analysis.

6. Format date and time using lubridate in R
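Steps 1 and 5 above can be sketched in code. The original work was done with the geohash package in R and with JMP formulas; the Python below is only an illustration, not the code actually used. It assumes that step 1 means decoding a geohash string into latitude and longitude. The column names and the plausibility bounds are hypothetical placeholders, since the thresholds actually applied are not stated on this page.

```python
from typing import Dict, Optional, Tuple

# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"

# Hypothetical plausibility bounds and column names, NOT the author's thresholds.
# Pressure is left out because its unit in the source files is not stated here.
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature": (-40.0, 50.0),  # degrees Celsius (assumed unit and bounds)
    "humidity": (0.0, 100.0),      # percent relative humidity
}


def geohash_decode(code: str) -> Tuple[float, float]:
    """Return the (latitude, longitude) centre of the cell named by a geohash."""
    lat_low, lat_high = -90.0, 90.0
    lon_low, lon_high = -180.0, 180.0
    refine_lon = True  # bits alternate between axes, starting with longitude
    for char in code.lower():
        value = GEOHASH_ALPHABET.index(char)  # raises ValueError on bad input
        for shift in (4, 3, 2, 1, 0):  # each character carries five bits
            bit = (value >> shift) & 1
            if refine_lon:
                mid = (lon_low + lon_high) / 2
                if bit:
                    lon_low = mid
                else:
                    lon_high = mid
            else:
                mid = (lat_low + lat_high) / 2
                if bit:
                    lat_low = mid
                else:
                    lat_high = mid
            refine_lon = not refine_lon
    return (lat_low + lat_high) / 2, (lon_low + lon_high) / 2


def blank_implausible(
    row: Dict[str, Optional[float]],
    ranges: Dict[str, Tuple[float, float]] = PLAUSIBLE_RANGES,
) -> Dict[str, Optional[float]]:
    """Copy a record, replacing out-of-range values with None (missing)."""
    cleaned = dict(row)
    for column, (low, high) in ranges.items():
        value = cleaned.get(column)
        if value is not None and not (low <= value <= high):
            cleaned[column] = None
    return cleaned


# Usage with made-up values: "ezs42" is a widely used geohash example string.
lat, lon = geohash_decode("ezs42")  # roughly (42.605, -5.603)
row = blank_implausible({"temperature": -5573.0, "humidity": 48.0, "P1": 2000.0})
```

Columns that are not listed in the bounds, such as P1 and P2, pass through unchanged, which mirrors the decision above not to recode large particulate values.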


EEA Data (official air quality measures)

1. Concatenate the files in JMP, combining all data into one file.

2. Check the missing-data pattern, and check the distribution of how much of the data is measured hourly, daily or at various intervals.

3. As the data was not in a uniform format, I aggregated it to daily averages in R. I will use the average daily concentration in the later analysis.
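The daily aggregation in step 3 can be sketched as follows. The original grouping was done in R; this Python version is only an illustration of the idea. The record layout (station, timestamp, concentration) and the station name are hypothetical, and timestamps are assumed to be ISO-formatted strings.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

Reading = Tuple[str, str, Optional[float]]  # (station, ISO timestamp, value)


def daily_average(readings: Iterable[Reading]) -> Dict[Tuple[str, date], float]:
    """Average concentrations per station and calendar day.

    Hourly, daily and irregular readings all end up on the same daily grid,
    because each reading is assigned to the date part of its timestamp.
    """
    buckets = defaultdict(list)
    for station, timestamp, value in readings:
        if value is None:  # skip missing measurements
            continue
        day = datetime.fromisoformat(timestamp).date()
        buckets[(station, day)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Usage with made-up readings: two hourly values and one daily value.
sample = [
    ("S1", "2018-01-01 00:00:00", 10.0),
    ("S1", "2018-01-01 12:00:00", 30.0),
    ("S1", "2018-01-02", 5.0),
]
daily = daily_average(sample)
```

Note that a plain mean gives a day with only a few readings the same weight as a day with 24 hourly readings, so it may be worth keeping the reading count per day alongside the average.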


Meteo-data

1. Run missing-data patterns and distributions in JMP. All the data looks trustworthy except for missing values, which are coded as -9999 (as stated in the metadata file).

2. Use formulas in JMP to replace unreasonable values with empty (missing) values (example shown).

3. Change the date format using lubridate in R so that the data can be joined with the other files.
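The meteo steps above can be sketched as one small cleaning function. The original work used JMP formulas and lubridate in R; this Python version is only an illustration. The -9999 code comes from the metadata file as described above, while the column names and the input date format are hypothetical, since the actual layout of the meteo file is not shown on this page.

```python
from datetime import datetime
from typing import Any, Dict

MISSING_CODE = -9999  # sentinel for missing values, per the metadata file


def clean_meteo_row(
    row: Dict[str, Any],
    date_column: str = "date",      # hypothetical column name
    date_format: str = "%d/%m/%Y",  # hypothetical input format
) -> Dict[str, Any]:
    """Replace the -9999 sentinel with None and rewrite the date as ISO 8601."""
    cleaned = {
        key: (None if value == MISSING_CODE else value)
        for key, value in row.items()
    }
    parsed = datetime.strptime(row[date_column], date_format)
    cleaned[date_column] = parsed.date().isoformat()
    return cleaned


# Usage with a made-up record.
meteo = clean_meteo_row({"date": "15/03/2018", "temp_avg": -9999, "humidity": 71.0})
```

Writing every date as an ISO string (YYYY-MM-DD) gives all the files the same join key, regardless of how each source formatted its dates originally.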

Topo data was taken as is


Dashboard Design
