ISSS608 2018-19 T1 Assign Kateryna Mazurenko Methodology
|
|
|
|
|
Data Preparation
Air Tube (air quality and meteo data measured by citizen)
1. Reverse geocode using geohash package in R - for both files (2017 and 2018 data) and saving it as csv back.
2. Data comes as two files for 2017 and 2018 year, so at first, is was concatenated using JMP software (applying function directly as data has the same structure).
3. Concatenated file: missing pattern run to check for empty rows, 4 rows were excluded and data with no missing fields was saved.
4. Run distribution to check for trustworthy values.
5. Distribution shows a lot of negative values for temperature, pressure and humidity as well as unreasonable(impossible) maximal values. I checked for possible weather conditions in Bulgaria and applied formulas to replace these values with empty space.
I didn’t recode big values in P1/P2 measures as there is no confidence if it’s valid or not. I also regard there two measures separately in future analysis.
6. Format date and time using lubridate in R
EEA Data (official air quality measures)
1. Perform concatenation using JMP - combining all data into one file
2. Check missing pattern, check distribution how many data comes as measured hourly/daily/various
3. As it was not in uniform format, I grouped it as daily average in R. I will use average daily concentration in future analysis.
Meteo-data
1. Run missing patterns and run distributions in JMP - all the data looks trustworthy except for missing values coded as -9999 (as was said in metadata file)
2. Used formulas in JMP to replace unreasonable values with empty space - example shown.
3. Change date format using lubridate in R so it will be possible to join with other files
Topo data was taken as is
Dashboard Design
1. Connect to data source and join all the prepared tables. I used outer join for all the cases to keep all the data in place, official stations and meteodata was joined to citizen file using date as unique id, topology was joined using latitude and longitude fields.
2. Check topology data to see if there is any use of elevation data to further analysis.
To do that:
- Bring lat and long fields to columns and rows (map will be shown)
- Uncheck aggregate values to see all the points on map
- Drag elevation to details
- All the point are on the same distance, so select shape-square in marks to cover the area at most, adjust size
- Drag elevation to color and adjust opacity
As we can see, topology data only shows that there are mountains at the south west of the city - the same information is provided by the background map itself, so I will not use topology layer further.
3. Overall pattern of official stations measurements
I want to see high-level data provided by all stations to compare their measurements as well as see the pattern through all the years.
To do that:
- Bring date: year and month into columns
- Check the format to be generic year/month
- Drag PM10 concentration into rows, check it as average
- Bring stations into filters and to colour
- As we know from EEA standards, harmful level of PM10 in the air is 50mg/m3, so add this as reference (analytics - constant line)
4. Build daily/calendar view for a station in selected year
I want to see how the concentration of PM10 changes day to day within a year period so understand patterns in details.
To do that:
- Date field (year and month) bring to rows, checking as generic format
- Day - to columns
- Station to columns
- PM10 concentration (as average) - to rows, selecting bars for a view
- Station and Year - include to filters to select one year only and one station only
- Format tooltip to make it interactive
- Format title to make it interactive
Overall view gives us possibility to check concentration for selected date (year - in filter, and day - on hover as tooltip), it also gives an understanding of trend.
5. Combine overall pattern of official stations measurements and daily/calendar view for a station into one dashboard. Add dashboard action to highlight selected date - so that I can see the exact data for selected date and how it compares to stations pattern and yearly pattern.
Add text to illustrate where the reference is taken from.
6. Meteo data in comparison to air pollution level
I want to see is there any relation between change in weather conditions and PM10 particles concentration.
To do that:
- Use exact date at columns
- Drag average concentration and temperature to rows and make them dual-axis
- Add Station selection to filters, add date range to filters
- Format axis to make user-friendly name, add units of measure
- As reference add 50mg/m3 level to show harmful level of concentration
- As reference to temperature add 12 degrees as the temperature when heating season starts in Bulgaria
- Annotate marks
- Change colours so that temperature will be red, as well as assigning the colours for the stations so they will be the same in all meteo-measurements graphs
- Format title and tooltips to make it interactive
All the steps: repeat for humidity, wind speed and precipitation. Ensure that colours for the stations are the same.
7. I will put temperature graph into separate dashboard as it shows visible trend: under lower temperatures concentration of PM10 is higher.
8. All other meteo-data will be combined into one dashboard. Add action to see the point for selected date through all the measures taken. Ensure that filters apply to all the sheets on dashboard.
9. Comparison of official data and citizen measurement.
I want to see how they relate one to another. I also want to see the difference between P1, P2 and combined data.
To do that:
- Create calculated field for P1+P2 concentration
- Add date to columns, concentrations - to rows
- Make them dual-axis, sync axis to have the same scale, edit axis to make it user-friendly, add units of measure
- Add time range to filters
- All the meteo-data available: show in tooltip
- Add average showing value as reference line for both measurement, add median to have spread understanding
- Annotate highest point to put attention
- The same steps repeat for P1 and P2 separately as we can not conclude which of these field is the right comparison to PM10.
10. Add the sheet showing P1, P2 and combined data into the dashboard, ensure filtration applies to all the sheets there. Add action to highlight the same date selected for all options.
11. Map of citizen measurements.
I want to see where on map the citizen measurements were taken place, as well as what was the concentrations and how it changed through the time.
To do that:
- Bring longitude to columns and latitude to rows, add geohash to details to see all the points on map
- Drag second longitude to columns to have to maps for comparison
- Bring date to pages to play video about situation changing in time
- Bring date also to filter to select specific date snapshot
- Drag P1 to first longitude tab, put P1 concentration both to colour and size
- For the colour: select diverging palette with reversed blue to red and center = 50 (harmful level concentration)
- Repeat the same for P2 on second map
- Edit tooltips and title
- Duplicate sheet, removing all maps details but adding data about weather, to show it as separate panel, not in tooltip
Add citizen data map and weather info-panel to dashboard, add highlight action. As a result, by click to point on map I can see the same point on second map, I also see the weather conditions.
12. For all the dashboards created: add buttons with tooltips to navigate through the infographics. Each repeated button correspond to the same dashboard - for example “meteo” always will bring to meteo stations dashboard, similar to webpage scenario.