Difference between revisions of "IS428 AY2018-19T1 Lim Li Xuan"
(Created page with "IS428_2018-19_T1_Lim_Li_Xuan ==Topic== center ==Problem and Motivation== center ==Data Cleaning (Step-By-Step)== Fi...") |
|||
Line 1: | Line 1: | ||
IS428_2018-19_T1_Lim_Li_Xuan | IS428_2018-19_T1_Lim_Li_Xuan | ||
− | ==Topic== | + | ===Topic=== |
[[File:Title.jpg|center]] | [[File:Title.jpg|center]] | ||
Line 8: | Line 8: | ||
==Data Cleaning (Step-By-Step)== | ==Data Cleaning (Step-By-Step)== | ||
− | [[File:Slide4.jpg|center]] | + | ===Union EEA DataSets=== |
− | [[File:Slide5.jpg|center]] | + | [[File:Slide4.jpg|center|thumb]] |
− | [[File:Slide6.jpg|center]] | + | |
− | [[File:Slide6r.jpg|center]] | + | [[File:Slide5.jpg|center|thumb]] |
− | [[File:Slide73.jpg|center]] | + | ===Filter=== |
+ | [[File:Slide6.jpg|center|thumb]] | ||
+ | ===Combining data with different headers=== | ||
+ | [[File:Slide6r.jpg|center|thumb]] | ||
+ | [[File:Slide73.jpg|center|thumb]] | ||
==Data Analysis== | ==Data Analysis== | ||
− | + | ===EEAData=== | |
* Had hourly readings from 2017 onwards but daily from 2013-2016 | * Had hourly readings from 2017 onwards but daily from 2013-2016 | ||
* Had missing values from Jan – Nov 2017 | * Had missing values from Jan – Nov 2017 | ||
Line 23: | Line 27: | ||
* Useful Fields: PM10, Lat, Long, Time, Stations | * Useful Fields: PM10, Lat, Long, Time, Stations | ||
− | + | ===Airtube=== | |
* Only 2 years worth of data | * Only 2 years worth of data | ||
* Possible Abnormalities due to citizen sensors and over the top readings | * Possible Abnormalities due to citizen sensors and over the top readings | ||
Line 29: | Line 33: | ||
* Readings are not only within Sofia City but Bulgaria | * Readings are not only within Sofia City but Bulgaria | ||
− | + | === METEO-DATA=== | |
* 1 Point in the country: Airport | * 1 Point in the country: Airport | ||
* Useful Fields: Lat, Long, Time, Meteorological data [Dew Point Temperature, Precipitation Amount, Wind Speed, Humidity, Surface Pressure, Temperature, and Visibility] | * Useful Fields: Lat, Long, Time, Meteorological data [Dew Point Temperature, Precipitation Amount, Wind Speed, Humidity, Surface Pressure, Temperature, and Visibility] | ||
Line 37: | Line 41: | ||
==StoryBoard 1== | ==StoryBoard 1== | ||
− | + | ===Time Series=== | |
Overview of the PM10 Concentrations collected from the respective Air Quality Stations over the Months between 2013-2018. Once a point is clicked, it expands to 1A to show distribution over days. Then we can take a closer look into which parts of the month caused the overall surge. | Overview of the PM10 Concentrations collected from the respective Air Quality Stations over the Months between 2013-2018. Once a point is clicked, it expands to 1A to show distribution over days. Then we can take a closer look into which parts of the month caused the overall surge. | ||
− | [[File:112mine.png|center]] | + | [[File:112mine.png|center|thumb]] |
− | + | ===Highlight Table=== | |
1A. Highlight table shows the intensity for each station for the entire month. Using this table, we can tell which are the regions that are more polluted than the other, or if it as a common phenomenon across the city. | 1A. Highlight table shows the intensity for each station for the entire month. Using this table, we can tell which are the regions that are more polluted than the other, or if it as a common phenomenon across the city. | ||
− | [[File:2sdfesf.png|center]] | + | [[File:2sdfesf.png|center|thumb]] |
==StoryBoard 2== | ==StoryBoard 2== | ||
+ | ===Time Series & Boxplot & Meteorology Data=== | ||
The second storyboard is separate from the first because it only has 1 year worth of data. After analyzing the trends across the years, months and days. We can now study the pattern across one day in Sofia City. | The second storyboard is separate from the first because it only has 1 year worth of data. After analyzing the trends across the years, months and days. We can now study the pattern across one day in Sofia City. | ||
On selecting a day, we can see the | On selecting a day, we can see the | ||
Line 53: | Line 58: | ||
* Min Max Average for the day | * Min Max Average for the day | ||
− | [[File:332323.png|center]] | + | [[File:332323.png|center|thumb]] |
+ | |||
Hover over to see the readings across stations in a boxplot to represent variance. | Hover over to see the readings across stations in a boxplot to represent variance. | ||
==StoryBoard 3== | ==StoryBoard 3== | ||
+ | ===Dynamic comparison of Meteorology Data against Concentration using a Time Series=== | ||
With our next DataSet, We can now look at Meteorological data. Does the PM10 Conc. Does patterns correlate with Meteorological Patterns? Could the environment be a cause of accumulation of PM10 particles, which results in high PM10 Conc. ? | With our next DataSet, We can now look at Meteorological data. Does the PM10 Conc. Does patterns correlate with Meteorological Patterns? Could the environment be a cause of accumulation of PM10 particles, which results in high PM10 Conc. ? | ||
− | [[File:4eweweq.png|center]] | + | [[File:4eweweq.png|center|thumb]] |
− | [[File:5qwewqeqwe.png|center]] | + | [[File:5qwewqeqwe.png|center|thumb]] |
==StoryBoard 4== | ==StoryBoard 4== | ||
+ | ===Terrain Map & Density=== | ||
How can we better visualize the patterns of PM10 Conc Levels? Through research, considering that Sofia is largely bound by high valleys and that likely a contributing factor to its pollution. There is, therefore, a need for us to picture it on a terrain map to indicate elevated areas. On top of this, we can also use this pages function to run through months of PM10 Conc. on a density map so that we can quickly identify the PM10 Levels. Clicking on each point will bring you to the worksheet. | How can we better visualize the patterns of PM10 Conc Levels? Through research, considering that Sofia is largely bound by high valleys and that likely a contributing factor to its pollution. There is, therefore, a need for us to picture it on a terrain map to indicate elevated areas. On top of this, we can also use this pages function to run through months of PM10 Conc. on a density map so that we can quickly identify the PM10 Levels. Clicking on each point will bring you to the worksheet. | ||
− | [[File:6eadw.png|center]] | + | [[File:6eadw.png|center|thumb]] |
This worksheet wraps up the 6 maps shown above as it shows the concentration patterns over months, regardless of year. This is useful to confirm the months that have the worst air quality. | This worksheet wraps up the 6 maps shown above as it shows the concentration patterns over months, regardless of year. This is useful to confirm the months that have the worst air quality. | ||
− | [[File:Dewddew.png|center]] | + | [[File:Dewddew.png|center|thumb]] |
==Step By Step (After Cleaning)== | ==Step By Step (After Cleaning)== | ||
− | + | ===StoryBoard 1A (Time Series)=== | |
− | StoryBoard 1A (Time Series) | ||
*Filter Year, Month,Day | *Filter Year, Month,Day | ||
*Marks Concentration | *Marks Concentration | ||
Line 76: | Line 83: | ||
*Action On Select, All fields ( year, month, day) to target sheet IB | *Action On Select, All fields ( year, month, day) to target sheet IB | ||
− | StoryBoard 1B (Highlight Table) | + | ===StoryBoard 1B (Highlight Table)=== |
*Filter Year, Month, Day | *Filter Year, Month, Day | ||
*Marks Concentration | *Marks Concentration | ||
Line 82: | Line 89: | ||
*Rows Air Qty Stn | *Rows Air Qty Stn | ||
− | StoryBoard 2A (Circular Chart for hourly data) | + | ===StoryBoard 2A (Circular Chart for hourly data)=== |
*Filter Year, Month, Day ( From EEA) Apply to worksheets 1B | *Filter Year, Month, Day ( From EEA) Apply to worksheets 1B | ||
*In order to limit to 2 years only : If you only want 2017/2018: Duplicate EEA > Extract only 2017,2018 as database > Set Additional filter on worksheet > save | *In order to limit to 2 years only : If you only want 2017/2018: Duplicate EEA > Extract only 2017,2018 as database > Set Additional filter on worksheet > save | ||
Line 93: | Line 100: | ||
*Action On Select, All fields ( year, month, day) to target sheet 2B | *Action On Select, All fields ( year, month, day) to target sheet 2B | ||
− | StoryBoard 1B (BoxPlot) | + | ===StoryBoard 1B (BoxPlot)=== |
*Filter Year, Month, Day Automatically filtered | *Filter Year, Month, Day Automatically filtered | ||
*Marks Air Quality Station | *Marks Air Quality Station | ||
Line 99: | Line 106: | ||
*Rows Year | *Rows Year | ||
− | StoryBoard 3A | + | ===StoryBoard 3A=== |
*Duplicate 1A's worksheet | *Duplicate 1A's worksheet | ||
*Filter Year, Month, Day | *Filter Year, Month, Day | ||
Line 109: | Line 116: | ||
*Action On Hover, filter Worksheets using Meteorological data by “Year” Field. | *Action On Hover, filter Worksheets using Meteorological data by “Year” Field. | ||
− | StoryBoard 3B: All-Time Series | + | ===StoryBoard 3B: All-Time Series=== |
*Filter Year, Month, Day | *Filter Year, Month, Day | ||
*Columns Year Month Day, | *Columns Year Month Day, | ||
*Rows measure | *Rows measure | ||
− | StoryBoard 3C: Cheat to select different views | + | ===StoryBoard 3C: Cheat to select different views=== |
https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_sheet_selector.htm | https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_sheet_selector.htm | ||
*Create a parameter, then a dimension, custom add values and position all views overlapping each other in the dashboard. Eventually, the view will be filtered. | *Create a parameter, then a dimension, custom add values and position all views overlapping each other in the dashboard. Eventually, the view will be filtered. | ||
− | StoryBoard 4A | + | ===StoryBoard 4A=== |
*https://www.mapbox.com | *https://www.mapbox.com | ||
[[File:LOL3SDC.png|thumb]] | [[File:LOL3SDC.png|thumb]] | ||
− | StoryBoard 4B: Create Map with Pages + Density | + | ===StoryBoard 4B: Create Map with Pages + Density=== |
*Filter Pages, year, Month | *Filter Pages, year, Month | ||
*Set filter conc max to 700. | *Set filter conc max to 700. | ||
Line 133: | Line 140: | ||
− | StoryBoard 4C: Jan-Dec Analysis + Density | + | ===StoryBoard 4C: Jan-Dec Analysis + Density=== |
*Filter | *Filter | ||
*Pages All years | *Pages All years | ||
Line 143: | Line 150: | ||
==Findings== | ==Findings== | ||
+ | [[File:Limlixuan VaA3.pdf|thumb]] |
Revision as of 00:16, 12 November 2018
IS428_2018-19_T1_Lim_Li_Xuan
Contents
- 1 Topic
- 2 Problem and Motivation
- 3 Data Cleaning (Step-By-Step)
- 4 Data Analysis
- 5 StoryBoard 1
- 6 StoryBoard 2
- 7 StoryBoard 3
- 8 StoryBoard 4
- 9 Step By Step (After Cleaning)
- 9.1 StoryBoard 1A (Time Series)
- 9.2 StoryBoard 1B (Highlight Table)
- 9.3 StoryBoard 2A (Circular Chart for hourly data)
- 9.4 StoryBoard 1B (BoxPlot)
- 9.5 StoryBoard 3A
- 9.6 StoryBoard 3B: All-Time Series
- 9.7 StoryBoard 3C: Cheat to select different views
- 9.8 StoryBoard 4A
- 9.9 StoryBoard 4B: Create Map with Pages + Density
- 9.10 StoryBoard 4C: Jan-Dec Analysis + Density
- 10 Findings
Topic
Problem and Motivation
Data Cleaning (Step-By-Step)
Union EEA DataSets
Filter
Combining data with different headers
Data Analysis
EEAData
- Had hourly readings from 2017 onwards but daily from 2013-2016
- Had missing values from Jan – Nov 2017
- Station readings also had time gaps without data
- Latitude and Longitude can be created once joined with given metadata: Inner join on station name.
- Useful Fields: PM10, Lat, Long, Time, Stations
Airtube
- Only 2 years worth of data
- Possible Abnormalities due to citizen sensors and over the top readings
- Need to convert geohash to Lat Long
- Readings are not only within Sofia City but Bulgaria
METEO-DATA
- 1 Point in the country: Airport
- Useful Fields: Lat, Long, Time, Meteorological data [Dew Point Temperature, Precipitation Amount, Wind Speed, Humidity, Surface Pressure, Temperature, and Visibility]
- Elevation in meters
- Useful fields: Lat, Long, Elevation
- Can use Density Map
StoryBoard 1
Time Series
Overview of the PM10 Concentrations collected from the respective Air Quality Stations over the Months between 2013-2018. Once a point is clicked, it expands to 1A to show distribution over days. Then we can take a closer look into which parts of the month caused the overall surge.
Highlight Table
1A. Highlight table shows the intensity for each station for the entire month. Using this table, we can tell which are the regions that are more polluted than the other, or if it as a common phenomenon across the city.
StoryBoard 2
Time Series & Boxplot & Meteorology Data
The second storyboard is separate from the first because it only has 1 year worth of data. After analyzing the trends across the years, months and days. We can now study the pattern across one day in Sofia City. On selecting a day, we can see the
- Boxplot of readings from each station with its variance.
- Meteo Data for the day
- Hourly readings
- Min Max Average for the day
Hover over to see the readings across stations in a boxplot to represent variance.
StoryBoard 3
Dynamic comparison of Meteorology Data against Concentration using a Time Series
With our next DataSet, We can now look at Meteorological data. Does the PM10 Conc. Does patterns correlate with Meteorological Patterns? Could the environment be a cause of accumulation of PM10 particles, which results in high PM10 Conc. ?
StoryBoard 4
Terrain Map & Density
How can we better visualize the patterns of PM10 Conc Levels? Through research, considering that Sofia is largely bound by high valleys and that likely a contributing factor to its pollution. There is, therefore, a need for us to picture it on a terrain map to indicate elevated areas. On top of this, we can also use this pages function to run through months of PM10 Conc. on a density map so that we can quickly identify the PM10 Levels. Clicking on each point will bring you to the worksheet.
This worksheet wraps up the 6 maps shown above as it shows the concentration patterns over months, regardless of year. This is useful to confirm the months that have the worst air quality.
Step By Step (After Cleaning)
StoryBoard 1A (Time Series)
- Filter Year, Month,Day
- Marks Concentration
- Columns Year Month
- Rows Concentration
- Action On Select, All fields ( year, month, day) to target sheet IB
StoryBoard 1B (Highlight Table)
- Filter Year, Month, Day
- Marks Concentration
- Columns Year Month Day,
- Rows Air Qty Stn
StoryBoard 2A (Circular Chart for hourly data)
- Filter Year, Month, Day ( From EEA) Apply to worksheets 1B
- In order to limit to 2 years only : If you only want 2017/2018: Duplicate EEA > Extract only 2017,2018 as database > Set Additional filter on worksheet > save
- Put them together in a dashboard
- Marks Concentration
- Columns Year Month
- Rows Concentration
- Action On Select, All fields ( year, month, day) to target sheet 2B
StoryBoard 1B (BoxPlot)
- Filter Year, Month, Day Automatically filtered
- Marks Air Quality Station
- Columns Concentration
- Rows Year
StoryBoard 3A
- Duplicate 1A's worksheet
- Filter Year, Month, Day
Affect Worksheets using the selected data source
- Marks Concentration
- Columns Year Month, Day
- Rows Concentration
- Action On Hover, filter Worksheets using Meteorological data by “Year” Field.
StoryBoard 3B: All-Time Series
- Filter Year, Month, Day
- Columns Year Month Day,
- Rows measure
StoryBoard 3C: Cheat to select different views
https://onlinehelp.tableau.com/current/pro/desktop/en-us/dashboards_sheet_selector.htm
- Create a parameter, then a dimension, custom add values and position all views overlapping each other in the dashboard. Eventually, the view will be filtered.
StoryBoard 4A
StoryBoard 4B: Create Map with Pages + Density
- Filter Pages, year, Month
- Set filter conc max to 700.
- Columns Lat
- Rows Long
- Place each map into the dashboard. > Show page controls > Synchronized. So that the selection of months in the year can be synchronised with one play button.
https://kb.tableau.com/articles/howto/synchronize-sheets-on-a-dashboard-with-a-single-page-control
StoryBoard 4C: Jan-Dec Analysis + Density
- Filter
- Pages All years
- Month
- Columns Lat
- Rows Long