Difference between revisions of "Assignment ZUOANNA Task 1"

From Visual Analytics and Applications
Line 3: Line 3:
 
== Data Preparation ==
 
== Data Preparation ==
 
=== Know Your Data ===
 
=== Know Your Data ===
{|class="wikitable" style="text-align: center;"  
+
{|class="wikitable"  
 
|-
 
|-
 
! Station !! Timeseries(Original records for each year) !! Consolidation(Exclude title)
 
! Station !! Timeseries(Original records for each year) !! Consolidation(Exclude title)
 
|-
 
|-
| ''Station 9421''  || 2013_Day(358),2014_Day(365), 2015_Day(357),2016_HD(464),2017_HV(777),2018_H(5965)|| 8,280 Records
+
| ''Station 9421''  || 2013_Day(358), 2014_Day(365), 2015_Day(357), 2016_HD(464), 2017_HV(777), 2018_H(5965)|| 8,280 Records
 
|-
 
|-
| ''Station 9484''  || ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
+
| ''Station 9484''  || 2013_Day(314), 2014_Day(342), 2015_Day(264) || 917 Records
 
|-
 
|-
| ''Station 9572''  || Methods for Cluster analysis.
+
| ''Station 9572''  || 2013_Day(365), 2014_Day(345), 2015_DH(347), 2016_DH(453), 2017_HV(777), 2018_DH(6097)|| 8,378 Records
 
|-
 
|-
| ''Station 9616''  || Best criterion returns the best index value according to a specified criterion.
+
| ''Station 9616''  || 2013_Day(344), 2014_Day(363), 2015_DH(352), 2016_DH(465), 2017_HV(776), 2018_DH(5449)||7,744 Records
 
|-
 
|-
| ''Station 9642''  || ggmap is a package to show the spatial data visualization. It can retrieve various online sources (e.g. Google Maps) for user to download and use as layers within the ggplot2 plotting system.
+
| ''Station 9642''  || 2013_Day(364), 2014_Day(360), 2015_DH(358), 2016_DH(514), 2017_HV(752), 2018_DH(6051)||8,393 Records
 
|-
 
|-
| ''Station 60881''  || Convert a data frame (containing a panel dataset, where rows are observations and columns are time periods) into an Edward Tufte-inspired "slopegraph" using either base or ggplot2 graphics.
+
| ''Station 60881''  || 2018_DH(6005)||6,004 Records
 +
|-
 +
| ''Consolidate All Stations''  || 2013 - 2018 ||39,715 Records
 
|-
 
|-
 
|}
 
|}
 +
'''* Note'''
 +
1. DH stands for Day and Hour: the concentration of the pollutant was recorded either by day or by hour over the relevant period. If the station records the concentration by day, each day should have only one reading. If it records by hour, each hour has one reading and each day has 24 readings.
 +
2. HV stands for Hour and Var (the labels used in the dataset). The data show that both labels denote concentration measured by hour.
 +
3. Finally, the records from the six stations above are consolidated into one table, which makes it more convenient to build interactive visualizations in Tableau by filtering on station, year, and measurement method (by day or by hour).

Revision as of 20:05, 15 November 2018

Spatio-temporal Analysis of Official Air Quality

Data Preparation

Know Your Data

{|class="wikitable"
|-
! Station !! Timeseries (original records for each year) !! Consolidation (title rows excluded)
|-
| ''Station 9421'' || 2013_Day(358), 2014_Day(365), 2015_Day(357), 2016_HD(464), 2017_HV(777), 2018_H(5965) || 8,280 Records
|-
| ''Station 9484'' || 2013_Day(314), 2014_Day(342), 2015_Day(264) || 917 Records
|-
| ''Station 9572'' || 2013_Day(365), 2014_Day(345), 2015_DH(347), 2016_DH(453), 2017_HV(777), 2018_DH(6097) || 8,378 Records
|-
| ''Station 9616'' || 2013_Day(344), 2014_Day(363), 2015_DH(352), 2016_DH(465), 2017_HV(776), 2018_DH(5449) || 7,744 Records
|-
| ''Station 9642'' || 2013_Day(364), 2014_Day(360), 2015_DH(358), 2016_DH(514), 2017_HV(752), 2018_DH(6051) || 8,393 Records
|-
| ''Station 60881'' || 2018_DH(6005) || 6,004 Records
|-
| ''Consolidate All Stations'' || 2013 - 2018 || 39,715 Records
|}
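
The Consolidation column appears to be the sum of the yearly counts minus one title row per yearly file. The Python sketch below checks that arithmetic. It assumes, which the source does not state, that each count in parentheses includes exactly one title row; the names `yearly_counts` and `consolidated_total` are illustrative only.

```python
# Per-year record counts copied from the table above.
# Assumption: each count includes one title (header) row.
yearly_counts = {
    "9421": [358, 365, 357, 464, 777, 5965],
    "9484": [314, 342, 264],
    "9572": [365, 345, 347, 453, 777, 6097],
    "9616": [344, 363, 352, 465, 776, 5449],
    "9642": [364, 360, 358, 514, 752, 6051],
    "60881": [6005],
}


def consolidated_total(counts):
    """Sum the yearly counts and drop one assumed title row per yearly file."""
    return sum(counts) - len(counts)


per_station = {station: consolidated_total(c) for station, c in yearly_counts.items()}
grand_total = sum(per_station.values())

for station, total in per_station.items():
    print(f"Station {station}: {total:,} records")
print(f"All stations: {grand_total:,} records")
```

Under this assumption the sketch reproduces the listed figure for every station except Station 9616, where it gives 7,743 rather than 7,744. Its overall total is 39,715, which matches the last row of the table.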

* Note
1. DH stands for Day and Hour: the concentration of the pollutant was recorded either by day or by hour over the relevant period. If the station records the concentration by day, each day should have only one reading. If it records by hour, each hour has one reading and each day has 24 readings.
2. HV stands for Hour and Var (the labels used in the dataset). The data show that both labels denote concentration measured by hour.
3. Finally, the records from the six stations above are consolidated into one table, which makes it more convenient to build interactive visualizations in Tableau by filtering on station, year, and measurement method (by day or by hour).
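
The consolidation described in note 3 can be sketched in Python as follows. This is only an illustration, not the procedure actually used: the column names `timestamp` and `concentration`, the sample values, and the function `consolidate` are all hypothetical. The idea is to tag every row with its station, year, and measurement method so that one flat file can be filtered on those fields in Tableau.

```python
import csv
import io


def consolidate(sources):
    """Merge per-file rows into one list, tagging each row with its origin.

    `sources` maps a (station, year, method) tuple to a list of row dicts.
    """
    combined = []
    for (station, year, method), rows in sources.items():
        for row in rows:
            combined.append(
                {"station": station, "year": year, "method": method, **row}
            )
    return combined


# Made-up sample rows; real rows would come from csv.DictReader on each yearly file.
sources = {
    ("9421", 2013, "Day"): [
        {"timestamp": "2013-01-01", "concentration": "12.0"},
    ],
    ("9421", 2018, "Hour"): [
        {"timestamp": "2018-01-01 00:00", "concentration": "9.5"},
        {"timestamp": "2018-01-01 01:00", "concentration": "10.1"},
    ],
}

rows = consolidate(sources)

# Write one flat CSV whose first three columns serve as filter fields.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["station", "year", "method", "timestamp", "concentration"],
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping station, year, and method as ordinary columns, rather than as separate files or sheets, is what lets a single data source drive all three filters.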