Group06 Report

From Visual Analytics and Applications

Latest revision as of 17:43, 3 December 2017

Group 6 - How is Beijing Air Quality?


MOTIVATION

On November 4th, the Beijing Environmental Protection Agency announced that, owing to adverse weather conditions, the early start of winter heating and other factors, heavy regional air pollution was expected to last for 4 consecutive days in the Beijing-Tianjin-Hebei region and surrounding areas starting on November 4th, and that air quality in some cities might reach the serious pollution level.

“Why is China’s smog so bad now?” is a question many people overseas want answered. With China's rapid economic development, news from China attracts more and more comment around the world. China’s air pollution has been a serious issue for more than 10 years, and as the problem has drawn more attention worldwide, the Chinese government has made great efforts to address it.

Air01.PNG

Beijing, China's capital, is under pressure to bring its average PM2.5 reading down to 60 micrograms per cubic meter this year, from 73 micrograms per cubic meter last year. Even so, this level is still higher than the official air quality standard value for mainland China.

As air pollution has escalated, many people who live and work in Beijing suffer from illnesses such as tracheitis, pneumoconiosis and asthma. Current air quality fails to meet people's expectations, and a growing number of people are afraid of living and working in Beijing.

In this project, we apply visual analytics tools to visualize changes in air quality using its existing indicators. We show the fluctuation of the historical AQI (Air Quality Index), pollutant concentrations, and trend charts for each pollutant at the different view points in Beijing. We hope to present air quality conditions clearly, help people understand more about the environment they live in, and raise public awareness of environmental protection.

REVIEW AND CRITIQUE OF PRIOR WORK

DATA PREPARATION

1.Dataset Introduction

  • The Beijing air quality dataset was collected from the Beijing Municipal Environmental Monitoring Center.
  • The data contains 7 air quality indicators: AQI (Air Quality Index), PM2.5, PM10, SO2, NO2, CO and O3.
  • We selected only the data from 2017-01-01 to 2017-11-05 as our target dataset.
  • The selected data contains 87,361 rows.
  • The columns include:
Air02.png

* AQI Tips *

  • AQI (Air Quality Index) is the standard measure used to report air quality.
  • In China, AQI is calculated from six pollutants: PM2.5, PM10, SO2, NO2, O3 and CO.
  • An individual score (IAQI) is assigned to the level of each pollutant, and the final AQI is the highest of those 6 scores. The pollutants are measured differently: PM2.5 and PM10 concentrations are 24-hour averages, while SO2, NO2, O3 and CO are hourly averages. The final AQI value is calculated every hour according to a formula published by the Ministry of Environmental Protection (MEP).[Introduction to AQI]
  • The following table shows the AQI category, pollutants and health breakpoints:
Picture1.png
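Here is how the IAQI rule works in practice. A pollutant's IAQI is found by linear interpolation inside the breakpoint band that contains its concentration C: IAQI = (IAQI_hi - IAQI_lo) / (BP_hi - BP_lo) * (C - BP_lo) + IAQI_lo. The AQI is then the largest IAQI. The Python sketch below is illustrative only and is not the project's R code. It covers only PM2.5 and PM10. Its breakpoint values are our own assumption, transcribed from the 24-hour limits in China's HJ 633-2012 standard, so they should be checked against the table above.

```python
# Illustrative sketch of the IAQI/AQI calculation (not the project's R code).
# ASSUMPTION: breakpoints transcribed from the 24-hour PM2.5 and PM10 limits of
# China's HJ 633-2012 standard; check them against the breakpoint table.
IAQI_LEVELS = [0, 50, 100, 150, 200, 300, 400, 500]
BREAKPOINTS = {
    "PM2.5": [0, 35, 75, 115, 150, 250, 350, 500],  # ug/m3, 24h mean
    "PM10": [0, 50, 150, 250, 350, 420, 500, 600],  # ug/m3, 24h mean
}

def iaqi(pollutant, conc):
    """Individual AQI: linear interpolation inside the band containing conc."""
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    bps = BREAKPOINTS[pollutant]
    for i in range(1, len(bps)):
        bp_lo, bp_hi = bps[i - 1], bps[i]
        if conc <= bp_hi:
            i_lo, i_hi = IAQI_LEVELS[i - 1], IAQI_LEVELS[i]
            return (i_hi - i_lo) / (bp_hi - bp_lo) * (conc - bp_lo) + i_lo
    return float(IAQI_LEVELS[-1])  # above the top breakpoint: cap at 500

def aqi(concentrations):
    """Overall AQI = highest individual score; also return every IAQI."""
    scores = {p: iaqi(p, c) for p, c in concentrations.items()}
    return max(scores.values()), scores
```

For example, `aqi({"PM2.5": 115, "PM10": 100})` gives an IAQI of 150 for PM2.5 and 75 for PM10, so the AQI is 150 and PM2.5 is the determining pollutant.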

2.Data Pre-processing

Main Steps:

Our second dataset contains some missing values, caused by issues such as equipment maintenance, so we used the R packages zoo and tidyverse for data preparation.

  • Missing data interpolation

Linear interpolation: given two known points (x0, y0) and (x1, y1), the linear interpolant is the straight line between them, so a missing value at x is estimated as y = y0 + (y1 - y0)(x - x0)/(x1 - x0).

  • Missing values (NAs) are replaced by linear interpolation via approx()

The na.approx() function in the zoo package performs this interpolation.

tidyverse: This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.
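The interpolation step can be sketched in plain Python for an evenly spaced hourly series, with None standing in for R's NA. This illustrates the idea behind zoo's na.approx() and does not reimplement it. In this sketch, leading and trailing gaps are left missing, whereas zoo lets you configure how end gaps are handled.

```python
def fill_linear(values):
    """Replace interior None gaps by linear interpolation between the nearest
    known neighbours (index = hour). Leading/trailing gaps stay None because
    there is no second known point to interpolate from."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of known points and fill the hours between them.
    for left, right in zip(known, known[1:]):
        y0, y1 = values[left], values[right]
        for x in range(left + 1, right):
            # y = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
            filled[x] = y0 + (y1 - y0) * (x - left) / (right - left)
    return filled
```

For example, `fill_linear([10, None, None, 40, None])` fills the two interior gaps with 20.0 and 30.0 and leaves the trailing gap as None.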

  • Scoring the six pollutants for the spider chart

We assigned each pollutant (PM10, PM2.5, NO2, SO2, CO, O3) a score from 1 to 6, corresponding to the AQI category its index falls into.

The following table shows six categories of AQI:

Picture1.png
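Below is a hypothetical helper for the scoring step. It assumes the usual Chinese AQI bands (0-50, 51-100, 101-150, 151-200, 201-300, above 300) map to scores 1 to 6. The function name and the boundary handling are our own choices, not taken from the project code.

```python
import bisect

# ASSUMED upper bounds of categories 1-5; anything above 300 is category 6.
CATEGORY_UPPER = [50, 100, 150, 200, 300]

def pollutant_score(iaqi_value):
    """Map an individual AQI value to a spider-chart score from 1 to 6.
    bisect_left keeps a boundary value (e.g. exactly 50) in the lower category."""
    return bisect.bisect_left(CATEGORY_UPPER, iaqi_value) + 1
```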

DESIGNED FRAMEWORK

1.Interface

1.1 Page 1:

Page1.jpg


1.2 Page 2:

Page2.png

1.3 Page 3:

Page3.jpg


1.4 Page 4:

Page4.png


2.Analytical Visualizations

2.1 Line Chart:

Fluctuation of Beijing AQI

Line chart.png
  • Description: The line chart shows the fluctuation of Beijing's AQI within the selected date range; the screenshot shows the changes in AQI level from Jan 1st to Aug 31st, 2017.

We first draw the line chart with ggplot2 and then use geom_rect() to shade the background by AQI category, with colors following the AQI category table above.

  • R Packages: ggplot2
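The next sketch computes the background bands behind the line chart. It assumes the same six AQI bands, with 500 as the top of the scale. In R, each (ymin, ymax, level) triple would become one row of the data frame passed to geom_rect(), typically with xmin = -Inf and xmax = Inf so that the band spans the whole panel. The Python helper below only computes those rows, and its name is hypothetical.

```python
# ASSUMED AQI bands: (lower bound, upper bound, level); 500 is the top of the scale.
BANDS = [(0, 50, 1), (50, 100, 2), (100, 150, 3),
         (150, 200, 4), (200, 300, 5), (300, 500, 6)]

def visible_bands(max_aqi):
    """Return (ymin, ymax, level) for every band that overlaps [0, max_aqi].
    The top visible band is clipped at max_aqi so the shading stops at the data."""
    rows = []
    for ymin, ymax, level in BANDS:
        if ymin >= max_aqi:
            break  # bands are sorted, so no higher band is visible
        rows.append((ymin, min(ymax, max_aqi), level))
    return rows
```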

2.2 Bar Chart:

Frequency of Beijing AQI Levels

Barchart.png
  • Description: The bar chart displays the frequency of each AQI level within the selected date range. As described above, there are 6 official AQI levels that indicate the severity of air pollution. The "Moderate" level occurs most frequently from Jan 1st to Aug 31st, 2017.
  • R Packages: ggplot2
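The counting behind the bar chart can be sketched as follows. The (date, AQI) record layout and the function names are assumptions for illustration, and the level bands are the same assumed 0-50 up to above-300 split.

```python
import bisect
from collections import Counter
from datetime import date

# ASSUMED upper bounds of AQI levels 1-5; values above 300 are level 6.
LEVEL_UPPER = [50, 100, 150, 200, 300]

def aqi_level(value):
    """Map an AQI value to a level from 1 to 6 (boundaries stay in the lower level)."""
    return bisect.bisect_left(LEVEL_UPPER, value) + 1

def level_frequency(records, start, end):
    """Count records per AQI level between start and end (inclusive).
    records: iterable of (datetime.date, aqi) pairs."""
    counts = Counter(aqi_level(aqi) for day, aqi in records if start <= day <= end)
    # Report all six levels so empty levels still appear as zero-height bars.
    return {level: counts.get(level, 0) for level in range(1, 7)}
```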

2.3 Spider Chart:

Spider chart.png
  • Description: The spider chart shows the severity of each pollutant: the larger the score, the more severe the pollution. The blue shaded area shows the pollutant levels for the selected date, hour and area, and the yellow shaded area represents the average level across all areas for the selected date and hour. The boxes above the chart give the exact pollutant indexes, colored by the change from the previous hour: red, yellow and green represent an increase, no change and a decrease, respectively.
  • R Packages: fmsb
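The colored change indicator can be expressed as a small comparison. The optional tolerance parameter is our own addition and is not known to be used by the dashboard. With its default of 0, the function reduces to a strict increase / no change / decrease rule.

```python
def change_indicator(current, previous, tolerance=0.0):
    """Compare a pollutant index with its value one hour earlier.
    Returns (colour, arrow): red/up = increase, yellow/right = no change,
    green/down = decrease. `tolerance` is a hypothetical dead band."""
    diff = current - previous
    if diff > tolerance:
        return ("red", "up")
    if diff < -tolerance:
        return ("green", "down")
    return ("yellow", "right")
```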


fmsb: radarchart() in the 'fmsb' package draws a radar/spider chart, similar to stars() in the base package.


Approach
There are three main steps to draw a spider chart with the radarchart() function in R. First, get the actual data for the current selection. Second, compute the maximum, minimum and average values. Third, combine all of these data and plot them. Note that the maximum and minimum values must be placed at the top of the data frame, because they determine the shape and data range of the chart. In the data frame fed into the chart, the first and second rows are the maximum and minimum, the third row holds the average values, and the last row is the actual record for the selected hour.

Sc1.png
Sc2.png
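The next sketch assembles the four rows fed into radarchart(), written in Python for illustration. The axis range of 0 to 6 and the dictionary layout are assumptions. The row order follows the description above.

```python
POLLUTANTS = ["PM2.5", "PM10", "SO2", "NO2", "CO", "O3"]

def radar_rows(selected, all_areas, max_score=6, min_score=0):
    """Return rows in the order radarchart() reads them:
    row 1 = axis maximum, row 2 = axis minimum,
    row 3 = average over all areas, row 4 = the selected area's record.
    selected: dict pollutant -> score; all_areas: list of such dicts.
    max_score/min_score are ASSUMED axis limits."""
    if not all_areas:
        raise ValueError("need at least one area to compute the average")
    average = {p: sum(area[p] for area in all_areas) / len(all_areas) for p in POLLUTANTS}
    return [
        {p: max_score for p in POLLUTANTS},    # fixes the outer edge of each axis
        {p: min_score for p in POLLUTANTS},    # fixes the centre of each axis
        average,                               # yellow shade: average level
        {p: selected[p] for p in POLLUTANTS},  # blue shade: selected record
    ]
```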

2.4 Raster Map:

Raster chart.png
  • Description: Since the dataset has only 8 view points, we used the inverse distance weighted (IDW) method to interpolate air quality indexes across all main urban areas of Beijing, predicting values at locations where no measurements were made.
  • R Packages: ggplot2, maptools, gstat, raster


maptools: Tools for Reading and Handling Spatial Objects.

gstat: Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation.

raster: Reading, writing, manipulating, analyzing and modeling of gridded spatial data


Approach
IDW is a deterministic method for multivariate interpolation from a known, scattered set of points. The value assigned to an unknown point is a weighted average of the values at the known points, with weights that decrease as distance increases. The raster map combines a scatter chart, a raster chart and a map. After plotting the monitoring points and interpolating values for the blank areas, we color each raster cell by its interpolated value. Finally, we use the crop() and mask() functions in the raster package to fit the raster to the map.

Rm1.png
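The IDW estimate itself is a weighted average with weights 1/d^p. The sketch below uses plain Python and assumes a power of p = 2 (a common choice, not confirmed for this project) and planar coordinates. It is not gstat's idw(), which the project used, and the crop/mask step is not shown.

```python
import math

def idw(points, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).
    points: list of (px, py, value) tuples, one per monitoring station."""
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0:
            return value       # exactly on a station: use its reading
        w = 1.0 / d ** power   # closer stations get larger weights
        num += w * value
        den += w
    return num / den

def idw_grid(points, xs, ys, power=2.0):
    """Fill a raster (list of rows) by evaluating idw() at every cell centre."""
    return [[idw(points, x, y, power) for x in xs] for y in ys]
```

For two stations with readings 100 and 200, a cell halfway between them gets 150. A cell at a quarter of the distance from the first station gets 110, because the nearer station's weight is nine times larger.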

2.5 Geofacet Line Graphs:

Geofacet graph.png
  • Description: Geofaceting arranges a sequence of plots of data for different geographical entities into a grid that preserves some of their geographical orientation. As the picture shows, the positions of the line graphs follow the relative locations of their view points.
  • R Packages: ggplot2, geofacet


Geofacet: This R package provides geofaceting functionality for ggplot2. Geofaceting arranges a sequence of plots of data for different geographical entities into a grid that strives to preserve some of the original geographical orientation of the entities.


Approach
The most interesting part is creating the grid, because the package itself offers only a limited set of grids. This project used Grid Designer, a JavaScript app, to design a custom grid. The app opens with an empty grid and instructions on how to fill it out. You paste in CSV content describing the geographic entities (the row and col columns are not required at this point), and a grid of squares for these entities is populated. You can then drag the squares around interactively to get the layout you want. For example, we entered the names of the 8 view points into the app. Each cell in the resulting grid can contain any plot that ggplot2 can draw. With this, we can see how the air quality index changes over 24 hours at each view point and which patterns are spatially similar. The grid we generated can also be published and shared with other people.

Gg1.png
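A custom geofacet grid is typically a table with row, col, code and name columns. The Python sketch below loads such a table, checks that no two entities share a cell, and prints the layout for a quick visual check. The station names and positions are hypothetical placeholders, not the project's 8 view points.

```python
import csv
import io

# Hypothetical grid for illustration only (placeholder stations and positions).
GRID_CSV = """row,col,code,name
1,2,N,North station
2,1,W,West station
2,2,C,Central station
2,3,E,East station
3,2,S,South station
"""

def load_grid(text):
    """Parse the grid CSV and reject layouts where two entities share a cell."""
    rows = list(csv.DictReader(io.StringIO(text)))
    cells = [(int(r["row"]), int(r["col"])) for r in rows]
    if len(cells) != len(set(cells)):
        raise ValueError("two entities share the same grid cell")
    return rows

def render(rows):
    """Draw the grid as text, one code per occupied cell and '.' for empty cells."""
    n_rows = max(int(r["row"]) for r in rows)
    n_cols = max(int(r["col"]) for r in rows)
    canvas = [["."] * n_cols for _ in range(n_rows)]
    for r in rows:
        canvas[int(r["row"]) - 1][int(r["col"]) - 1] = r["code"]
    return "\n".join(" ".join(line) for line in canvas)
```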

FUTURE SCOPE


Although the dataset is not complicated, this project has visualized it from several perspectives. There is still future work that could improve the project:

  • Performing predictive analysis to forecast the future volatility of the main air quality indicators based on the historical dataset.
  • Adding more view points in Beijing to improve the raster map and geofacet graph.
  • Combining weather and traffic data to better understand the sources of air pollution.
  • Providing more suggestions on how to deal with haze days, based on professional health advisories.

USER GUIDE

Dashboard Page
Guide
Page1.jpg
Bar Chart & Line Chart

Adjust the date range in the input control box to display the bar chart and line chart. The bar chart shows the distribution of the 6 AQI categories within the selected date range, and the line chart shows the fluctuation of air quality over that range.

Page2.png
Map

In the input control, users can choose any air pollutant and time to observe. The map shows the air quality based on the 8 view points; the warmer the color, the more severe the pollution in that area.

Page3.jpg
Spider Chart

In the input control, users can choose any area and time to observe. The spider chart shows the indexes of the 6 air pollutants together with the average level. The figure in each box gives the specific value of that pollutant's index. The color and arrow indicate the change compared with the value one hour earlier: red (arrow up), yellow (arrow right) and green (arrow down) represent an increase, no change and a decrease, respectively.

Page4.png
Geofacet Graph

In the input control, users can choose any air pollutant and date to observe. The geofacet graph shows how that pollutant changes over 24 hours.

REFERENCES