Group09 proposal

From Visual Analytics for Business Intelligence


Motivation

Our primary motivation for this project is to provide a useful visualisation of the effectiveness of environmental policy implementation, as well as of the relationship between GDP per capita and air quality. This visualisation will help us evaluate which countries are both thriving financially and active in managing air pollution. We are very motivated by this problem because one of our team members worked closely with the founder of Nomadlist.com, the No. 1 website for digital nomads. While working on it, he noticed that the Air Quality Index (AQI) heavily affected how people judged cities, in particular whether they were worth living and working in. We believe that as climate change unfolds, AQI, together with economic measures like GDP per capita, will be one of the key indicators that helps anyone decide whether economic development in a country is happening at the expense of air quality and thus the livability of the nation.

Problem

Currently, NomadList, which uses a variant of our dataset, displays everything in a tabular format with many different colours. Other sites, such as the World Health Organization's, provide separate datasets for AQI and GDP per capita in table format, but tables are not effective for showing relationships or spotting trends in the data.

Picture1.png

In addition, existing visualisations of this data are not easy to understand or use. For example, the Our World in Data visualisation is just a line graph for multiple countries. Its data is incomplete, and the graph does not make trends easy to see.

Objective

We aim to use a combination of visualisations, such as maps and interactive graphs, together with a more comprehensive dataset, to show these relationships better. We also aim to show policy implementation timeframes alongside these visualisations to gauge how well each country implements policy.

By showing these relationships, we hope to identify which countries are doing a poor job of managing their air quality, and to test several hypotheses, such as whether an increase in GDP per capita necessarily leads to worse air quality. We also aim to identify which policies, and which types of environmental policy, are the most effective in managing air quality in developing nations over time.

Data Description and What Data We're Using

We are using data from the World Bank as well as the OECD. We aim to merge these datasets by country and year, and to join the countries with the latitude and longitude data available through the map_data("world") function in R's ggplot2 library.
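The country-and-year join is simple enough to sketch. Below is a minimal Python sketch (Python being what the team already knows; the real pipeline will be in R) of an inner join on country and year. It assumes both sources have been exported as lists of records with hypothetical fields `country`, `year`, `gdp_per_capita` and `pm25`, and that country names or codes already match between the two sources.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def merge_by_country_year(gdp_rows: List[Record], aqi_rows: List[Record]) -> List[Record]:
    """Inner-join GDP rows and air-quality rows on (country, year).

    The field names are hypothetical placeholders, not the real
    World Bank or OECD column names.
    """
    # Index the air-quality rows by their join key for constant-time lookups.
    aqi_index: Dict[Tuple[str, int], Record] = {
        (str(r["country"]), int(r["year"])): r for r in aqi_rows
    }
    merged: List[Record] = []
    for row in gdp_rows:
        key = (str(row["country"]), int(row["year"]))
        match = aqi_index.get(key)
        if match is None:
            continue  # inner join: skip country-years missing from either source
        merged.append({
            "country": key[0],
            "year": key[1],
            "gdp_per_capita": row["gdp_per_capita"],
            "pm25": match["pm25"],
        })
    return merged
```

An inner join keeps only country-years present in both sources; a left join would keep every GDP row and leave gaps for missing air-quality values, which may be preferable if we want the map to show where data is missing.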

The World Bank data is easy to use since it is already available as an R package. We aim to analyse data from both datasets from 1995 to the most recent entries available. The World Bank data is already aggregated at the national level, but the OECD dataset further splits each country into regions. Fortunately, the OECD dataset also provides a total for each country, so we will filter out the regional rows and compare the national totals with the World Bank dataset.
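Filtering the OECD rows down to national totals within our time window could look like the sketch below. It assumes, hypothetically, that each row carries `country`, `region` and `year` fields and that national-total rows use the country code as their region code; the actual OECD export may flag totals differently, in which case only the condition needs to change.

```python
from typing import Any, Dict, Iterable, List

START_YEAR = 1995  # first year of the analysis window stated in the proposal


def national_totals(rows: Iterable[Dict[str, Any]], start_year: int = START_YEAR) -> List[Dict[str, Any]]:
    """Keep only national-total rows from start_year onwards.

    Hypothetical assumption: a national-total row has region == country;
    every other region code is a sub-national region and is dropped.
    """
    return [
        row for row in rows
        if row["region"] == row["country"] and int(row["year"]) >= start_year
    ]
```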

Background Survey & References

Our project focuses on how economic development and political policy correlate with the resulting change (delta) in pollutant levels, as well as on any inverse relationships (i.e. does the quantity of pollutants affect economic development?). In that sense, compared to some of the visualisations shown, we will not be showing as much data. Rather, we want to be as concise as possible while still showing the relationships we want to illustrate.

Sketch Description
Our World In Data Visualisation
Notably, Our World in Data has a visualisation of this relationship. It is a simple line graph plotting air quality against GDP.

Strengths: Using just one line, the visualisation shows the relationship between GDP per capita and AQI, and each data point can be moused over to see its year. Showing the data this way lets us spot inverse or proportional relationships quickly and compare multiple countries.

Weaknesses: Unfortunately, this dataset is very limited, and given those limits the graph does not make clear how these relationships form. Because each country is drawn as a single line, there is a limit to how many countries can be shown at once.

Takeaways: We aim to improve on this by using a map, which should help users visualise each country. We will colour-code the various measures on the map itself to show the severity of air pollution, along with the GDP data.

AQI Singapore.jpg
We've also extensively referred to the website AQICN.

Strengths: The website shows a variety of data on one screen, including the air quality in different areas. Overall data is shown, and important points are large and colour-coded.

Weaknesses: As the image shows, the website is rather busy, and the visualisations can be quite clunky if you don't know what you're looking for. Information overload is a real problem: drilling down to specific data points takes a while.

Hazegazer.jpg
Another notable visualisation we are referencing is Hazegazer.

Strengths: Their website is more focused on demographics and the Indonesian haze crisis, but we aim to match that level of clarity when showing city hotspots and how they correlate with GDP. Not only are hotspots shown, but images, hashtags and videos are also shown to provide human context to the data.

Weaknesses: However, there are many options that a new user might not understand. To get the hotspots to show, one needs to find a few buttons that select the dataset to display. Even then, it is not always clear what the data is trying to show, and even determining the timeframe of the data is hard.

NY Times Visualisation
The New York Times has another visualisation that plots only pollutant data over time.

Strengths: For this particular aspect of our project, this is a good reference. The many data points suit the style of visualisation, a dense bar chart. The shading and reference chart bring forward the impact of this data: how toxic and polluted each city is.

Weaknesses: Very few. For the purposes the New York Times is pursuing, this visualisation is effective, and any criticism would be superficial.

In a 2017 paper, Guillaume Vandenbroucke and Heting Zhu write: ‘We find that pollution in the United States, measured by particulate matter or CO2 emissions, rises with economic activity, but at a noticeably slower pace.’ Given the GDP of a country or city and the pollutant data we have, we will test whether this trend holds beyond the United States.
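One way to test "rises with economic activity, but at a noticeably slower pace" is to read it as an elasticity between 0 and 1: regress the log of a pollutant measure on the log of GDP and look at the slope. This is our own operationalisation of the claim, not necessarily the method used in the paper. A minimal ordinary-least-squares sketch in Python:

```python
import math
from typing import Sequence, Tuple


def log_log_fit(gdp: Sequence[float], pollution: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(pollution) = a + b * log(gdp); returns (a, b).

    The slope b is an elasticity: 0 < b < 1 means pollution rises with
    GDP but proportionally more slowly. All values must be positive.
    """
    if len(gdp) != len(pollution) or len(gdp) < 2:
        raise ValueError("need two equal-length series with at least two points")
    xs = [math.log(g) for g in gdp]
    ys = [math.log(p) for p in pollution]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("GDP series is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```

A slope above 1 would mean pollution grows faster than GDP, and a negative slope would mean pollution falls as GDP rises; either would contradict the claim for that country.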

In a book chapter, Ying Li and Ke Chen note that over 70 years of China’s history, ‘Control policies have been largely ineffective and air quality in the majority of the nation has not been significantly improved and even worsened in many urban areas’. For each policy and the timeframe over which it was implemented, we would like to check whether this claim holds and show any correlation between the policy and pollutant levels.

References

https://research.stlouisfed.org/publications/economic-synopses/2017/06/23/measures-of-pollution
https://www.intechopen.com/books/energy-management-for-sustainable-development/a-review-of-air-pollution-control-policy-development-and-effectiveness-in-china
http://hazegazer.org/home
https://aqicn.org

Sketches

Sketch Description
AQI vs Policy.png
We plan to track various policy implementations and their timeframes, correlating them with the Air Quality Index of the relevant city. We want to track whether the policies a city or state implements are truly effective in reducing or controlling air pollution.
Pollution Hotspots vs GDP.png
We want to test whether there truly is a correlation between GDP and AQI. We aim to show the GDP per capita of various cities around the world, as well as their Air Quality Index, on a map. In addition, we will show the delta in GDP against the delta in AQI.
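Each point in the delta view could be computed as below: the change in GDP per capita and the change in AQI for one place between two chosen years. This sketch assumes the merged data is held as a hypothetical mapping from (place, year) to value, where a place is a city or country; places missing either year are skipped rather than interpolated.

```python
from typing import Dict, Tuple

# Hypothetical in-memory layout: (place, year) -> value.
Series = Dict[Tuple[str, int], float]


def gdp_aqi_deltas(gdp: Series, aqi: Series, start: int, end: int) -> Dict[str, Tuple[float, float]]:
    """Map each place to (delta GDP per capita, delta AQI) between two years.

    Only places with both years present in both series are included,
    so no values are guessed.
    """
    places = {place for place, _ in gdp} | {place for place, _ in aqi}
    result: Dict[str, Tuple[float, float]] = {}
    for place in sorted(places):
        keys = [(place, start), (place, end)]
        if all(k in gdp and k in aqi for k in keys):
            result[place] = (
                gdp[(place, end)] - gdp[(place, start)],
                aqi[(place, end)] - aqi[(place, start)],
            )
    return result
```

Plotting delta GDP on the x-axis against delta AQI on the y-axis then puts places whose economy grew while their air improved in the lower-right quadrant.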

Key Technical Challenges & Approach

With the amount of data we have and the new platforms we have to learn, we anticipate a large challenge ahead in tackling this project.

Potential challenges and our solutions:

Challenge: Not being familiar with R and R-Shiny. We are all more used to programming in Python, React, JavaScript, etc.
Solution: Set up a group chat for the class to discuss R and R-Shiny; compile a list of useful resources we can all share with each other; pair program if we really have to; refer to the R and R-Shiny documentation; and look at other people's projects to see how they did it (similar to how one learns from open-source projects).

Challenge: The dataset is not tagged or sorted by country; it is sorted only by city.
Solution: Find a convenient source of city-to-country data and write a Python script to organise and tag the data appropriately.

Challenge: Correlation is not causation; we might find other factors that distort our findings.
Solution: Our dataset is large, so we can check across many cities whether the trends we predict are reflected consistently.
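The city-to-country tagging script mentioned above could be as small as the sketch below. It assumes a hypothetical lookup CSV with `city` and `country` columns and AQI rows with a `city` field; cities that are not found get an empty country so that they sort to the top for manual review. Matching on the city name alone is a simplification: a name like Hyderabad exists in both India and Pakistan, so a real lookup would ideally also use coordinates or a region field.

```python
import csv
import io
from typing import Dict, List


def load_city_lookup(csv_text: str) -> Dict[str, str]:
    """Build a city -> country map from CSV text with 'city,country' columns.

    Keys are lower-cased and stripped so 'Beijing' and ' beijing' match.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["city"].strip().lower(): row["country"].strip() for row in reader}


def tag_and_sort(rows: List[Dict[str, str]], lookup: Dict[str, str]) -> List[Dict[str, str]]:
    """Add a 'country' field to copies of the rows, then sort by country and city.

    Unmatched cities get country '' and therefore sort first.
    """
    tagged = [
        {**row, "country": lookup.get(row["city"].strip().lower(), "")}
        for row in rows
    ]
    return sorted(tagged, key=lambda r: (r["country"], r["city"]))
```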

Storyboard

Image Description
GDP vs AQI
We assign a colour scale to GDP per capita for each country; in this case, the scale goes from red to green, with green being higher. In addition, we use the air quality index data provided by the OECD to overlay a shaded cloud onto each country. A worse air quality index will result in an overlay in a darker shade of red, and a better one in a darker shade of green. The measure, such as PM2.5 or mean population exposure, can be selected, along with the year. The map can be zoomed and hovered over to see the raw AQI and GDP per capita values. A line graph will also be present, allowing you to select a country and view its GDP vs AQI data over time.
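The red-to-green scale for GDP per capita amounts to a linear interpolation between two colours. A minimal sketch, assuming the end points of the scale are the lowest and highest GDP per capita on the map (a hypothetical choice; a log scale may suit the skewed GDP distribution better). In R/Shiny this would more likely come from a palette function than hand-written code; the sketch only shows the mapping itself.

```python
def gdp_to_hex(value: float, vmin: float, vmax: float) -> str:
    """Map value linearly onto a red (#ff0000) to green (#00ff00) scale.

    vmin maps to pure red and vmax to pure green; values outside the
    range are clamped. Plain RGB interpolation passes through a dull
    olive (#808000) at the midpoint.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)  # position on the scale, 0..1
    t = min(1.0, max(0.0, t))           # clamp out-of-range values
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"
```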

The aim of this visualisation is to determine whether there is a correlation between GDP and air pollution. Our hypothesis is that a higher GDP per capita leads to more air pollution as the country produces more value.
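Testing this hypothesis could start with a correlation coefficient across countries (or across years for one country). A minimal sketch of Pearson's r in Python: a positive r would be consistent with our hypothesis, but r only captures linear association and cannot by itself show that GDP causes pollution.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)
```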

Policy Effectiveness Revised
Governments around the world implement policies that have a direct or indirect impact on the environment. We want to visualise how effective such policies are at curbing pollution levels in each country by doing a pre-post analysis.

This time-series chart will show the pollutant levels in a city over time, with an indicator for when a policy was implemented. It will also show trendlines visualising where pollutant levels were heading at the time of implementation (the negative forecast) and how the trend changed after the policy was implemented, along with a positive forecast based on the post-policy trend.
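The pre-post comparison can be sketched as a simple interrupted time series: fit a straight-line trend to the years before the policy, project it over the years after (where pollution "was heading"), and compare that projection with what actually happened, alongside the post-policy trend. This is a minimal sketch assuming yearly pollutant levels and a single policy year; it ignores seasonality, noise and other concurrent policies, all of which a real analysis would need to consider.

```python
from typing import List, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length series with at least two points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def pre_post(years: Sequence[int], levels: Sequence[float], policy_year: int) -> Tuple[float, float, List[float]]:
    """Compare pollutant trends before and after a policy year.

    Returns (pre_slope, post_slope, gaps), where each gap is the actual
    post-policy level minus the level projected from the pre-policy trend.
    Negative gaps mean pollution came in below where it was heading.
    """
    pre = [(y, v) for y, v in zip(years, levels) if y < policy_year]
    post = [(y, v) for y, v in zip(years, levels) if y >= policy_year]
    a_pre, b_pre = linear_fit([y for y, _ in pre], [v for _, v in pre])
    _, b_post = linear_fit([y for y, _ in post], [v for _, v in post])
    gaps = [v - (a_pre + b_pre * y) for y, v in post]
    return b_pre, b_post, gaps
```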

Pollution and GDP over time
The third story compares AQI against GDP within a country (a single side-by-side box plot) and then across countries (multiple side-by-side box plots).

By using box plots, we can see the IQR, median and outliers. This will allow anyone to pick out which countries might be more interesting to explore. For instance, Beijing was reportedly very polluted during its development in the 2000s because factories were built around it, and it was hard for city dwellers to stay in Beijing without developing health issues. In recent years, however, policies have been implemented to remove factories around Beijing and shift them elsewhere. Beijing's air quality should therefore improve (i.e. its AQI should fall), and exploring how this affects GDP would be interesting too.
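The numbers behind each box can be computed as below, using Tukey's rule that points more than 1.5 times the IQR beyond the quartiles are outliers (the usual box-plot convention, and ggplot2's default whisker length). Quartiles come from Python's `statistics.quantiles` with `method="inclusive"`, which we believe matches the interpolation of R's default `quantile()` (type 7); this is worth checking before comparing numbers with an R plot.

```python
import statistics
from typing import Any, Dict, List, Sequence


def box_stats(values: Sequence[float]) -> Dict[str, Any]:
    """Quartiles, median, IQR and Tukey outliers for one box in a box plot."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey fences
    outliers: List[float] = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}
```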

Timeline

Timeline g9.jpg

Comments & Feedback