Group23 Proposal
MAKE THE WORLD A BETTER PLACE TO “BREATHE”
Background
Every year, the World Bank publishes an updated set of World Development Indicators collected from multiple channels. This will be the core data source for our analysis. The dataset contains 1,591 indicators across 263 countries. The indicators cover areas such as environment, economic policy & debt, infrastructure, financial sector, public sector, private sector & trade, social protection & labour, education, health, gender and poverty.
In our case, some of the indicators can serve as measures of air pollution, such as CO2 emissions, PM2.5 air pollution and nitrous oxide emissions. We will not exclude any of these indicators unless it is shown to be statistically insignificant.
Data Preparation
The raw data is stacked by country. For time series analysis, we need to transform it into long format, with only 3 columns: country, indicator and value.
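The reshaping step could look roughly like the sketch below. It assumes the raw file has one row per country-indicator pair and one column per year, and it keeps a year column next to country, indicator and value so that observations stay distinguishable over time. The column names, the helper to_long and the sample values are illustrative assumptions, not the actual file layout.

```python
import pandas as pd

def to_long(raw: pd.DataFrame) -> pd.DataFrame:
    """Reshape a wide table (one column per year) into long format."""
    return (
        raw.melt(id_vars=["country", "indicator"],
                 var_name="year", value_name="value")
           .dropna(subset=["value"])          # drop years with no observation
           .astype({"year": int})             # year labels arrive as strings
           .sort_values(["country", "indicator", "year"])
           .reset_index(drop=True)
    )

# Hypothetical miniature of the raw data; names and numbers are made up.
raw = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "indicator": ["CO2 emissions", "PM2.5 air pollution"] * 2,
    "2014": [1.0, 10.0, 2.0, 20.0],
    "2015": [1.5, 11.0, None, 21.0],
})

long_df = to_long(raw)
print(long_df)
```

Dropping missing values at this stage is a design choice for the sketch; the real pipeline may prefer to keep them and handle gaps later.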
Methodologies and Techniques
Descriptive Statistics
The data will be grouped by region and by time period. Descriptive statistics will help us distinguish regions within each period and indicate whether a region was going through industrialization, war or even a civilizational revolution.
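A minimal sketch of this grouping is shown below, assuming a long table with columns country, indicator, year and value. The region mapping region_of, the ten-year period width and the helper summarize are hypothetical choices made only for illustration.

```python
import pandas as pd

def summarize(long_df: pd.DataFrame, region_of: dict, period_years: int = 10) -> pd.DataFrame:
    """Descriptive statistics per region, period and indicator."""
    df = long_df.copy()
    df["region"] = df["country"].map(region_of)
    # Bucket years into fixed-width periods, e.g. 1995 -> 1990 for decades.
    df["period"] = (df["year"] // period_years) * period_years
    return (
        df.groupby(["region", "period", "indicator"])["value"]
          .agg(["count", "mean", "median", "std"])
          .reset_index()
    )

# Made-up observations for three hypothetical countries in two regions.
obs = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "indicator": ["PM2.5 air pollution"] * 6,
    "year": [1995, 1995, 1995, 2005, 2005, 2005],
    "value": [10.0, 20.0, 30.0, 12.0, 18.0, 40.0],
})
region_of = {"A": "X", "B": "X", "C": "Y"}

summary = summarize(obs, region_of)
print(summary)
```

Fixed-width periods are only one option; periods aligned to historical events could be supplied through a lookup table instead.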
Variable Selection & Clustering
Our dataset currently has 1,591 indicators. Intuitively, some of them are highly correlated or even near-duplicates of one another. Variable selection is necessary; otherwise the whole analysis may be biased towards groups of redundant variables that are effectively over-weighted.
To eliminate highly correlated variables, a correlation matrix and stepwise regression may come in handy. Variable clustering will also help us reduce the dimensionality of the variables and allow us to measure the contribution of related indicators together.
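As one possible first pass, a correlation matrix can be used as a simple greedy filter, sketched below. It assumes a table with one column per indicator and one row per country-year; the 0.9 threshold, the helper drop_correlated and the sample columns are illustrative assumptions. Stepwise regression and variable clustering are not covered by this sketch.

```python
import numpy as np
import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.9):
    """Drop every column whose absolute correlation with any column
    to its left exceeds the threshold."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop), to_drop

# Made-up indicator columns: the second is a linear rescaling of the first.
a = np.arange(10.0)
features = pd.DataFrame({
    "co2": a,
    "co2_per_capita": 2 * a + 1,
    "forest_area": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
})

reduced, dropped = drop_correlated(features)
print(dropped)
print(list(reduced.columns))
```

The result depends on column order, since the later column of each correlated pair is the one removed; ordering the columns by subject-matter priority would make that choice deliberate.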
Time Series & Panel Analysis
To monitor the sensitivity of air quality to each indicator over time, panel analysis may be applied to the corresponding changes. With the panel analysis results, we will be able to quantify how strongly the selected factors affect air quality.
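The proposal does not fix a particular panel model. As an illustration only, the sketch below uses a country fixed-effects (within) estimator: each variable is demeaned by country, which removes time-invariant country differences, and the slopes are then fitted by least squares. The column names, the helper within_estimator and the sample numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def within_estimator(panel: pd.DataFrame, entity: str, outcome: str, regressors: list) -> pd.Series:
    """Fixed-effects slopes from entity-demeaned least squares."""
    cols = [outcome] + list(regressors)
    # Subtract each entity's own mean from its observations.
    demeaned = panel[cols] - panel.groupby(entity)[cols].transform("mean")
    X = demeaned[list(regressors)].to_numpy(dtype=float)
    y = demeaned[outcome].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(beta, index=list(regressors))

# Made-up panel: pm25 = country baseline + 2 * energy_use, with no noise.
panel = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3,
    "year": [2000, 2001, 2002] * 2,
    "energy_use": [1.0, 2.0, 3.0, 2.0, 4.0, 7.0],
    "pm25": [12.0, 14.0, 16.0, 54.0, 58.0, 64.0],
})

effects = within_estimator(panel, "country", "pm25", ["energy_use"])
print(effects)
```

This sketch returns point estimates only; standard errors and tests of significance would need a dedicated statistics package.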
Geographical Visualization
Geographical visualization will give us a clearer picture of how air pollution is distributed around the world and enable us to detect details or patterns of migration.
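One way to draw such a map is an animated choropleth, sketched below with Plotly Express, which is an assumed dependency rather than a tool named in this proposal. The sketch also assumes each observation carries a three-letter ISO country code in a column called iso3; that column, the helpers map_frame and plot_world, and the sample values are hypothetical.

```python
import pandas as pd

def map_frame(long_df: pd.DataFrame, indicator: str) -> pd.DataFrame:
    """Select one indicator and order rows so animation frames run by year."""
    frame = long_df[long_df["indicator"] == indicator].dropna(subset=["value"])
    return frame.sort_values(["year", "iso3"]).reset_index(drop=True)

def plot_world(frame: pd.DataFrame, title: str):
    """World map coloured by value, with one animation frame per year."""
    import plotly.express as px  # imported here so the data step runs without plotly
    return px.choropleth(
        frame,
        locations="iso3",
        locationmode="ISO-3",
        color="value",
        animation_frame="year",
        title=title,
    )

# Made-up observations; the numbers are not real measurements.
obs = pd.DataFrame({
    "iso3": ["USA", "CHN", "USA", "CHN", "USA"],
    "indicator": ["PM2.5 air pollution"] * 4 + ["CO2 emissions"],
    "year": [2015, 2015, 2014, 2014, 2015],
    "value": [8.0, 58.0, None, 60.0, 5.0],
})

frame = map_frame(obs, "PM2.5 air pollution")
print(frame)
```

With plotly installed, plot_world(frame, "PM2.5 air pollution").show() would render the map, and stepping through the yearly frames is one way to look for migration patterns.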