Difference between revisions of "Group03 Proposal"

From Visual Analytics and Applications
 
(5 intermediate revisions by the same user not shown)
Line 30: Line 30:
 
|}
 
|}
  
<div style="padding: 17px">
+
 
</div>
+
=Key Motivations=
<div style="background: #D1F765; padding: 20px; line-height: 0.3em; text-indent: 16px;letter-spacing:0.1em;font-size:20px"><font color=#040504 face="Garamond"> KEY MOTIVATION </font></div>
+
 
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 
|-
 
|
 
<div style="border-left: #D1F765 solid 6px;font-family: Garamond; padding: 0px 30px 0px 18px; ">
 
 
First launched in 1995, the <b>Corruption Perceptions Index (CPI)</b> has been widely credited with putting the issue of corruption on the forefront of the international policy agenda. Transparency International (TI), is an international non-governmental organization based in Berlin, Germany which acts to combat global corruption and prevent criminal activities arising from corruption.
 
First launched in 1995, the <b>Corruption Perceptions Index (CPI)</b> has been widely credited with putting the issue of corruption on the forefront of the international policy agenda. Transparency International (TI), is an international non-governmental organization based in Berlin, Germany which acts to combat global corruption and prevent criminal activities arising from corruption.
 
   
 
   
Line 43: Line 39:
 
The CPI currently ranks 176 countries on a scale from 100 (very clean) to 0 (highly corrupt). Denmark is the least corrupt country in the world, ranking consistently high among international financial transparency, while the most corrupt country in the world is North Korea, remaining on 8 out of 100 since 2012.
 
The CPI currently ranks 176 countries on a scale from 100 (very clean) to 0 (highly corrupt). Denmark is the least corrupt country in the world, ranking consistently high among international financial transparency, while the most corrupt country in the world is North Korea, remaining on 8 out of 100 since 2012.
  
In our project, we married the data set from Transparency International on their CPI records for specifically 2016 versus the World Bank data set through the years, which contains economical, agricultural, social, environmental data of the same countries. We will seek to find out if there is indeed any correlations between the perceived corruption level of a country, and its internal conditions.
+
In our project, we married the data set from Transparency International on their CPI records (from 2012 to 2016) versus the World Bank data set through the years, which contains economical, agricultural, social, environmental data of the same countries. We will seek to find out if there is indeed any correlations between the perceived corruption level of a country, and its internal conditions.
 +
<br>
 
<br>
 
<br>
|-
 
|}
 
<!-- END OF KEY MOTIVATION --->
 
  
<div style="padding: 17px">
+
=Objectives (Questions we like to answer)=
</div>
+
 
<div style="background: #D1F765; padding: 20px; line-height: 0.3em; text-indent: 16px;letter-spacing:0.1em;font-size:20px"><font color=#040504 face="Garamond"> OBJECTIVES (QUESTIONS WE LIKE TO ANSWER)</font></div>
 
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 
|-
 
|
 
<div style="border-left: #D1F765 solid 6px;font-family: Garamond; padding: 0px 30px 0px 18px; ">
 
 
It has been a challenge to validate whether CPI is an accurate index to represent corruption.  
 
It has been a challenge to validate whether CPI is an accurate index to represent corruption.  
  
Line 73: Line 62:
  
 
* There is a correlation between CPI and economic growth (through GDP)
 
* There is a correlation between CPI and economic growth (through GDP)
* There is a correlation between CPI and the rate / amount of foreign investment
+
* There is a correlation between CPI and education
* There are correlations between CPI and the following factors
+
* There is a correlation between CPI and gender equality
** Urban / Rural mix (e.g. % of agricultural land)
 
** Environmental conditions (e.g. CO2 emissions level)
 
** Education level (e.g. education expenditure for primary, secondary, tertiary, educational attainment)
 
** Literacy Rates (e.g. between adults and youths)
 
** Debt Level (e.g. amount used to service debts)
 
** Tourism (e.g. international tourism arrivals and departures, expenditures and receipts)
 
** Mortality rates (e.g. male / female / neo-natal)
 
** Populations numbers
 
** Employment Details (e.g. unemployment rates, male / female rates)
 
 
* Attempt to debunk any stereotypes and myths we may have for individual countries
 
* Attempt to debunk any stereotypes and myths we may have for individual countries
 
<br>
 
<br>
|-
 
|}
 
<!-- END OF OBJECTIVES --->
 
  
<div style="padding: 17px">
+
=Data Sources=
</div>
+
 
<div style="background: #D1F765; padding: 20px; line-height: 0.3em; text-indent: 16px;letter-spacing:0.1em;font-size:20px"><font color=#040504 face="Garamond"> DATA SOURCES </font></div>
 
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 
|-
 
|
 
<div style="border-left: #D1F765 solid 6px;font-family: Garamond; padding: 0px 30px 0px 18px; ">
 
 
The data came from two sources.  
 
The data came from two sources.  
  
The first one came from:  
+
The first one came from Transparency International (2015 CPI as an example):
https://www.kaggle.com/transparencyint/corruption-index
+
https://www.transparency.org/cpi2015#downloads
  
 
The data set contains the following important columns:
 
The data set contains the following important columns:
* CPI 2016 Rank
+
* CPI 2012 - 2016 Rank
 
* Country  
 
* Country  
 
* Country Code
 
* Country Code
 
* Region
 
* Region
* Corruption Perceptions Index
+
* Corruption Perceptions Index from 2012 to 2016
  
 
The second data set from the World Bank came from:  
 
The second data set from the World Bank came from:  
Line 114: Line 86:
 
This data set is a collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates
 
This data set is a collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates
 
However, due to the huge amount of data, we only kept the data for countries which appeared in the CPI data set and only indices from 2006 to 2016.  
 
However, due to the huge amount of data, we only kept the data for countries which appeared in the CPI data set and only indices from 2006 to 2016.  
 +
The fields we are looking at are the World Development Indicators (WDI).
  
 
The filtered dataset for the World Bank data was <b>259,750</b> rows across <b>171</b> countries.  
 
The filtered dataset for the World Bank data was <b>259,750</b> rows across <b>171</b> countries.  
 
<br>
 
<br>
|-
+
<br>
|}
 
<!-- END OF DATA SOURCES --->
 
  
<div style="padding: 17px">
+
=Methodology=
</div>
 
<div style="background: #D1F765; padding: 20px; line-height: 0.3em; text-indent: 16px;letter-spacing:0.1em;font-size:20px"><font color=#040504 face="Garamond"> METHODOLOGY </font></div>
 
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 
|-
 
|
 
<div style="border-left: #D1F765 solid 6px;font-family: Garamond; padding: 0px 30px 0px 18px; ">
 
  
 
The first factor to assess CPI is to understand the methodology of calculating the index. The CPI scores and ranks countries and territories around the world on the perceived level of corruption in the public sector. CPI is an aggregate index, which draws on relevant questions from several different data sources that capture business and expert views.
 
The first factor to assess CPI is to understand the methodology of calculating the index. The CPI scores and ranks countries and territories around the world on the perceived level of corruption in the public sector. CPI is an aggregate index, which draws on relevant questions from several different data sources that capture business and expert views.
Line 137: Line 102:
 
# Report a measure of uncertainty: The CPI is accompanied by a standard error and confidence interval associated with the score, which capture the variation in scores of the data sources available for that country/territory.
 
# Report a measure of uncertainty: The CPI is accompanied by a standard error and confidence interval associated with the score, which capture the variation in scores of the data sources available for that country/territory.
  
We can also further analyse the CPI pre and post-2012 to see if there is an impact to the overall index score by country.  
+
We can also further analyse the individual survey scores to see if there is an impact to the overall index score by country.  
 +
<br>
 
<br>
 
<br>
|-
 
|}
 
<!-- END OF METHODOLOGY --->
 
  
<div style="padding: 17px">
+
=Tools and Packages=
</div>
+
 
<div style="background: #D1F765; padding: 20px; line-height: 0.3em; text-indent: 16px;letter-spacing:0.1em;font-size:20px"><font color=#040504 face="Garamond"> TOOLS AND PACKAGES </font></div>
 
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 
|-
 
|
 
<div style="border-left: #D1F765 solid 6px;font-family: Garamond; padding: 0px 30px 0px 18px; ">
 
 
R Studio, Tableau (only for preliminary EDA) and associated R libraries will be used:
 
R Studio, Tableau (only for preliminary EDA) and associated R libraries will be used:
  
Line 159: Line 117:
 
* plotly
 
* plotly
 
<br>
 
<br>
|-
 
|}
 
<!-- END OF TOOLS AND PACKAGES --->
 
  
<div style="padding: 17px">
+
=References to Related DataViz=
</div>
+
 
<div style="background: #D1F765; padding: 20px; line-height: 0.3em; text-indent: 16px;letter-spacing:0.1em;font-size:20px"><font color=#040504 face="Garamond"> REFERENCES TO RELATED DATAVIZ </font></div>
 
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
 
|-
 
|
 
<div style="border-left: #D1F765 solid 6px;font-family: Garamond; padding: 0px 30px 0px 18px; ">
 
 
The references we are using for this project:
 
The references we are using for this project:
 
# E.V., T. I. (n.d.). Transparency International. Retrieved from https://www.transparency.org/research/cpi
 
# E.V., T. I. (n.d.). Transparency International. Retrieved from https://www.transparency.org/research/cpi
 
# World Bank Open Data. (n.d.). Retrieved from https://data.worldbank.org/
 
# World Bank Open Data. (n.d.). Retrieved from https://data.worldbank.org/
 
<br>
 
<br>
|-
+
<br>
|}
 
<!-- END OF REFERENCES --->
 
  
  

Latest revision as of 21:43, 12 August 2018


Perceiving Evil: The Study of the Corruption Perception Index

 


Key Motivations

First launched in 1995, the Corruption Perceptions Index (CPI) has been widely credited with putting the issue of corruption at the forefront of the international policy agenda. Transparency International (TI) is an international non-governmental organization based in Berlin, Germany, which acts to combat global corruption and prevent criminal activities arising from corruption.

TI publishes the CPI, annually ranking countries "by their perceived levels of corruption, as determined by expert assessments and opinion surveys". The CPI generally defines corruption as "the misuse of public power for private benefit".

The CPI currently ranks 176 countries on a scale from 100 (very clean) to 0 (highly corrupt). Denmark is ranked as the least corrupt country in the world, scoring consistently high on international financial transparency, while North Korea is ranked as the most corrupt, with a score that has remained at 8 out of 100 since 2012.

In our project, we combine Transparency International's CPI records (from 2012 to 2016) with the World Bank data set, which contains economic, agricultural, social, and environmental data for the same countries over the years. We seek to find out whether there are any correlations between the perceived corruption level of a country and its internal conditions.

Objectives (Questions we would like to answer)

It has been a challenge to validate whether CPI is an accurate index to represent corruption.

A study in 2002 found a “strong and significant correlation” between CPI and two proxies: black market activity and overabundance of regulation. However, clear indicators of black market activity and regulation are hard to find.

There were some claims by other studies as well:

  • Researchers found a correlation between higher CPI and higher long-term economic growth
  • There is an increase of 1.7% in GDP for every unit increase in a country’s CPI score
  • There is a “power-law” dependence linking higher CPI score with higher rates of foreign investment in a country

The CPI’s methodology has also been criticised. Some of the flaws pointed out are:

  • Corruption is too complex to be captured by a single score. The nature of corruption in rural Kansas will, for instance, be different than in the city administration of New York, yet the Index measures them in the same way
  • By measuring perceptions of corruption, as opposed to corruption itself, the Index may simply be reinforcing stereotypes and clichés
  • The Index only measures public-sector corruption, leaving out private actors

The objectives of our study are to find out whether:

  • There is a correlation between CPI and economic growth (through GDP)
  • There is a correlation between CPI and education
  • There is a correlation between CPI and gender equality

We will also attempt to debunk any stereotypes and myths we may have about individual countries.
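
The correlation checks named in the objectives could be sketched roughly as follows. This is only an illustration in plain Python (the project itself plans to use R); the function name `pearson_r` and all numbers are hypothetical and are not taken from the CPI or World Bank data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five countries, for illustration only.
cpi_scores = [20, 35, 50, 65, 80]
gdp_per_capita = [2.0, 4.5, 9.0, 21.0, 40.0]  # made-up, thousands of USD

r = pearson_r(cpi_scores, gdp_per_capita)
print(f"Pearson r between CPI and GDP per capita: {r:.3f}")
```

A coefficient near 1 or -1 would point to a strong linear association, while a value near 0 would not rule out a non-linear relationship, so a scatter plot is a sensible companion to the number.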


Data Sources

The data came from two sources.

The first one came from Transparency International (2015 CPI as an example): https://www.transparency.org/cpi2015#downloads

The data set contains the following important columns:

  • CPI 2012 - 2016 Rank
  • Country
  • Country Code
  • Region
  • Corruption Perceptions Index from 2012 to 2016

The second data set from the World Bank came from: https://datacatalog.worldbank.org/dataset/world-development-indicators

This data set is a collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates. However, due to the huge amount of data, we kept only the data for countries that appear in the CPI data set, and only indicator values from 2006 to 2016. The fields we are looking at are the World Development Indicators (WDI).

The filtered World Bank data set contains 259,750 rows across 171 countries.
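
The filtering step described above (keep only countries present in the CPI data set and only the years 2006 to 2016) could look roughly like this. It is a minimal sketch in plain Python, assuming the World Bank data has already been reshaped to one row per country, indicator and year; the field names `country_code`, `indicator`, `year` and `value` are hypothetical, as are the sample rows.

```python
def filter_wdi_rows(wdi_rows, cpi_country_codes, first_year=2006, last_year=2016):
    """Keep rows whose country is in the CPI set and whose year is in range."""
    wanted = set(cpi_country_codes)
    return [
        row
        for row in wdi_rows
        if row["country_code"] in wanted and first_year <= row["year"] <= last_year
    ]


# Hypothetical long-format rows; values are made up for illustration.
wdi_rows = [
    {"country_code": "DNK", "indicator": "gdp", "year": 2005, "value": 1.0},
    {"country_code": "DNK", "indicator": "gdp", "year": 2010, "value": 2.0},
    {"country_code": "PRK", "indicator": "gdp", "year": 2016, "value": 3.0},
    {"country_code": "XXX", "indicator": "gdp", "year": 2012, "value": 4.0},
]
cpi_country_codes = ["DNK", "PRK"]  # codes taken from the CPI data set

kept = filter_wdi_rows(wdi_rows, cpi_country_codes)
print(len(kept))  # number of rows that survive both filters
```

Filtering on the country code rather than the country name avoids mismatches caused by spelling differences between the two sources.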

Methodology

The first step in assessing the CPI is to understand how the index is calculated. The CPI scores and ranks countries and territories around the world on the perceived level of corruption in the public sector. The CPI is an aggregate index, which draws on relevant questions from several different data sources that capture business and expert views.

In 2012, the methodology for calculating the CPI was updated. The CPI is now calculated in the following steps:

  1. Select data sources: The sources are carefully selected from numerous independent institutions.
  2. Standardise data sources: Each source is standardised to a scale of 0-100, where 0 equals the highest level of perceived corruption and 100 equals the lowest. This is done by subtracting the mean of the data set and dividing by the standard deviation, which yields z-scores. The z-scores are then adjusted to have a mean of approximately 45 and a standard deviation of approximately 20 so that the data set fits the CPI’s 0-100 scale.
  3. Calculate the average: For a country or territory to be included in the CPI, a minimum of three sources must assess that country. A country’s CPI score is then calculated as the average of all standardised scores available for that country. Scores are rounded to whole numbers.
  4. Report a measure of uncertainty: The CPI is accompanied by a standard error and confidence interval associated with the score, which capture the variation in scores of the data sources available for that country/territory.
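
The standardisation and averaging steps above can be sketched as follows. This is an illustrative simplification in plain Python, not Transparency International's actual implementation: the function names are hypothetical, each source is standardised against its own mean and standard deviation, raw source scores are assumed to be oriented so that higher means cleaner, values are clamped to the 0-100 range, and the standard error is taken as the sample standard deviation of the source scores divided by the square root of their count. All of these are assumptions.

```python
import statistics

TARGET_MEAN = 45.0  # approximate mean after rescaling (from the steps above)
TARGET_SD = 20.0    # approximate standard deviation after rescaling
MIN_SOURCES = 3     # minimum number of sources needed for inclusion


def standardise_source(raw_scores):
    """Rescale one source's raw scores (country -> value) to the CPI scale."""
    values = list(raw_scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("cannot standardise a source with constant scores")
    rescaled = {}
    for country, value in raw_scores.items():
        z = (value - mean) / sd
        # Assumption: clamp to [0, 100] so every value stays on the CPI scale.
        rescaled[country] = min(100.0, max(0.0, z * TARGET_SD + TARGET_MEAN))
    return rescaled


def cpi_score(standardised_scores):
    """Return (rounded score, standard error), or None with too few sources."""
    n = len(standardised_scores)
    if n < MIN_SOURCES:
        return None
    # Note: Python's round() sends exact halves to the nearest even integer;
    # the steps above do not specify a tie-breaking rule.
    score = round(statistics.mean(standardised_scores))
    std_error = statistics.stdev(standardised_scores) / n ** 0.5
    return score, std_error


# Hypothetical raw scores from one source, for illustration only.
source = standardise_source({"A": 1.0, "B": 2.0, "C": 3.0})
print(source)
print(cpi_score([40.0, 50.0, 60.0]))
```

A country assessed by only two sources returns `None` here, which mirrors the inclusion rule in step 3.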

We can also analyse the individual survey scores further to see how they affect the overall index score of each country.

Tools and Packages

RStudio, Tableau (only for preliminary EDA), and the following R libraries will be used:

  • shiny
  • plyr
  • shinydashboard
  • tidyr
  • ggplot2
  • plotly


References to Related DataViz

The references we are using for this project:

  1. Transparency International e.V. (n.d.). Transparency International. Retrieved from https://www.transparency.org/research/cpi
  2. World Bank Open Data. (n.d.). Retrieved from https://data.worldbank.org/



