Group01 Report

From Visual Analytics and Applications
Revision as of 21:11, 3 December 2017 by Danny.lim.2016


WORLD DEVELOPMENT INDICATORS: A NEW VISUAL PERSPECTIVE
A web-based analytics application to visualize countries' development across the globe




Visual Design Framework

Grp01 design framework.png

Raw Data

Grp01 datastructure.png

Data Tables

Grp01 datatable.png

Visual Structure

Circlize in R

For development of the visual application, R will be used as the base language, with a focus on the circlize package to create the graphical visualizations of the WDI data.


Grp01 circlize01.png

The circlize package was selected because a circular layout is an efficient way to visualize large amounts of information. The package implements circular layout generation in R and offers the flexibility to use both low-level and high-level graphics functions, which the team can combine for specific purposes. Together with the seamless connection between R's computational and visual environments, this gives the team the convenience and freedom to design figures that help reveal complex patterns in multidimensional data.



Grp01 circlize04.png

Circular visualization is commonly used in genomics and related omics fields because of its efficiency in revealing associations in high-dimensional genomic data. Other use cases include visualizing human movement in global migration, where flows can be plotted with the chord diagram function within the circular plot. To the team's knowledge, this project may be the first to use circular plots to represent the multidimensional data of the World Development Indicators.


Data structure and display

  • A circular layout is composed of sectors, tracks and cells.
  • As illustrated in the figure below, the pink circle is a track, the blue section is a sector, and the intersection of a sector and a track is called a cell.
  • Within each cell, circlize can display data as a line graph, bar chart, histogram, scatterplot, heatmap, etc.
  • For the innermost track, circlize can also render a chord diagram to visualize movement, or a dendrogram for clustering purposes.

Grp01 circlize03.pngGrp01 circlize02.pngGrp01 circlize05.png
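To make the sector/track/cell vocabulary concrete, the following minimal Python sketch shows one way a circular layout could share out the 360 degrees among sectors and stack tracks radially, producing one cell per (sector, track) pair. It is not circlize's implementation; the names `sector_angles`, `cell_bounds` and `Cell`, the proportional width rule, and the fixed gap between sectors (loosely modelled on circlize's `gap.degree` setting) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    sector: str
    track: int        # 0 is the outermost track
    start_deg: float  # angular start of the cell, in degrees
    end_deg: float    # angular end of the cell, in degrees
    inner_r: float    # radial bounds of the track the cell sits in
    outer_r: float


def sector_angles(sector_sizes, gap_deg=1.0):
    """Split 360 degrees among sectors in proportion to their size,
    leaving a fixed gap after every sector (hypothetical rule)."""
    usable = 360.0 - gap_deg * len(sector_sizes)
    if usable <= 0:
        raise ValueError("too many sectors for the chosen gap")
    total = sum(sector_sizes.values())
    angles, start = {}, 0.0
    for name, size in sector_sizes.items():
        width = usable * size / total
        angles[name] = (start, start + width)
        start += width + gap_deg
    return angles


def cell_bounds(sector_sizes, track_heights, outer_radius=1.0, gap_deg=1.0):
    """Return one Cell per (sector, track) pair, working from the outside in."""
    angles = sector_angles(sector_sizes, gap_deg)
    cells = []
    r = outer_radius
    for t, height in enumerate(track_heights):
        inner = r - height
        for name, (a0, a1) in angles.items():
            cells.append(Cell(name, t, a0, a1, inner, r))
        r = inner
    return cells
```

For example, `cell_bounds({"A": 1, "B": 1, "C": 2}, [0.2, 0.3])` yields six cells (three sectors times two tracks). Because every sector consumes its own gap and shares a fixed 360 degrees, each extra sector narrows the others, which is why a single circle cannot hold an arbitrary number of variables.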

Visual Representation

WDI data will be mapped into three circular plots, one for each of three types of visual representation, as illustrated below:

Circular Plot (Series Trend)

Grp01 circularPlot01.png


Circular Plot (Country Trend)

Grp01 circularPlot02.png


Circular Plot (Series Comparison)

Grp01 circularPlot03.png


CASE STUDY: Ministry of Health Data Analyst

User: Data analyst from Ministry of Health

Objective: Understand the health of people living in Singapore and how it compares globally


We start with the Country Trend model and filter the health data for Singapore from 2007 to 2015. We analyze the series one by one and identify trends that deviate from our expectations.

The series with unexpected or abnormal trends are:

  • SP.DYN.TFRT.IN : Fertility rate, total (births per woman)
  • SP.DYN.CDRT.IN : Death rate, crude (per 1,000 people)
  • SP.DYN.CBRT.IN : Birth rate, crude (per 1,000 people)
  • SH.ANM.CHLD.ZS : Prevalence of anemia among children (% of children under 5)
  • SH.ANM.NPRG.ZS : Prevalence of anemia among non-pregnant women (% of women ages 15-49)
  • SH.ANM.ALLW.ZS : Prevalence of anemia among women of reproductive age (% of women ages 15-49)
  • SH.PRG.ANEM : Prevalence of anemia among pregnant women (%)
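The filtering step above (one country, a chosen set of series codes, 2007 to 2015) can be sketched in Python with the standard `csv` module. This is a hypothetical helper, not the application's R code, and it assumes a wide WDI export with `Country Code` and `Indicator Code` columns plus one column per year; blank cells are skipped rather than treated as zero.

```python
import csv
import io


def filter_series(csv_text, country_code, indicator_codes, first_year, last_year):
    """Return {indicator_code: {year: value}} for one country.

    Assumes a wide WDI-style CSV: 'Country Code', 'Indicator Code',
    then one column per year ('2007', '2008', ...). Blank cells are skipped.
    """
    wanted = set(indicator_codes)
    years = [str(y) for y in range(first_year, last_year + 1)]
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Country Code"] != country_code or row["Indicator Code"] not in wanted:
            continue
        result[row["Indicator Code"]] = {
            int(y): float(row[y])
            for y in years
            if (row.get(y) or "").strip()  # missing or blank year -> skip
        }
    return result
```

A call such as `filter_series(text, "SGP", ["SP.DYN.TFRT.IN", "SP.DYN.CBRT.IN"], 2007, 2015)` would return only the requested Singapore series, keyed by year.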


After identifying the series to focus on, we use the Compare Series model to compare Singapore against other countries on the same set of series.

For example, we pick Fertility rate, total (births per woman), Death rate, crude (per 1,000 people) and Birth rate, crude (per 1,000 people) for 2015. Singapore is one of the countries with a low fertility rate, a low-to-medium death rate and a low birth rate. Similar countries and areas include the United Arab Emirates, Qatar, Macau, Korea, Liechtenstein and Hong Kong. They are generally small countries or regions.

Reading the circle clockwise, the first segment contains countries with high fertility, death and birth rates, shown in the darkest green. The second segment also has high rates, but slightly lower than the first. Most of the countries in these two segments are in Africa. The third segment consists mostly of countries from the Middle East and Asia with medium fertility, death and birth rates. The fourth segment contains countries with low fertility and birth rates but a high death rate; most of them are developed countries. The last segment contains countries from America and Asia with low-to-medium fertility, death and birth rates.
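Grouping countries into segments by several rates at once is a clustering problem. The Python sketch below illustrates the general idea with min-max scaling and bottom-up (agglomerative) average-linkage clustering. It is an assumption about the kind of technique involved, not the application's actual method or parameters, and the rows in the usage example are made-up values, not WDI figures.

```python
import math
from itertools import combinations


def normalise(rows):
    """Min-max scale each column to [0, 1] so no single rate dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi))
        for r in rows
    ]


def average_linkage(points, k):
    """Merge clusters bottom-up (average linkage, Euclidean distance) until k remain.

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # mean pairwise distance between members of the two clusters
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

With five hypothetical (fertility, death, birth) rows such as `[(5.0, 12.0, 38.0), (4.8, 11.0, 36.0), (1.2, 5.0, 9.0), (1.3, 4.5, 10.0), (1.5, 10.0, 10.0)]`, `average_linkage(normalise(rows), 3)` separates the two high-rate rows, the two low-rate rows, and the single low-birth, high-death row, mirroring the kind of segments described above.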

Singapore is a developed country. It faces a problem that other developed countries also face: a low fertility rate and birth rate. However, Singapore does not have a death rate as high as other developed countries.




To further investigate Singapore’s birth rate and death rate, we use the Series Trend model. As fertility rate and birth rate are closely related, we do not examine fertility rate separately in this example.


Birth rate, crude (per 1,000 people)


From the above birth rate circlize chart, Singapore is in the biggest segment, which contains half of the countries. Singapore has a fluctuating but generally downward trend in birth rate. If we pick other countries at random, such as the United States and Iran (shown below), they also show a downward trend in birth rate. In conclusion, half of the countries share the same downward trend; Singapore is not alone.


Singapore vs. USA vs. Iran (birth rate)
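Describing a series as "fluctuating but generally downward" can be made precise with the sign of a least-squares slope. The Python sketch below shows that calculation; it is an illustrative assumption rather than how the Series Trend model classifies trends, and the `tolerance` parameter for calling a trend flat is hypothetical.

```python
def ols_slope(years, values):
    """Ordinary least-squares slope of values against years (units per year)."""
    n = len(years)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(years) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    return sxy / sxx


def trend_label(years, values, tolerance=0.0):
    """Classify the overall direction of a series, ignoring year-to-year wiggles."""
    slope = ols_slope(years, values)
    if slope < -tolerance:
        return "downward"
    if slope > tolerance:
        return "upward"
    return "flat"
```

For instance, a made-up series `[10, 11, 9, 8, 7]` over 2007 to 2011 rises in one year but has a slope of -0.9 per year, so it is labelled `"downward"`.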

Death rate, crude (per 1,000 people)


As for death rate, there is no large cluster containing most of the countries, which means death rate trends differ considerably between countries. Even the cluster Singapore belongs to does not show a common trend. However, some African and other developing countries, e.g. Congo, are improving their death rates, since a high death rate is easier to bring down. Some developed countries, such as the USA, show a fluctuating trend but a lower death rate.


Singapore vs. USA vs. Congo (death rate)


From the WDI data for these selected health topics, a data analyst at MOH can draw the following conclusions:

  • The Singapore government is trying to raise the birth rate. However, this will be a challenge, since many countries face the same problem.
  • Singapore's death rate is generally increasing. However, the overall death rate is still considered low and healthy.

Limitation and Future Works

Limitation of the current application

The World Development Indicators visualization application currently has the following limitations.

  • Due to the upload size limit of the Shiny application, the maximum file size that can be uploaded is 30MB. This does not allow the full 56 years of data to be loaded unless data is manually removed after downloading it from World Bank Data. Additionally, the application only accepts .csv files.
  • Because the circular plot is restricted to a 360-degree view, it is not advisable to select too many data variables to display at once.
  • The data range of the selected series indicator must not contain any "blank" values; otherwise the application automatically excludes the series from the final visual representation.
  • The application currently supports trend analysis only through the heatmap colour gradient. Absolute values are not shown in any other graphical form.
  • As hierarchical clustering is performed, the number of clusters cannot be less than the number of sectors selected; otherwise clustering cannot take place.
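The blank-value rule and the colour-gradient encoding described in the limitations above can be sketched together in Python. This is a hypothetical illustration, not the application's code: it assumes blanks arrive as `None`, that each series is scaled on its own row (so colour reflects the trend rather than the absolute level, which is why absolute values are not visible), and that the gradient runs from white to dark green; the function names and colour endpoints are assumptions.

```python
def complete_rows(table, years):
    """Keep only series with a value for every selected year (None = blank)."""
    return {k: v for k, v in table.items() if all(v.get(y) is not None for y in years)}


def row_scale(values):
    """Min-max scale one series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def to_hex(t, low=(255, 255, 255), high=(0, 100, 0)):
    """Linearly interpolate between two RGB colours for t in [0, 1]."""
    r, g, b = (round(a + (c - a) * t) for a, c in zip(low, high))
    return f"#{r:02x}{g:02x}{b:02x}"


def heatmap_colours(table, years):
    """Map each complete series to one hex colour per selected year."""
    rows = complete_rows(table, years)
    return {
        k: [to_hex(t) for t in row_scale([v[y] for y in years])]
        for k, v in rows.items()
    }
```

With `{"X": {2007: 1.0, 2008: 3.0}, "Y": {2007: 2.0, 2008: None}}`, series `Y` is dropped because of its blank 2008 value, and `X` runs from `#ffffff` to `#006400`.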


Future improvement of the application

The World Development Indicators visualization application has a lot of potential to be extended and enhanced further.

  • The data file upload function could be replaced with an automated API call that reads new data published by the World Bank, allowing real-time analysis and exploration.
  • The data model could be enhanced to automatically include new series indicators that the World Bank might add from time to time.
  • The application could also be enhanced to filter by country groupings such as region, income level and lending group, as classified in the World Bank database.
  • Different visualizations could be offered for in-depth analysis, instead of the current fixed box plot and line chart representations.
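As a starting point for the API-based improvement, the Python sketch below builds a request URL and parses a response into `{year: value}`. It assumes the World Bank Indicators API v2 URL pattern `/v2/country/{code}/indicator/{series}` with `format=json`, and a JSON response shaped as `[metadata, records]` where each record has `date` and `value`; both should be checked against the official API documentation before use. The network call itself is left out so the parsing can be tested offline.

```python
import json
from urllib.parse import quote, urlencode

# Assumed base URL of the World Bank Indicators API v2; verify against the docs.
BASE = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, start, end, per_page=1000):
    """Build a request URL for one country, one series and a year range."""
    query = urlencode(
        {"format": "json", "date": f"{start}:{end}", "per_page": per_page},
        safe=":",  # keep the year range readable as 2007:2015
    )
    return f"{BASE}/country/{quote(country)}/indicator/{quote(indicator)}?{query}"


def parse_indicator_json(text):
    """Turn a response into {year: value}, skipping null (blank) observations.

    Assumes the payload is [metadata, records]; error responses carry only
    a metadata/message object, so they yield an empty dict.
    """
    payload = json.loads(text)
    if len(payload) < 2 or payload[1] is None:
        return {}
    return {
        int(r["date"]): float(r["value"])
        for r in payload[1]
        if r.get("value") is not None
    }
```

For example, `indicator_url("SGP", "SP.DYN.CBRT.IN", 2007, 2015)` produces a URL for Singapore's crude birth rate over the period used in the case study; the response text could then be fetched with `urllib.request.urlopen` and passed to `parse_indicator_json`.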
