Group10 Analysis Report

From Visual Analytics and Applications
Revision as of 00:07, 14 August 2018 by Xjsong.2017 (talk | contribs)

Voice Driven Visual Analysis of Global Migration



Data Preparation

The datasets used to build the visual models for this analysis were obtained from the World Bank Open Data repository. We took the bilateral matrix of migrant stock counts and the remittance amounts sent by migrants for the years 1990-2015 in 5-year intervals. To add dimensionality to the analysis, we also obtained GDP per capita (USD), GDP (USD), population, and migrants as a percentage of population from the World Bank's WDI (World Development Indicators) series. Lastly, we used a GeoJSON file with polygon data for all countries of the world to visualize the spatial distribution. The data was fairly clean. Cleaning consisted only of replacing cells filled with dots, which indicate no migrant stock from the row origin country to the column destination country, with blanks so that the data could be read into R. Rows containing sub-region, continent, and world-level aggregations of migrant stock were removed.

The remaining data preparation involved reshaping the data to suit each visualization. For the chord diagram, a bilateral matrix of non-zero migrant stock counts with a zeroed diagonal was created.
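The reshaping for the chord diagram can be sketched as follows. This is only an illustration of the matrix shape: the report prepared its data in R, while this sketch uses JavaScript, and the function name `buildFlowMatrix`, the field names `origin`, `destination`, `stock`, and the sample numbers are all hypothetical.

```javascript
// Build a square origin-by-destination matrix from flow records.
// Rows are origins, columns are destinations, and the diagonal stays at zero.
// All names and sample values here are illustrative assumptions.
function buildFlowMatrix(flows) {
  // Collect every country that appears as an origin or a destination
  const names = [...new Set(flows.flatMap(f => [f.origin, f.destination]))].sort();
  const index = new Map(names.map((name, i) => [name, i]));

  // Start from an all-zero square matrix
  const matrix = names.map(() => names.map(() => 0));

  for (const { origin, destination, stock } of flows) {
    if (origin === destination) continue; // skip self-flows so the diagonal is zero
    matrix[index.get(origin)][index.get(destination)] += stock;
  }
  return { names, matrix };
}

const example = buildFlowMatrix([
  { origin: "A", destination: "B", stock: 120 },
  { origin: "B", destination: "A", stock: 30 },
  { origin: "A", destination: "A", stock: 999 }, // self-flow, ignored
  { origin: "C", destination: "A", stock: 5 },
]);
console.log(example.names);
console.log(example.matrix);
```

A square matrix with matching row and column order is the input shape a chord layout generally expects, which is why origins and destinations are indexed from one shared list of names.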

For the choropleth map, the migrant stock data was binned into percentiles based on the origin/destination country.
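A minimal sketch of percentile binning is shown below. The report does not state which percentile definition or how many bins were used, so both the "share of values less than or equal to this one" definition and the choice of four bins are assumptions, as are the function names and sample values.

```javascript
// Percentile rank of each value: the share of values less than or equal
// to it, scaled to 0-100. This definition is an assumption.
function percentileRanks(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return values.map(v => {
    let count = 0;
    for (const s of sorted) {
      if (s <= v) count++;
      else break; // sorted ascending, so no later value can qualify
    }
    return (100 * count) / sorted.length;
  });
}

// Assign each value to one of binCount equal-width percentile bins (1-based).
function binByPercentile(values, binCount) {
  const width = 100 / binCount;
  return percentileRanks(values).map(p => Math.min(binCount, Math.ceil(p / width)));
}

const stocks = [10, 200, 30, 4000]; // made-up migrant stock counts
console.log(percentileRanks(stocks));
console.log(binByPercentile(stocks, 4));
```

Binning on percentile rank rather than on raw counts keeps a few very large flows from pushing every other country into the lightest colour class.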

For the quadrant analysis, the inverse percentiles of migrant stock are calculated dynamically for each origin country for every destination.

For the corrplot, the bilateral balance index model was applied to the data, with countries categorized by their development level.

Voice Integration Methodology


Annyang is a tiny JavaScript library for speech recognition, and SpeechKITT is a GUI through which the user interacts with speech recognition. Our project integrates both Annyang and SpeechKITT using JavaScript to send voice input to the R Shiny dashboard for visual analysis.

Load the R Shiny app in the Chrome browser and activate Annyang. The functions defined to recognize the user's voice input are loaded into Annyang, and the JavaScript is included in the R Shiny app as a tag.

Annyang1.png


The user speaks the voice input in the defined command format, and the input value is passed to Shiny using the shiny.oninput method.

Annyang2.png


The parsed input, which is the user's speech converted to text, is received in Shiny by an observe command, which updates the state of the dashboard input element using the corresponding oninput method.

Annyang3.png
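The three steps above can be sketched on the browser side as follows. This is a hedged illustration, not the project's code (which is shown only in the screenshots): the command phrase, the input id `voice_origin`, and the helper names are hypothetical, and we assume that "shiny.oninput" refers to Shiny's JavaScript function `Shiny.onInputChange`.

```javascript
// Tidy the recognised text so it can match a dropdown choice,
// e.g. "  south korea " becomes "South Korea". Hypothetical helper.
function normaliseCountry(spoken) {
  return spoken
    .trim()
    .split(/\s+/)
    .map(w => w.charAt(0).toUpperCase() + w.slice(1).toLowerCase())
    .join(" ");
}

// Forward a value to Shiny. The Shiny object exists only inside a
// running Shiny page, so the call is guarded.
function sendToShiny(inputId, value) {
  if (typeof Shiny !== "undefined") {
    Shiny.onInputChange(inputId, value);
    return true;
  }
  return false;
}

// "*country" is an annyang splat: it captures the rest of the phrase.
// The phrase and the input id are assumptions made for this sketch.
const commands = {
  "show migration from *country": function (country) {
    sendToShiny("voice_origin", normaliseCountry(country));
  },
};

// Register the commands only when annyang is loaded and supported.
if (typeof annyang !== "undefined" && annyang) {
  annyang.addCommands(commands);
  if (typeof SpeechKITT !== "undefined") {
    SpeechKITT.annyang(); // let SpeechKITT control annyang
    SpeechKITT.vroom();   // render the SpeechKITT interface
  } else {
    annyang.start();      // no GUI available: start listening directly
  }
}
```

On the R side, the value would then appear as `input$voice_origin`, where an observer (for example `observeEvent` combined with an update function such as `updateSelectInput`) could change the corresponding dashboard control. Those R names are given only as one plausible arrangement.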


Chord1.gif


Map1 1.gif


Tab4.gif


G10Overview.gif



Visualization Methodology

1. Chord Diagram

The chord diagram visualises the inter-relationships between entities. A connection between two entities shows that they share something in common, which makes chord diagrams well suited to comparing similarities within a dataset or between different groups of data. Nodes are arranged along a circle, and the relationships between them are drawn as arcs or Bézier curves. A value is assigned to each connection and is represented proportionally by the size of its arc. Colour can be used to group the data into categories, which aids in making comparisons and distinguishing groups.

The ‘chorddiag’ package makes it possible to create interactive chord diagrams from within R, using the JavaScript visualization library D3 through the htmlwidgets interfacing framework.

G10chorddiagram.png

The figure above has two connected chord diagrams. The chord diagram on the left visualizes migration flows at the continent or sub-region level. Once a flow is selected, the second chord diagram reacts to the change and displays the corresponding countries. The country-level migration flow can then be visualized.

2. Bidirectional Trend chart

The trend chart is a graphical representation of time series data showing the trend line that reveals a general pattern of change. The chord diagram gives the bidirectional flow of migrants for a year, so we use a line chart to observe the trend of the migrant stock over the years. The input to the line chart is from the chord diagram when a flow line is selected. The line chart displays the bidirectional migrant stock trend for the two countries in the flow line selected.

3. Choropleth Map

A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. Choropleth maps provide an easy way to visualize how a measurement varies across a geographic area or show the level of variability within a region. A heat map is similar but does not use geographic boundaries.

Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. The 'leaflet' package makes it easy to integrate and control Leaflet maps in R.

G10line1.png

In this case, we use a choropleth map to display the migration and remittance data, filtering to the top 10 migrant/remittance countries once an origin country is chosen with a select input. Unlike the chord diagram, which shows only the migrant values, the map also shows the location of each country, which helps in investigating why people migrate. In this map, the darker the colour, the higher the value. We can therefore identify countries with a large number of migrants but a small amount of remittance.

4. Correlation Plot

The corrplot package provides a graphical display of a correlation matrix and confidence intervals. It includes algorithms for matrix reordering and offers fine control over details such as colour, text labels, colour labels, and layout.

The bilateral balance index measures how balanced a bidirectional flow is. It ranges from 0 to 1, with 1 indicating an equally strong flow in both directions.

For any country pair i-j, the bilateral balance is calculated as:

G10formula1.png

Mij = Migrant stock in j from i in a time period
Mji = Migrant stock in i from j in a time period
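The formula itself is available only in the figure above. As a hedged illustration, one common form that satisfies the stated properties (a range of 0 to 1, with 1 when both flows are equal) is given below. The symbol B_ij is our own notation, and the report's formula may differ.

```latex
B_{ij} \;=\; 1 - \frac{\lvert M_{ij} - M_{ji} \rvert}{M_{ij} + M_{ji}}
       \;=\; \frac{2\,\min(M_{ij},\, M_{ji})}{M_{ij} + M_{ji}},
\qquad M_{ij} + M_{ji} > 0
```

Under this assumed form, a pair with M_ij = 300 and M_ji = 100 gives 1 - 200/400 = 0.5, a purely one-way flow gives 0, and equal flows give 1.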


The correlation plot is used to study the bilateral balance of migration between countries of the same development type. Darker cells in the matrix indicate a higher bilateral balance between two countries.

G10corr.png


5. Quadrant Plot

Quadrant analysis examines the relationship between two variables by plotting them across four quadrants.

Plotly's R graphing library makes interactive, publication-quality graphs online. Plotly graphs provide zoom, pan, and hover controls. When the user hovers over a point on a scatter plot, a tooltip shows information about that point.

G1011.png


This plot investigates the relationship between the migration and remittance percentiles for a selected origin country. The bubbles are coloured by region, income level, or development type, and sized by the GDP per capita of the destination country.

6. Bar chart

Bar charts are used to visualize the relationship between one categorical variable and one continuous variable. We used several forms of the bar chart to visualize part-to-whole, trend, and migrant-remittance relationships. The charts are plotted with ggplot and wrapped with the plotly package. Since we do not require any of the plotly filter features, we removed all the unnecessary non-data ink. The 100% stacked bar chart below shows how the world's total migrants are distributed across the income levels of their current countries of residence, and how that distribution changes over time. There is an increasing trend of migrants moving towards high-income countries.

G10bar1.png


The dodged bar chart below presents a gender bias analysis. When a flow and a year are picked, bars showing the male and female migrant stock numbers over the years are plotted. The selected year is highlighted in red using ggplot's grammar of graphics, by superimposing multiple layers with different fill and border settings.

G10bar2.png


The dodged bar chart below is implemented using plotly's event data method, which captures a user-selected subset of the data from an existing plotly plot. It interactively visualizes the migration and remittance numbers from an origin country across all selected destination countries.

G10bar3.png