Difference between revisions of "Group 3 Report"

From Visual Analytics and Applications

Revision as of 19:24, 2 December 2017

Background

Unbalanced economic development has been a long-standing issue in China. Benefiting from their geographic location and from the national policies introduced in the 1980s, the east coast areas of China, especially Shanghai, Zhejiang and Jiangsu, have grown at a remarkable speed over the last few decades. Economic growth in east China shows a geographically radiating pattern, and the contributors to GDP differ from area to area.
In this project, we use regression models to study the different GDP indicators in east China. We use R to build an interactive application so that users can easily explore the economic contributors they are interested in.

Data description

The data set we are using includes 2 parts:
1. GDP and indicators data
We have downloaded statistical data for 78 regions in China. The variables include GDP volume (total GDP and GDP for each industry) and more than 20 variables that we think might influence GDP volume.
2. Shape files
The shapefile of Chinese regions (CHN_adm_shp) is available from ESRI (an organization that provides geographic information system software). The shapefile includes 3 levels. In our project, we use the 2nd level of the shapefile ("prefecture-level city").

Analysis flow

In our research, both a linear regression model and a geographically weighted regression (GWR) model are used to analyze the effect of each indicator.
The analysis flow has 3 parts:
1. Data exploration and variable selection
This step includes a variable correlation and distribution matrix, which enables users to exclude highly correlated variables from the regression model. A parallel coordinates chart is also provided to give users a general impression of these correlations.
2. Modelling and visualization
In the output of the regression model, we display the parameter estimate of each variable, as well as its significance level, which is derived from the p value.
3. Data analysis
We use the interface to analyze the different effects of the selected indicators.
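
The first two steps of this flow can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, and the variable names (investment, population, retail, gdp) and the 0.8 correlation threshold are hypothetical, not taken from the project data.

```r
# Simulated stand-in for the indicator table (78 regions); all names are hypothetical.
set.seed(42)
n <- 78
indicators <- data.frame(
  investment = rnorm(n),
  population = rnorm(n)
)
# 'retail' is built to be almost collinear with 'investment'
indicators$retail <- 0.95 * indicators$investment + rnorm(n, sd = 0.1)
indicators$gdp <- 2 + 1.5 * indicators$investment +
  0.8 * indicators$population + rnorm(n, sd = 0.5)

# Step 1: correlation matrix of the candidate predictors
predictors <- c("investment", "population", "retail")
cor_mat <- cor(indicators[, predictors])

# Flag predictor pairs whose absolute correlation exceeds the assumed threshold
high <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]]
)
print(high_pairs)

# Step 2: fit a linear model without the redundant variable and
# read the parameter estimates together with their p values
fit <- lm(gdp ~ investment + population, data = indicators)
coef_table <- summary(fit)$coefficients
print(coef_table[, c("Estimate", "Pr(>|t|)")])
```

Dropping one variable from each flagged pair before fitting keeps the parameter estimates easier to interpret, which is the reason for the variable selection step.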

Tools used

The application is built in R. The advantage of R is its strength in statistical analysis, which will be especially useful for future model evaluation. Listed below are the R packages we used in building the application:

Rgdal package
The rgdal package is widely used for reading shapefiles. The shapefile we are using is “CHN_adm_shp”, which is provided by ESRI. The selected level is “CHN_adm2”, meaning prefecture-level regions.

Package rmapshaper
The ms_simplify function allows us to simplify the outlines in the shapefile. The simplified shapefile greatly improves processing efficiency when we build statistical maps.
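
As a rough illustration of what ms_simplify does, the sketch below simplifies a single made-up polygon and compares vertex counts. It assumes the sf package is used to build the demo geometry (the report itself reads the shapefile with rgdal), and the keep value of 0.1 is an arbitrary choice for the example.

```r
library(sf)
library(rmapshaper)

# Hypothetical demo polygon: a densely sampled circle standing in for a region outline
circle <- st_buffer(st_sfc(st_point(c(0, 0))), dist = 1, nQuadSegs = 100)
region <- st_sf(name = "demo", geometry = circle)

# Keep roughly 10% of the vertices; keep_shapes = TRUE prevents small polygons
# from being dropped entirely
region_simple <- ms_simplify(region, keep = 0.1, keep_shapes = TRUE)

# Compare the number of vertices before and after simplification
n_before <- nrow(st_coordinates(region))
n_after <- nrow(st_coordinates(region_simple))
cat("vertices before:", n_before, " after:", n_after, "\n")
```

Fewer vertices mean less data to draw and transfer, which is where the efficiency gain for the maps comes from.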

GWmodel package
GWmodel is an R package for spatial data analysis. It provides GW summary statistics, GW principal components analysis, GW discriminant analysis and various GW regression models [--R GW model documentation]. Its advantage is that it combines parameter estimates with adjusted test results, which give a p value as an indicator of significance.
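
A minimal sketch of a basic GWR fit with GWmodel is shown here, assuming simulated point data in place of the real prefecture-level regions. The variable names, the coordinates and the choice of an adaptive bisquare kernel are assumptions made for the example, not the settings used in the project.

```r
library(sp)
library(GWmodel)

# Simulated regions with hypothetical projected coordinates (e.g. in km)
set.seed(1)
n <- 78
coords <- cbind(x = runif(n, 0, 800), y = runif(n, 0, 800))
dat <- data.frame(investment = rnorm(n), population = rnorm(n))
# The effect of 'investment' is made to vary from west to east
dat$gdp <- 1 + (0.5 + coords[, "x"] / 800) * dat$investment +
  0.3 * dat$population + rnorm(n, sd = 0.2)
pts <- SpatialPointsDataFrame(coords, dat)

# Choose an adaptive bandwidth (number of neighbours) by cross-validation
bw <- bw.gwr(gdp ~ investment + population, data = pts,
             approach = "CV", kernel = "bisquare", adaptive = TRUE)

# Fit the basic GWR model; local estimates are stored in fit$SDF
fit <- gwr.basic(gdp ~ investment + population, data = pts,
                 bw = bw, kernel = "bisquare", adaptive = TRUE)
head(fit$SDF@data[, c("Intercept", "investment", "population")])

# Adjust the local t-test p values for multiple testing
adj <- gwr.t.adjust(fit)
```

Each row of fit$SDF holds the local coefficient estimates for one location, so they can be joined back to the map polygons for display.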

Leaflet package
This is the R interface to leaflet.js, an interactive map library. It uses htmlwidgets to let users adjust and interact with map objects.

Shiny package
The Shiny package provides a reactive web application framework, so that users can manipulate the data and focus on the analysis.
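
To show how Shiny and Leaflet fit together, here is a small sketch of a reactive map. It is not the project's application: the three cities, their indicator values and the input name "indicator" are hypothetical and exist only to make the example self-contained.

```r
library(shiny)
library(leaflet)

# Hypothetical demo data: approximate city coordinates with made-up indicator values
cities <- data.frame(
  name = c("Shanghai", "Hangzhou", "Nanjing"),
  lng = c(121.47, 120.16, 118.80),
  lat = c(31.23, 30.27, 32.06),
  investment = c(12, 8, 7),
  population = c(15, 9, 8)
)

ui <- fluidPage(
  # The user picks which indicator controls the marker size
  selectInput("indicator", "Indicator",
              choices = c("investment", "population")),
  leafletOutput("map")
)

server <- function(input, output, session) {
  # The map is redrawn whenever input$indicator changes
  output$map <- renderLeaflet({
    leaflet(cities) %>%
      addTiles() %>%
      addCircleMarkers(lng = ~lng, lat = ~lat,
                       radius = cities[[input$indicator]],
                       label = ~name)
  })
}

app <- shinyApp(ui, server)
# Call runApp(app) in an interactive session to start the application
```

The reactive part is the renderLeaflet block: Shiny tracks that it reads input$indicator and reruns it on every change, so no explicit event handling is needed.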

Methodology