Group 3 Report

From Visual Analytics and Applications
Revision as of 19:20, 2 December 2017 by Zjchen.2016 (talk | contribs)

Background

Uneven economic development has been a long-standing issue in China. Benefiting from their geographic location and from the national policies introduced in the 1980s, the east coast areas of China, especially Shanghai, Zhejiang and Jiangsu, have grown at a remarkable pace over the last few decades. Economic growth in east China shows a geographic radiation pattern, and the contributors to GDP differ from area to area.
In this project, we use regression models to study how different indicators contribute to GDP in east China. We will use R to build an interactive application so that users can easily explore the economic contributors they are interested in.

Data description

The data set we are using includes 2 parts:
1. GDP and indicators data
We downloaded statistical data for 78 regions in China. The variables include GDP volume (total GDP and GDP for each industry) and more than 20 variables that we think might influence GDP volume.
2. Shape files
The shapefile of Chinese regions (CHN_adm_shp) is available from ESRI (an organization that provides geographic information systems). The shapefile includes 3 administrative levels. In our project, we use the 2nd level of the shapefile ("prefecture-level city").

Analysis flow

In our research, both a linear regression model and a geographically weighted regression (GWR) model are used to analyze the effect of each indicator.
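To make the difference between the two models concrete, here is a minimal sketch in base R. It is not the project's implementation: the data are synthetic, the indicator name `invest`, the helper `gwr_coef` and the bandwidth of 0.2 are all assumptions chosen for illustration, and the geographically weighted fit is written by hand as one kernel-weighted `lm()` per region rather than with a dedicated package.

```r
# Synthetic data: every value below is made up for illustration only.
set.seed(42)
n      <- 78                                   # same count as the regions in the data set
coords <- cbind(x = runif(n), y = runif(n))    # hypothetical region centroids
invest <- rnorm(n, mean = 10, sd = 2)          # hypothetical indicator
beta   <- 1 + 2 * coords[, "x"]                # true effect grows from west to east
gdp    <- 5 + beta * invest + rnorm(n, sd = 0.5)
dat    <- data.frame(gdp, invest)

# Global linear regression: a single coefficient shared by all regions.
global_fit <- lm(gdp ~ invest, data = dat)

# Geographically weighted regression: one weighted fit per region,
# using Gaussian kernel weights that decay with distance from region i.
gwr_coef <- function(i, bandwidth = 0.2) {
  d <- sqrt((coords[, "x"] - coords[i, "x"])^2 +
            (coords[, "y"] - coords[i, "y"])^2)
  w <- exp(-0.5 * (d / bandwidth)^2)
  coef(lm(gdp ~ invest, data = dat, weights = w))
}
local_coefs <- t(sapply(seq_len(n), gwr_coef))  # n rows: intercept and slope per region

coef(global_fit)["invest"]       # the single global slope
range(local_coefs[, "invest"])   # the spread of local slopes across regions
```

The global model reports one slope for the whole study area, while the weighted fits return a slope per region that can be mapped to show where an indicator matters more.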
The analysis flow includes 3 parts:
1. Data exploration and variable selection
This step includes a variable correlation and distribution matrix, which lets users exclude highly correlated variables from the regression model. A parallel coordinates chart is also provided to give users a general impression of these correlations.
2. Modelling and visualization
In the output of the regression model, we display the parameter estimate of each variable, as well as its significance level, which is determined from the p-value.
3. Data analysis
We will use the interface to analyze the different effects of the selected indicators.
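The first two parts of this flow can be sketched in base R as follows. This is an illustrative outline under stated assumptions, not the application's code: the data are synthetic, the column names (`population`, `employment`, `investment`) are hypothetical, and the 0.8 correlation threshold and the significance cutpoints are arbitrary choices.

```r
# Synthetic stand-in for the indicator table; column names are hypothetical.
set.seed(1)
n <- 78
population <- rnorm(n, 500, 100)
employment <- population * 0.6 + rnorm(n, sd = 5)   # nearly collinear with population
investment <- rnorm(n, 200, 50)
gdp <- 10 + 0.8 * population + 1.5 * investment + rnorm(n, sd = 20)
indicators <- data.frame(population, employment, investment)

# Part 1: flag highly correlated indicator pairs (0.8 is an assumed threshold).
cm   <- cor(indicators)
high <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
flagged <- data.frame(var1 = rownames(cm)[high[, "row"]],
                      var2 = colnames(cm)[high[, "col"]],
                      r    = cm[high])

# Drop the second variable of each flagged pair, then fit the linear model.
keep <- setdiff(names(indicators), flagged$var2)
fit  <- lm(reformulate(keep, response = "gdp"),
           data = data.frame(gdp, indicators))

# Part 2: parameter estimates with p-values and a significance label.
est <- as.data.frame(coef(summary(fit)))
est$signif <- cut(est[["Pr(>|t|)"]],
                  breaks = c(0, 0.001, 0.01, 0.05, 1),
                  labels = c("***", "**", "*", "n.s."),
                  include.lowest = TRUE)
est
```

Which variable of a correlated pair to drop is a judgment call; in the application that choice is left to the user, whereas the sketch simply removes the second one.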

Tools used

Methodology