Group 3 Report

From Visual Analytics and Applications

Revision as of 19:18, 2 December 2017

Background

Imbalanced economic development has long been an issue in China. Benefiting from their geographic location as well as national policies introduced in the 1980s, China's east coast areas, especially Shanghai, Zhejiang and Jiangsu, have grown remarkably fast over the last few decades. Economic growth in east China shows a geographic radiation pattern, and the contributors to GDP differ from area to area.
In this project, we will use regression models to study the different GDP indicators in east China. We will use R to build an interactive application so that users can easily explore the economic contributors they are interested in.

Data description

The data set we are using includes 2 parts:
1. GDP and indicators data
We have downloaded statistical data for 78 regions in China. The variables include GDP volume (total GDP and GDP for each industry) and more than 20 variables that we think might potentially influence GDP volume.
2. Shape files
The shapefile of Chinese regions (CHN_adm_shp) is available from ESRI (an organization that provides geographic information system (GIS) software). The shapefile includes 3 levels; in our project, we use the 2nd level ("prefecture-level city").

Analysis flow

In our research, both a linear regression model and a geographically weighted regression (GWR) model will be used to analyze the effect of each indicator.
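The geo-weighted (geographically weighted) regression mentioned above fits a separate regression at each location, weighting every observation by its distance from that location. A minimal sketch of this idea follows, written in Python rather than the project's R and limited to a single predictor. The Gaussian kernel, the fixed bandwidth, the planar (x, y) coordinates and all function names are assumptions made for illustration; in practice the coordinates might be region centroids taken from the shapefile, and the bandwidth would be selected from the data (for example by cross-validation).

```python
import math

def gaussian_weights(coords, center, bandwidth):
    """Gaussian kernel weights w_j = exp(-0.5 * (d_j / h)^2), d_j = distance to center."""
    cx, cy = center
    return [math.exp(-0.5 * (math.hypot(x - cx, y - cy) / bandwidth) ** 2)
            for x, y in coords]

def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b

def gwr_local_coefficients(coords, x, y, bandwidth):
    """Hypothetical helper: one local (intercept, slope) pair per location."""
    return [weighted_fit(x, y, gaussian_weights(coords, c, bandwidth))
            for c in coords]

# Hypothetical data: two clusters of regions with different slopes (2 and 5).
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [2, 4, 6, 8, 5, 10, 15, 20]
coefs = gwr_local_coefficients(coords, x, y, bandwidth=1.0)
print([round(b, 3) for _, b in coefs])  # [2.0, 2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0]
```

With a very large bandwidth every weight approaches 1 and each local fit approaches the ordinary global regression; a small bandwidth lets the coefficients vary more from region to region.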
The analysis flow consists of 3 parts:
1. Data exploration and variable selection
This part includes a variable correlation and distribution matrix, which enables users to exclude highly correlated variables from the regression model. A parallel coordinates chart is also provided to give users a general impression of these correlations.
2. Modelling and visualization
In the output of the regression model, we will display the parameter estimate of each variable, as well as its statistical significance, as indicated by its p-value.
3. Data analysis
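For part 1 (data exploration and variable selection), the screening of highly correlated variables can be sketched as follows. This is a Python sketch rather than the project's R code; it uses pairwise Pearson correlation, and the variable names, values and the 0.8 cut-off are hypothetical choices for illustration, not values from the project.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def correlation_matrix(data):
    """Full matrix as a dict keyed by (name, name); data maps name -> values."""
    return {(p, q): pearson(data[p], data[q]) for p in data for q in data}

def highly_correlated_pairs(data, threshold=0.8):
    """List (name1, name2, r) for every pair with |r| >= threshold."""
    flagged = []
    for p, q in combinations(data, 2):
        r = pearson(data[p], data[q])
        if abs(r) >= threshold:
            flagged.append((p, q, r))
    return flagged

data = {  # hypothetical indicator values for five regions
    "population": [1, 2, 3, 4, 5],
    "employment": [2, 4, 6, 8, 10],
    "rainfall": [3, 1, 4, 1, 5],
}
# Only the ('population', 'employment') pair is flagged; its r is about 1.0.
print(highly_correlated_pairs(data))
```

One variable from each flagged pair could then be left out of the regression model; which one to keep remains a modelling judgment.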

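For part 2, the parameter estimate and its p-value can be illustrated for a single predictor: the least-squares slope, its standard error, the t statistic with n - 2 degrees of freedom and the two-sided p-value. This is a Python sketch, not the project's R code (in R, `summary(lm(...))` reports these quantities for every coefficient). To stay dependency-free, the t-distribution probability is obtained here by integrating its density numerically with Simpson's rule, which is an approximation; the data and function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|); integrates the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3  # approximates P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)

def simple_ols(x, y):
    """Fit y = a + b*x (needs n > 2, non-constant x, non-zero residuals).

    Returns (slope, standard_error, t_statistic, p_value)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance / sxx
    t = b / se
    return b, se, t, two_sided_p(t, n - 2)

x = [1, 2, 3, 4, 5]              # hypothetical indicator values
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical GDP values
b, se, t, p = simple_ols(x, y)
print(f"estimate={b:.3f} se={se:.4f} t={t:.2f} p={p:.2g}")
```

Very small p-values are only approximate with this numerical integration; a statistics library computes the t-distribution tail more precisely.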
Tools used

Methodology