Difference between revisions of "Group09 proposal"

From ISSS608-Visual Analytics and Applications

Revision as of 21:09, 29 February 2020

Abstract

Objective

Data Source

The data come from the China Health and Nutrition Survey (CHNS).

About Data Set

The survey took place over a 7-day period using a multistage, random cluster process to draw a sample of about 7,200 households with over 30,000 individuals in 15 provinces and municipal cities that vary substantially in geography, economic development, public resources, and health indicators. In addition, detailed community data were collected in surveys of food markets, health facilities, family planning officials, and other social services and community leaders.

Variables

(Dependent variable) Health
(Independent variable) Income inequality, Income Variables, Individual controls, Occupation and Sector
The table below describes the main variables used in our analysis:

Variable: Description
Health
Blood pressure: Binary variable. Defined as 0 if the respondent's blood pressure is within the normal range, else 1. Normal blood pressure is defined as at or below 120/80 mmHg.
WHR: The waist-hip ratio. Binary variable. Defined as 1 if the ratio is above the limit, else 0. A normal WHR is defined as at or below 0.80 for women and 0.90 for men.
MAMC: Mid-arm muscle circumference. Binary variable. Defined as 1 if the figure is abnormal, else 0. Women with an MAMC of 20.88 or more and men with 22.77 or more are coded as normal.
Overweightness: Binary variable. Defined as 1 if the respondent is overweight, else 0. The non-overweight population in China is defined as having a BMI below 25 kg/m².
Income inequality
Gini: The Gini coefficient at the county level, sensitive to changes at middle income levels.
Theil L: The mean logarithmic deviation of the Generalized Entropy (Theil) indices, sensitive to changes at the bottom income levels.
Theil T: The Theil index, sensitive to changes at upper income levels.
Theil V: Half the squared coefficient of variation, from the same Generalized Entropy (Theil) family.
Income variables
Individual income: The sum of an individual's income sources, that is, all individual income and revenue minus individual expenditures. Household subsidies and other income that cannot be allocated to individuals in the household are not counted as individual income.
County mean income (ind.): Captures the degree of economic development in a county-level unit, calculated by averaging individual income in a county/city over all observations in the CHNS. “ind.” refers to individual.
Household income: The sum of all individual incomes in a household.
County mean income (hh.): Calculated by averaging household income in a county/city over all observations in the CHNS. “hh.” refers to household.
Individual controls
Age: The age of the respondent.
Gender: Binary variable. Male is coded 0 and female 1.
Married: Binary variable. Married is coded 1 and unmarried 0.
Majority: Binary variable. Defined as 1 if the respondent's ethnicity is Han, else 0.
Years of education: Counted from the beginning of primary school: 6 years for primary school graduation, 9 for junior high school graduation, 12 for high school graduation, and 16 for university graduation.
Urban: Binary variable. Defined as 1 if the respondent holds urban household registration, else 0.
Occupation
Services class: Includes “senior professional/technical”, “administrator/executive/manager” and “army officer/police officer”.
Non-manual worker: Includes “junior professional technical” and “office staff”.
Skilled worker/supervisor: Includes “skilled worker”, “ordinary soldier, policeman”, “driver” and “athlete, actor, musician”.
Semi-/non-skilled worker: Includes “non-skilled worker” and “service worker”.
Farmer: As originally defined in the CHNS data.
Others: All remaining occupations covered by the CHNS data.
Sector
State: Includes “government”, “state service/institute” and “state-owned enterprise”.
Collective: Includes “small collective enterprise” and “large collective enterprise”.
Family farming: Same as the original CHNS variable “family contract farming”.
Individual enterprise: Same as the original CHNS variable “private, individual enterprise”.
Private three-cap Enterpr.: Same as “three-capital enterprise” in the CHNS data.
Others: Includes “unknown” data in the CHNS.
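To make the four income-inequality measures listed above concrete, here is a minimal Python sketch of how they might be computed for the incomes in one county. It uses the textbook formulas for the Gini coefficient and the Generalized Entropy family. The function names and the sample incomes are illustrative assumptions, not part of the CHNS data or of our pipeline, and the log-based measures assume strictly positive incomes.

```python
import math


def gini(incomes):
    """Gini coefficient via the sorted-rank formula."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def theil_l(incomes):
    """Theil L (mean log deviation): mean of ln(mu / x_i)."""
    mu = sum(incomes) / len(incomes)
    return sum(math.log(mu / x) for x in incomes) / len(incomes)


def theil_t(incomes):
    """Theil T: mean of (x_i / mu) * ln(x_i / mu)."""
    mu = sum(incomes) / len(incomes)
    return sum((x / mu) * math.log(x / mu) for x in incomes) / len(incomes)


def half_squared_cv(incomes):
    """Half the squared coefficient of variation (population variance)."""
    n = len(incomes)
    mu = sum(incomes) / n
    variance = sum((x - mu) ** 2 for x in incomes) / n
    return 0.5 * variance / mu ** 2


# Hypothetical incomes for one county, invented purely for illustration.
county_incomes = [1200.0, 3400.0, 5600.0, 9800.0]
for label, measure in [("Gini", gini), ("Theil L", theil_l),
                       ("Theil T", theil_t), ("Theil V", half_squared_cv)]:
    print(label, round(measure(county_incomes), 4))
```

All four measures are zero when every income is equal and grow as incomes spread out, which is a quick sanity check before applying them county by county.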

Visualization

Exploratory Data Analysis

The 'highcharter', 'plotly', 'viridis', 'scatterplot3d' and 'ggplot2' packages will be used to create two interactive time series line charts showing how historical and projected global CO2 emissions and temperature change over different periods.

  1. World map: The same packages will be used to generate two interactive world maps that show and compare CO2 emissions by country and by year, respectively.

Methodology

Process.png

Three different approaches will be utilized to predict the global CO2 emissions and temperature in the next 10 years:

  1. Holt exponential smoothing: This approach yields a future value for each relevant variable (e.g. gas fuel, liquid fuel and solid fuel). These forecasts are then fed into a linear regression model to predict future CO2 emissions.
  2. SARIMA: The Seasonal Autoregressive Integrated Moving Average (SARIMA) model, an extension of ARIMA that explicitly supports univariate time series data with a seasonal component, will be applied to conduct the prediction. We can use it to obtain future annual CO2 emissions with lower and upper bounds.
  3. Auto-Regression: The Auto-Regression model describes the relationship between current and historical values, using a series' own past values as predictors of its future values. The factors that influence CO2 emissions, such as solid fuel and gas fuel, can be forecast with an Auto-Regression model, and future global CO2 emissions are then predicted with a linear regression model.
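The first approach above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our final implementation: it uses a single predictor rather than several fuel types, fixed smoothing parameters rather than fitted ones, and invented numbers. The names `holt_forecast` and `fit_simple_ols` are hypothetical helpers.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear-trend exponential smoothing; returns `horizon` forecasts."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead forecast.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical yearly values, invented purely for illustration.
fuel_history = [2.0, 2.3, 2.5, 2.9, 3.1]   # e.g. gas fuel
co2_history = [5.1, 5.8, 6.2, 7.1, 7.4]    # CO2 emissions in the same years

# Step 1: forecast the predictor with Holt smoothing.
fuel_future = holt_forecast(fuel_history, alpha=0.8, beta=0.2, horizon=3)
# Step 2: regress CO2 on the predictor, then apply the fit to the forecasts.
intercept, slope = fit_simple_ols(fuel_history, co2_history)
co2_future = [intercept + slope * fuel for fuel in fuel_future]
print(co2_future)
```

The same two-step structure applies to the Auto-Regression approach: only the method used to forecast the predictors changes, while the regression step stays the same.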

After running all three methods, we will evaluate each one by comparing its predictions with actual CO2 emissions in recent years and determine which fits best.
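One way this comparison could be carried out is to hold out the most recent years, score each method's predictions for those years, and rank the methods by error. The sketch below assumes root-mean-square error (RMSE) as the metric, which the proposal does not fix, and uses invented numbers. `rank_methods` is a hypothetical helper name.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def rank_methods(actual, forecasts):
    """Return (method name, RMSE) pairs sorted from best to worst fit."""
    scores = {name: rmse(actual, preds) for name, preds in forecasts.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical held-out emissions and forecasts, invented for illustration.
actual_recent = [10.0, 12.0, 14.0]
candidate_forecasts = {
    "Holt + regression": [10.5, 12.5, 13.5],
    "SARIMA": [11.0, 13.0, 15.0],
    "Auto-Regression + regression": [12.0, 14.0, 16.0],
}
for name, score in rank_methods(actual_recent, candidate_forecasts):
    print(name, round(score, 3))
```

RMSE penalizes large misses more heavily than small ones. A metric such as mean absolute percentage error could be substituted by replacing the `rmse` function.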

Critique of Existing Works

Team Members

References