ISSS608 2016 17T1 Group2 Proposal
Background
Cancer remains a leading cause of death around the world. In the United States, roughly one in five deaths each year is caused by cancer, and worldwide the rate ranges from 100 to 350 per 100,000 people. The risk of cancer increases significantly with age, and many cancers occur more commonly in developed countries. Rates are rising as more people live to an old age and as lifestyles change in the developing world.
Data Source
Our data is extracted from https://wonder.cdc.gov/cancer.html.
The dataset includes the following variables:
- Cancer Sites: The data cover 20 leading cancer sites, organized into 4 hierarchy levels.
- Age_Group: Ages are grouped into 5-year bands, except for the youngest and oldest groups. The groups are <1 year, 1-4 years, 5-9 years, 10-14 years, 15-19 years, 20-24 years, 25-29 years, 30-34 years, 35-39 years, 40-44 years, 45-49 years, 50-54 years, 55-59 years, 60-64 years, 65-69 years, 70-74 years, 75-79 years, 80-84 years, 85+ years.
- Region: The United States is split into 4 regions: Northeast, Midwest, South and West.
- States: All 50 states and the District of Columbia are represented for all years. Data for Puerto Rico are available for years 2006 and later.
- Sex: Female and male.
- Race: There are 4 racial categories included in the data: "American Indian or Alaska Native," "Asian or Pacific Islander," "Black or African American," and "White."
- Incidence Counts: The number of diagnoses of cancer in living persons.
- Death Counts: The number of deaths.
- Age-Adjusted Rates for Incidence and Mortality: Age-adjusted rates are calculated with age distribution ratios from the world standard million population, and the rates are shown per 100,000 population.
An age-adjusted rate is a weighted average of the age-specific (crude) rates, where the weights are the proportions of persons in the corresponding age groups of a standard million population. The potential confounding effect of age is reduced when comparing age-adjusted rates computed using the same standard million population.
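The weighted average described above can be illustrated with a minimal Python sketch. The function name `age_adjusted_rate`, its parameters, and the three-group numbers are hypothetical and chosen only for illustration; they are not taken from the CDC data, and the real calculation would use all 19 age groups with the published standard population.

```python
def age_adjusted_rate(counts, populations, standard_weights, per=100_000):
    """Directly standardized rate: a weighted average of age-specific rates.

    counts           -- cases (or deaths) in each age group
    populations      -- population at risk in each age group
    standard_weights -- standard population size for each age group
    per              -- scaling factor (rates are shown per 100,000)
    """
    if not (len(counts) == len(populations) == len(standard_weights)):
        raise ValueError("all inputs must have one entry per age group")

    total_weight = sum(standard_weights)
    rate = 0.0
    for count, population, weight in zip(counts, populations, standard_weights):
        age_specific_rate = count / population
        # Weight each age-specific rate by that group's share of the standard population.
        rate += age_specific_rate * (weight / total_weight)
    return rate * per


# Made-up numbers for three illustrative age groups (not CDC data).
counts = [10, 50, 200]
populations = [100_000, 100_000, 50_000]
standard_weights = [500_000, 300_000, 200_000]

result = age_adjusted_rate(counts, populations, standard_weights)
print(round(result, 1))  # 100.0
```

In this made-up example the unadjusted rate would be 260 cases in a population of 250,000, or 104 per 100,000, while the age-adjusted rate is 100 per 100,000 because the standard population gives less weight to the oldest group.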
Objectives
- Explore the relationships between demographics (e.g. gender, age and others) and various cancers.
- Provide the public with relevant information about cancers.
Timeline
Challenges
- Data Acquisition: The website from which we acquired our data limits the size of each query, and at most 5 variables can be retrieved at once.
- Visualization Process: The data must be transformed into a form suitable for visualization.