Group 4 - Data Preparation
ISSS608 Visual Analytics and Applications - Project
EXPLORING AND VISUALIZING SPATIO-TEMPORAL PATTERNS OF SELF-INFLICTED DEATHS IN INDIA FROM 2001-2012 USING R
Dataset Overview
The dataset, "Suicides in India", was downloaded from Kaggle, where it was uploaded by Rajanand Ilangovan. It records the suicides committed in India from 2001 to 2012. The structure of the dataset is shown in the screenshot provided below:
The variables and their description:
Variable | Description |
---|---|
State | Lists the 29 states and 7 union territories in India |
Year | Year of the record, ranging from 2001 to 2012 |
Type_code | Category of the record: cause of act, means adopted, educational profile, social status or professional profile of the victim |
Type | Describes the Type_code in detail |
Gender | Either male or female |
Age-group | Shows the age range of the victims |
Total | Provides the number of victims |
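As a rough illustration of the structure described above, the following sketch reads a small CSV sample and inspects it with `str()`. The two data rows are made up for illustration and are not taken from the real dataset; the column name `Age_group` is an assumption about how the age range column is spelled in the file.

```r
# Made-up sample rows that mimic the seven columns described above.
# Assumption: the age range column is named Age_group in the file.
csv_text <- "State,Year,Type_code,Type,Gender,Age_group,Total
Kerala,2001,Causes,Family Problems,Male,15-29,12
Goa,2012,Means_adopted,By Hanging,Female,30-44,3"

# Read the sample; character columns are kept as plain strings.
suicides <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Print the column names, types and first values.
str(suicides)
```

With the real data, `text = csv_text` would be replaced by the path to the downloaded CSV file.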
R Packages Necessary
Many useful R functions come in packages (free libraries of code written by R's active user community). These pre-defined packages make our work easier, but not all of them are pre-installed with R. Any package that is missing must therefore be installed manually from RStudio before it can be used. The packages used for our analysis are as follows:
- ggplot2
- RColorBrewer
- scales
- shiny
- plotly
- shinythemes
- tmap
- shinydashboard
- tidyverse
- ggmap
- devtools
- d3treeR
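One possible way to install and load the packages listed above is sketched below. The helper names `missing_packages` and `setup_packages` are hypothetical, and the sketch assumes that d3treeR is installed from GitHub (repository `timelyportfolio/d3treeR`) rather than from CRAN.

```r
# Packages from the list above that are assumed to be available on CRAN.
cran_packages <- c("ggplot2", "RColorBrewer", "scales", "shiny", "plotly",
                   "shinythemes", "tmap", "shinydashboard", "tidyverse",
                   "ggmap", "devtools")

# Hypothetical helper: return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install whatever is missing, then attach everything.
setup_packages <- function() {
  to_install <- missing_packages(cran_packages)
  if (length(to_install) > 0) install.packages(to_install)

  # Assumption: d3treeR comes from GitHub, not CRAN.
  if (length(missing_packages("d3treeR")) > 0) {
    devtools::install_github("timelyportfolio/d3treeR")
  }

  invisible(lapply(c(cran_packages, "d3treeR"), library,
                   character.only = TRUE))
}
```

Calling `setup_packages()` once at the start of a session would then install anything missing and attach all twelve packages.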
Data Preparation
Tidy State name
Certain state names appear in abbreviated form. Renaming them to their full forms makes the results easier to interpret.
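A minimal sketch of this renaming step, using a lookup vector in base R, might look as follows. The toy data frame and the specific abbreviations ("A & N Islands", "D & N Haveli") are assumptions for illustration; the actual abbreviated names should be checked against the dataset.

```r
# Toy stand-in for the real data; the values are illustrative only.
suicides <- data.frame(
  State = c("A & N Islands", "D & N Haveli", "Kerala"),
  Total = c(5, 3, 10),
  stringsAsFactors = FALSE
)

# Assumed abbreviations mapped to their full names.
state_lookup <- c(
  "A & N Islands" = "Andaman and Nicobar Islands",
  "D & N Haveli"  = "Dadra and Nagar Haveli"
)

# Replace only the rows whose State matches an abbreviation.
abbreviated <- suicides$State %in% names(state_lookup)
suicides$State[abbreviated] <- unname(state_lookup[suicides$State[abbreviated]])
```

Keeping the mapping in one named vector makes it easy to extend if further abbreviations turn up.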
Tidy Type variable
In addition, two values in the 'Type' column are recorded inconsistently. They are corrected as shown:
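The same lookup approach could be applied to the 'Type' column. The text above does not name the two affected values, so the labels in this sketch ("Family problems", "Love Affair") are purely hypothetical placeholders for whatever inconsistent spellings are found in the data.

```r
# Toy stand-in for the real data; the Type values are hypothetical.
suicides <- data.frame(
  Type  = c("Family Problems", "Family problems", "Love Affair", "Love Affairs"),
  Total = c(4, 2, 1, 3),
  stringsAsFactors = FALSE
)

# Hypothetical inconsistent labels mapped to the spelling to keep.
type_fixes <- c(
  "Family problems" = "Family Problems",
  "Love Affair"     = "Love Affairs"
)

# Overwrite only the rows that carry an inconsistent label.
needs_fix <- suicides$Type %in% names(type_fixes)
suicides$Type[needs_fix] <- unname(type_fixes[suicides$Type[needs_fix]])

# List the distinct labels that remain after the correction.
sort(unique(suicides$Type))
```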
Tidy Causes Type code
Certain causes are not properly captured and are not specific enough to be included in the analysis. The State column also includes aggregate entries, which should be removed. A separate data frame is created after removing these records.
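The filtering described above can be sketched in base R as follows. The toy rows, the aggregate labels ("Total (All India)", "Total (States)", "Total (Uts)") and the vague cause labels ("Causes Not known", "Other Causes") are assumptions for illustration and should be replaced with the values actually present in the data.

```r
# Toy stand-in for the real data; all values are illustrative only.
suicides <- data.frame(
  State     = c("Kerala", "Total (All India)", "Goa", "Total (States)", "Goa"),
  Type_code = c("Causes", "Causes", "Causes", "Means_adopted", "Causes"),
  Type      = c("Family Problems", "Family Problems", "Causes Not known",
                "By Hanging", "Love Affairs"),
  Total     = c(10, 500, 4, 300, 2),
  stringsAsFactors = FALSE
)

# Assumed labels for aggregate rows and for non-specific causes.
aggregate_states <- c("Total (All India)", "Total (States)", "Total (Uts)")
vague_causes     <- c("Causes Not known", "Other Causes")

# Flag the rows to drop.
is_aggregate   <- suicides$State %in% aggregate_states
is_vague_cause <- suicides$Type_code == "Causes" &
  suicides$Type %in% vague_causes

# Keep the original data frame intact and store the result separately.
suicides_clean <- suicides[!is_aggregate & !is_vague_cause, ]
```

Storing the result in a separate data frame, here called `suicides_clean`, leaves the original data available for checks against the aggregate totals.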