Difference between revisions of "Group 4 - Data Preparation"

From Visual Analytics and Applications
 
(13 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
<div style="background:#2b8474; border:#2b8474; text-align:center; font-family:Century Gothic;">
 
<div style="background:#2b8474; border:#2b8474; text-align:center; font-family:Century Gothic;">
[[Image:group_4_title.jpg|172px|left]]  
+
[[Image:group_4_title.jpg|126px|left]]  
<font  size = 6; color="#FFFFFF">ISSS608 Visual Analytics and Applications - Project <br><br> <b>EXPLORING AND VISUALIZING SPATIO-TEMPORAL PATTERNS OF SELF-INFLICTED DEATHS IN INDIA FROM 2001-2012 USING R</b><br></font>
+
<font  size = 5; color="#FFFFFF">ISSS608 Visual Analytics and Applications - Project <br><br> EXPLORING AND VISUALIZING SPATIO-TEMPORAL PATTERNS OF SELF-INFLICTED DEATHS IN INDIA FROM 2001-2012 USING R<br></font>
 
</div>
 
</div>
 
<!--MAIN HEADER -->
 
<!--MAIN HEADER -->
Line 13: Line 13:
  
 
| style="font-family:Century Gothic; font-size:100%; solid #000000; background:#2c9985; text-align:center;" width="20%" |  
 
| style="font-family:Century Gothic; font-size:100%; solid #000000; background:#2c9985; text-align:center;" width="20%" |  
[[Group 4 - Data Exploration and Analysis| <font color="#FFFFFF">Data Exploration and Analysis</font>]]
+
[[Group 4 - Report| <font color="#FFFFFF">Report</font>]]
  
 
| style="font-family:Century Gothic; font-size:100%; solid #000000; background:#2c9985; text-align:center;" width="20%" |  
 
| style="font-family:Century Gothic; font-size:100%; solid #000000; background:#2c9985; text-align:center;" width="20%" |  
Line 41: Line 41:
 
| Year || Shows the year ranging from 2001 – 2012
 
| Year || Shows the year ranging from 2001 – 2012
 
|-
 
|-
| Type_code || Categorizes as cause of act, means adopted, educational profile, social status and professional profile of the victim
+
| Type_code || Categorizes each record as cause of act, means adopted, educational profile, social status or professional profile of the victim
 
|-
 
|-
 
| Type || Describes the Type_code in detail
 
| Type || Describes the Type_code in detail
Line 51: Line 51:
 
| Total || Provides the number of victims
 
| Total || Provides the number of victims
 
|}
 
|}
 +
 +
<br/>
 +
 +
== R Packages Necessary ==
 +
Many useful R functions come in packages (free libraries of code written by R's active user community). We use these packages to simplify our work, but not all of them come pre-installed with R, so any that are missing must be installed manually, which can be done from RStudio. The packages used for our analysis are as follows:
 +
 +
*ggplot2
 +
*RColorBrewer
 +
*scales
 +
*shiny
 +
*plotly
 +
*shinythemes
 +
*tmap
 +
*shinydashboard
 +
*tidyverse
 +
*ggmap
 +
*devtools
 +
*d3treeR
 +
 +
== Data Preparation ==
 +
===Tidy State name===
 +
Certain state names appear in abbreviated form; renaming them with their full forms makes the results easier to interpret. <br/>
 +
 +
[[Image:G4prep1.JPG | 600px | border="0"]] <br/>
 +
 +
[[Image:G4prep2.JPG | 600px | border="0"]]
 +
 +
===Tidy Type variable===
 +
Moreover, there is a discrepancy in two values of the 'Type' column, which are corrected as shown: <br/>
 +
 +
[[Image:G4prep3.JPG | 750px | border="0"]] <br/>           
 +
 +
[[Image:G4prep5.JPG | 600px | border="0"]]
 +
 +
===Tidy Causes Type code===
 +
Certain causes are not properly captured and are not specific enough to be included in the analysis. The State column also includes aggregated totals, which should be removed. A separate data frame is created after removing these records.<br/>
 +
 +
[[Image:G4prep4.JPG | 750px | border="0"]]<br/>

Latest revision as of 09:32, 7 August 2017

Group 4 title.jpg

ISSS608 Visual Analytics and Applications - Project

EXPLORING AND VISUALIZING SPATIO-TEMPORAL PATTERNS OF SELF-INFLICTED DEATHS IN INDIA FROM 2001-2012 USING R

Dataset Overview

Group 4 states.png

The dataset, “Suicides in India”, was uploaded to Kaggle by Rajanand Ilangovan and downloaded from there. It records suicides committed in India from 2001 to 2012. The structure of the dataset is shown in the screenshot below:
G4 Pic1.png

The variables and their descriptions:

Variable     Description
State        Lists the 29 states and 7 union territories in India
Year         Shows the year, ranging from 2001 to 2012
Type_code    Categorizes each record as cause of act, means adopted, educational profile, social status or professional profile of the victim
Type         Describes the Type_code in detail
Gender       Either male or female
Age-group    Shows the age range of the victims
Total        Provides the number of victims


R Packages Necessary

Many useful R functions come in packages (free libraries of code written by R's active user community). We use these packages to simplify our work, but not all of them come pre-installed with R, so any that are missing must be installed manually, which can be done from RStudio. The packages used for our analysis are as follows:

  • ggplot2
  • RColorBrewer
  • scales
  • shiny
  • plotly
  • shinythemes
  • tmap
  • shinydashboard
  • tidyverse
  • ggmap
  • devtools
  • d3treeR
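
As a minimal sketch of the manual installation step described above, the snippet below checks which of the listed packages are missing from the local library. The variable names `pkgs` and `missing_pkgs` are illustrative, and d3treeR is left out of the vector on the assumption that it is installed from GitHub through devtools rather than from CRAN. The install call is commented out because it needs an internet connection.

```r
# Packages from the list above; R package names are case-sensitive.
pkgs <- c("ggplot2", "RColorBrewer", "scales", "shiny", "plotly",
          "shinythemes", "tmap", "shinydashboard", "tidyverse",
          "ggmap", "devtools")

# Names of all packages currently installed in the local library.
installed <- rownames(installed.packages())

# Packages from the list that still need to be installed.
missing_pkgs <- setdiff(pkgs, installed)
print(missing_pkgs)

# Install only what is missing (uncomment to run; needs internet access).
# install.packages(missing_pkgs)
```

Once installed, each package is attached in a session with `library()`, for example `library(ggplot2)`.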

Data Preparation

Tidy State name

Certain state names appear in abbreviated form; renaming them with their full forms makes the results easier to interpret.

G4prep1.JPG

G4prep2.JPG
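
The renaming step could be sketched roughly as follows in base R. The data frame `suicides`, the lookup vector `full_names` and the abbreviated values are illustrative assumptions, not the exact code or values shown in the screenshots.

```r
# Toy data standing in for the real dataset (values are illustrative).
suicides <- data.frame(
  State = c("A & N Islands", "D & N Haveli", "Kerala"),
  Total = c(3, 1, 12),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: abbreviated name -> full name.
full_names <- c(
  "A & N Islands" = "Andaman and Nicobar Islands",
  "D & N Haveli"  = "Dadra and Nagar Haveli"
)

# Replace only the rows whose State appears in the lookup;
# all other state names are left untouched.
abbreviated <- suicides$State %in% names(full_names)
suicides$State[abbreviated] <- unname(full_names[suicides$State[abbreviated]])

print(suicides$State)
```

Keeping the mapping in one named vector makes it easy to review every rename in a single place and to add further abbreviations later.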

Tidy Type variable

Moreover, there is a discrepancy in two values of the 'Type' column, which are corrected as shown:

G4prep3.JPG

G4prep5.JPG
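
One possible way to spot and correct such discrepancies is sketched below. The two variant spellings are made up for illustration; the actual pair of values corrected by the authors appears only in the screenshots.

```r
# Toy data with two hypothetical variant spellings in Type.
suicides <- data.frame(
  Type  = c("Family Problems", "Family Problem", "Unemployment", "Un-employment"),
  Total = c(10, 4, 7, 2),
  stringsAsFactors = FALSE
)

# Step 1: list the distinct labels in sorted order to spot near-duplicates.
print(sort(unique(suicides$Type)))

# Step 2: overwrite each variant with the label to keep.
suicides$Type[suicides$Type == "Family Problem"] <- "Family Problems"
suicides$Type[suicides$Type == "Un-employment"]  <- "Unemployment"

# Step 3: totals now aggregate under one label per category.
type_totals <- aggregate(Total ~ Type, data = suicides, FUN = sum)
print(type_totals)
```

Without the correction, each variant would be counted as its own category and split the totals in any chart grouped by Type.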

Tidy Causes Type code

Certain causes are not properly captured and are not specific enough to be included in the analysis. The State column also includes aggregated totals, which should be removed. A separate data frame is created after removing these records.

G4prep4.JPG
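
A rough sketch of this filtering step is given below. The labels in `aggregate_rows` and `vague_causes`, and the toy values in the data frame, are assumptions made for illustration; the real labels should be taken from the dataset itself.

```r
# Toy data standing in for the real dataset (values are illustrative).
suicides <- data.frame(
  State     = c("Kerala", "Kerala", "Total (All India)", "Goa"),
  Type_code = c("Causes", "Causes", "Causes", "Means_adopted"),
  Type      = c("Illness", "Causes Not known", "Illness", "By Hanging"),
  Total     = c(5, 9, 120, 2),
  stringsAsFactors = FALSE
)

# Hypothetical labels for aggregated State rows and non-specific causes.
aggregate_rows <- c("Total (All India)", "Total (States)", "Total (Uts)")
vague_causes   <- c("Causes Not known", "Other Causes")

# Keep a row only if it is not an aggregate and not a vague cause.
keep <- !(suicides$State %in% aggregate_rows) &
        !(suicides$Type_code == "Causes" & suicides$Type %in% vague_causes)

# Store the result separately so the original data frame stays intact.
suicides_clean <- suicides[keep, ]
print(suicides_clean)
```

Dropping the aggregated State rows matters because summing over all rows would otherwise count every victim more than once.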