APA Project Management

From Analytics Practicum
Revision as of 19:53, 23 April 2017 by Prekshaapu.2013 (talk | contribs)

Defining Target Labels
The Work Network data records, for each pair of employees in the company, the strength of their work-based relationship as rated in a survey. To add business value, the survey values are binned into target labels using the following rules:

Survey Values | Target Label | Count | Percentage
0, 1 | No Relation | 490 | 58.33%
1.5, 2, 2.5 | Weak Relation | 140 | 16.67%
3, 3.5, 4 | Moderate Relation | 137 | 16.31%
4.5, 5 | Strong Relation | 73 | 8.69%
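
The binning rules in the table can be sketched in Python as a simple lookup. This is only an illustration of the rules, not the project's actual JMP recode step; the names `BINS`, `LABEL_BY_VALUE` and `bin_survey_value` are hypothetical, and values not listed in the table (such as 0.5) are rejected rather than guessed.

```python
# Survey values for each target label, copied from the table above.
BINS = {
    "No Relation": (0, 1),
    "Weak Relation": (1.5, 2, 2.5),
    "Moderate Relation": (3, 3.5, 4),
    "Strong Relation": (4.5, 5),
}

# Invert the table into a value -> label lookup.
LABEL_BY_VALUE = {
    float(value): label for label, values in BINS.items() for value in values
}


def bin_survey_value(value):
    """Return the target label for one survey value (hypothetical helper)."""
    try:
        return LABEL_BY_VALUE[float(value)]
    except KeyError:
        # Assumption: a value outside the table is an error, not a new bin.
        raise ValueError(f"survey value {value!r} is not covered by the binning rules")


ratings = [0, 1, 1.5, 2.5, 3, 4, 4.5, 5]
labels = [bin_survey_value(r) for r in ratings]
```

An explicit lookup is used instead of numeric thresholds so that the code states nothing beyond what the table states.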

While filling in the survey, many employees treated 1 as the minimum value (instead of leaving the response blank). Therefore, both 0 and 1 are treated as no relation. With these 4 bins defined, if the classification model can predict the categories well, email data can be considered a strong representation of the work network. The table above shows the distribution of the target label. The distribution is heavily skewed, with 58.33% of the data instances belonging to the ‘No Relation’ category. Because of this skew, a validation column is created to split the data into train and test sets using stratified sampling. This ensures that the distribution of the target label categories remains the same in both the train and test sets.
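
As a rough illustration of stratified sampling, the sketch below splits each label group separately so that train and test keep the same label mix. It is not the JMP validation column itself: the function name `stratified_split`, the 30% test fraction, the seed, and the 100-row toy data (chosen to roughly mirror the percentages above) are all assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), sampling each label group separately."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # The same fraction is taken from every label, so proportions carry over.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data: 100 rows with roughly the skew reported above (assumed, not real data).
labels = (["No Relation"] * 58 + ["Weak Relation"] * 17
          + ["Moderate Relation"] * 16 + ["Strong Relation"] * 9)

train_idx, test_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
```

A plain random split could, by chance, leave very few ‘Strong Relation’ rows in the test set; splitting per label avoids that.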

Predictor Screening
SAS JMP’s built-in predictor screening shows that some of the calculated features are stronger predictors of the target variable than others. This observation matters when choosing the penalty type during model fitting for classification algorithms such as a Neural Network.
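
To give a feel for what ranking predictors means, the sketch below scores each feature by its correlation ratio (eta squared) with the target label and sorts the features by that score. This is not JMP's screening procedure; it is a simpler univariate stand-in, and the feature names and values are invented toy data.

```python
from collections import defaultdict


def correlation_ratio(values, labels):
    """Eta squared: share of a feature's variance explained by the class label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # a constant feature carries no information
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


# Hypothetical toy data: two made-up features for six employee pairs.
labels = ["No Relation", "No Relation", "Weak Relation",
          "Weak Relation", "Strong Relation", "Strong Relation"]
features = {
    "emails_exchanged": [0, 1, 5, 6, 20, 22],
    "avg_subject_length": [4, 9, 5, 8, 6, 7],
}

scores = {name: correlation_ratio(vals, labels) for name, vals in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

A score near 1 means the feature separates the labels almost perfectly, while a score near 0 means it barely varies with the label.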