APA Project Management

Defining Target Labels
The Work Network data contains, for each employee relationship in the company, a survey score describing the strength of that work-based relationship. To make the target variable more useful for the business, the scores are binned into four categories using the following rules:


Survey Values	Target Label	Count	Percentage (%)
0, 1	No Relation	490	58.33
1.5, 2, 2.5	Weak Relation	140	16.67
3, 3.5, 4	Moderate Relation	137	16.31
4.5, 5	Strong Relation	73	8.69
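The binning rules in the table can be expressed as a simple lookup. The sketch below is illustrative only: `BIN_RULES`, `bin_survey_value` and `label_distribution` are hypothetical names, and it assumes blank responses have already been recorded as 0. Values the table does not list (for example 0.5) are rejected rather than guessed.

```python
from collections import Counter

# Survey value -> target label, copied from the binning table above.
# Python treats 2 and 2.0 as the same dict key, so float scores also match.
BIN_RULES = {
    0: "No Relation", 1: "No Relation",
    1.5: "Weak Relation", 2: "Weak Relation", 2.5: "Weak Relation",
    3: "Moderate Relation", 3.5: "Moderate Relation", 4: "Moderate Relation",
    4.5: "Strong Relation", 5: "Strong Relation",
}


def bin_survey_value(value):
    """Map one survey score to its target label; reject values not in the table."""
    try:
        return BIN_RULES[value]
    except KeyError:
        raise ValueError(f"unexpected survey value: {value!r}") from None


def label_distribution(values):
    """Return {label: (count, percentage rounded to 2 decimals)} for a list of scores."""
    counts = Counter(bin_survey_value(v) for v in values)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}
```

Using an explicit mapping instead of numeric thresholds keeps the code identical to the table and surfaces any unexpected survey values.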

When filling in the survey, many employees treated 1 as the minimum value rather than leaving the field blank (no response). Therefore, both 0 and 1 are treated as no relation. With four bins defined, if a classification model can predict these categories well, the email data can be considered a strong representation of the work network. The table above shows the distribution of the target label. It is highly skewed: 58.33% of the data instances belong to the ‘No Relation’ category. Because of this imbalance, a validation column is created to split the data into training and test sets using stratified sampling. This ensures that the distribution of the target label categories stays the same in both the training and test sets.
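A stratified validation column can be sketched by sampling the test rows separately within each label, so every category keeps its share in both sets. This is a minimal illustration, not the JMP implementation: the function name `make_validation_column`, the "Train"/"Test" values, the 30% test fraction and the fixed seed are all assumptions.

```python
import random
from collections import defaultdict


def make_validation_column(labels, test_fraction=0.3, seed=42):
    """Hypothetical helper: return a 'Train'/'Test' tag per row, stratified by label."""
    # Group row indices by their target label
    rows_by_label = defaultdict(list)
    for i, label in enumerate(labels):
        rows_by_label[label].append(i)

    rng = random.Random(seed)          # fixed seed so the split is reproducible
    column = ["Train"] * len(labels)
    for rows in rows_by_label.values():
        rng.shuffle(rows)
        # Take the same fraction of each label for the test set
        n_test = round(len(rows) * test_fraction)
        for i in rows[:n_test]:
            column[i] = "Test"
    return column
```

Because the test rows are drawn per label, a category holding 58.33% of all rows holds roughly 58.33% of the training rows and of the test rows as well, up to rounding.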

Predictor Screening
Using the built-in Predictor Screening platform in SAS JMP, some features are observed to be much stronger predictors of the target variable than the other calculated features. This observation matters when choosing the penalty type during model fitting for classification algorithms such as neural networks, because the penalty controls how strongly the weights of less useful inputs are shrunk.
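The link between screening results and the penalty choice can be sketched as follows. The code normalizes raw importance scores into shares of the total and compares a squared (L2-style) penalty with an absolute (L1-style) one. The rule in `suggest_penalty`, its `top_k` and `share_threshold` defaults, and the feature names in the usage are hypothetical illustrations, not JMP behaviour or the project's actual decision rule.

```python
def predictor_portions(contributions):
    """Normalize raw importance scores into shares of the total, ranked high to low."""
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score / total) for name, score in ranked]


def squared_penalty(weights, lam):
    # L2-style term: lam * sum(w^2); shrinks each weight in proportion to its size
    return lam * sum(w * w for w in weights)


def absolute_penalty(weights, lam):
    # L1-style term: lam * sum(|w|); pushes weights of weak inputs toward zero
    return lam * sum(abs(w) for w in weights)


def suggest_penalty(contributions, top_k=3, share_threshold=0.8):
    # Hypothetical rule of thumb: if a few predictors carry most of the
    # importance, prefer the absolute penalty; otherwise use the squared one
    portions = predictor_portions(contributions)
    top_share = sum(share for _, share in portions[:top_k])
    return "absolute" if top_share >= share_threshold else "squared"


# Usage with made-up feature names and scores
scores = {"emails_exchanged": 50, "reply_rate": 30, "cc_count": 10,
          "weekend_emails": 5, "thread_length": 5}
print(suggest_penalty(scores))  # top three hold 90% of the importance
```

The absolute penalty is the usual choice when only a few inputs matter, because it penalizes small weights relatively more than the squared penalty does.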
