Difference between revisions of "APA Final Progress"
Line 243: | Line 243: | ||
Figure above shows that Hana Owens is connected to high level employees like Manish Goel and Adesh Goel. <br> | Figure above shows that Hana Owens is connected to high level employees like Manish Goel and Adesh Goel. <br> | ||
'''Degree Centrality'''<br> | '''Degree Centrality'''<br> | ||
− | Degree centrality was applied to the social network created using email data where the edges had no weights and the graph below was derived. It can be observed that degree centrality is similar to out degree and in degree results. Thus, the study will just consider degree centrality for social network comparison instead of focusing on in-degree and out-degree. | + | Degree centrality was applied to the unweighted social network built from the email data, and the graph below was derived. The degree centrality results are similar to the out-degree and in-degree results. Thus, the study considers only degree centrality for social network comparison instead of treating in-degree and out-degree separately. <br> |
[[Image:deg.png|400px]]<br> | [[Image:deg.png|400px]]<br> | ||
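As a rough illustration of the measure above, the following sketch computes degree centrality on an unweighted, undirected network built from sender/receiver pairs. The function name, the tiny edge list and the employee labels A to D are hypothetical and are not taken from the study's data or tooling.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality for an unweighted, undirected network.

    Each node's score is its number of distinct neighbours divided by
    (n - 1), where n is the number of nodes in the network.
    """
    neighbours = defaultdict(set)
    for sender, receiver in edges:
        if sender != receiver:  # ignore self-addressed emails
            neighbours[sender].add(receiver)
            neighbours[receiver].add(sender)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical email pairs; repeated pairs do not add weight.
emails = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "D")]
centrality = degree_centrality(emails)
```

Because edges carry no weights here, repeated emails between the same two employees count as a single connection.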
'''Closeness Centrality''' <br> | '''Closeness Centrality''' <br> | ||
− | Closeness centrality was applied to the social network created using email data where the edges had no weights and the Figure below was derived.It can be observed that Closeness Centrality shows very less variation among the employees. Thus, the study will not consider closeness centrality for social network comparison. | + | Closeness centrality was applied to the unweighted social network built from the email data, and the figure below was derived. Closeness centrality shows very little variation among the employees. Thus, the study will not consider closeness centrality for social network comparison.<br> |
[[Image:clo.png|400px]]<br> | [[Image:clo.png|400px]]<br> | ||
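For reference, a minimal sketch of closeness centrality on an unweighted, undirected network is given below. It uses breadth-first search for shortest paths and, as a simplifying assumption, scores each node only over the nodes it can reach. The function name and the three-node example graph are hypothetical.

```python
from collections import deque


def closeness_centrality(adjacency):
    """Closeness centrality using breadth-first search on unweighted edges.

    A node's score is (number of reachable nodes) divided by the sum of
    shortest-path distances to those nodes; isolated nodes score 0.0.
    """
    scores = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical path-shaped network: A - B - C
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
scores = closeness_centrality(graph)
```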
'''Correlation Analysis'''<br> | '''Correlation Analysis'''<br> | ||
Line 256: | Line 256: | ||
{|style="width:100%;vertical-align:top;margin-top:20px;" | {|style="width:100%;vertical-align:top;margin-top:20px;" | ||
|- | |- | ||
− | |style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff> | + | |style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Feature Engineering + Survey </font></div><br/> |
Line 272: | Line 272: | ||
<br> | <br> | ||
− | <big>'''You may view the'''</big>[https://smusg.asia.qualtrics.com/jfe/form/SV_6eVxySZKg8NAW2N <font face ="Century Gothic" color="#00C5CD"><strong><i><big>survey here.</big></i></strong></font>] | + | <big>'''You may view the'''</big> [https://smusg.asia.qualtrics.com/jfe/form/SV_6eVxySZKg8NAW2N <font face ="Century Gothic" color="#00C5CD"><strong><i><big>survey here.</big></i></strong></font>] <br> |
+ | |||
+ | {|style="width:100%;vertical-align:top;margin-top:20px;" | ||
+ | |- | ||
+ | |style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Classification Modelling </font></div><br/> | ||
+ | |||
+ | '''Defining Target Labels''' | ||
+ | The Work Network data records, for each pair of employees in the company, the strength of their work-based relationship. To add business value, the following rules are applied to bin the target variable: | ||
+ | {| class="wikitable" | ||
+ | |+ | ||
+ | |- | ||
+ | |Survey Values | ||
+ | |Target Labels | ||
+ | |Count | ||
+ | |Percentage | ||
+ | |- | ||
+ | |0, 1 | ||
+ | |No Relation | ||
+ | |490 | ||
+ | |58.33% | ||
+ | |- | ||
+ | |1.5, 2, 2.5 | ||
+ | |Weak Relation | ||
+ | |140 | ||
+ | |16.67% | ||
+ | |- | ||
+ | |3, 3.5, 4 | ||
+ | |Moderate Relation | ||
+ | |137 | ||
+ | |16.31% | ||
+ | |- | ||
+ | |4.5, 5 | ||
+ | |Strong Relation | ||
+ | |73 | ||
+ | |8.69% | ||
+ | |} | ||
+ | |||
+ | While filling in the survey, many employees treated 1 as the minimum value (instead of leaving the field blank for no response). Therefore, both 0 and 1 are considered 'No Relation'. With four bins defined, if the classification model can predict these categories well, the email data can be regarded as a strong representation of the work network. | ||
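The binning rules in the table can be sketched as a small function. This is only an illustration of the mapping, assuming survey values lie between 0 and 5 in steps of 0.5; the function name is hypothetical and the study itself may have applied the rules in a different tool.

```python
def bin_survey_value(score):
    """Map a survey value (assumed 0 to 5 in steps of 0.5) to a target label."""
    if score <= 1:
        return "No Relation"        # 0 and 1 are both treated as no relation
    if score <= 2.5:
        return "Weak Relation"      # 1.5, 2, 2.5
    if score <= 4:
        return "Moderate Relation"  # 3, 3.5, 4
    return "Strong Relation"        # 4.5, 5


labels = [bin_survey_value(v) for v in (0, 1, 1.5, 2.5, 3, 4, 4.5, 5)]
```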
+ | Table above shows the distribution of the target label. The distribution is heavily skewed, with 58.33% of the data instances belonging to the ‘No Relation’ category. Because of this skew, a validation column is created to split the data into train and test sets using stratified sampling. This ensures that the distribution of the target label categories remains the same in both the train and test data points. <br> | ||
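The idea behind stratified sampling can be sketched as follows. This is not the validation column procedure used in the study; it is a minimal illustration that splits each label group separately so that both subsets keep roughly the same label proportions. The function name, the 30% test fraction and the seed are assumptions, and the class sizes are the ones implied by the percentages in the table (for example, 58.33% of 840 instances is 490).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices), sampled separately per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Class sizes implied by the table percentages (assumed total of 840).
labels = (["No Relation"] * 490 + ["Weak Relation"] * 140
          + ["Moderate Relation"] * 137 + ["Strong Relation"] * 73)
train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=42)
```

Sampling within each label group, rather than over the whole data set at once, is what keeps the minority categories represented in the test set.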
+ | |||
+ | '''Predictor Screening''' | ||
+ | Using SAS JMP’s built-in predictor screening, it is observed that some features are stronger predictors of the target variable than other calculated features. This observation informs the choice of penalty type when fitting classification algorithms such as Neural Network. |
Revision as of 15:20, 22 April 2017
Data Email Data
Cleaning Email Data
4. Removing unnecessary columns such as:
After the cleaning of data, there were 29,797 rows of data with no missing data instances.
Staff and Email Data Comparison
Cleaning survey data