APA Project Overview

From Analytics Practicum

Revision as of 16:12, 25 February 2017

Motivation and Project Overview

People analytics was rated the second-biggest overall capability gap in organizations by Deloitte University Press in 2015 [1]. Through people analytics, companies are able to find better hires, improve retention, and identify more suitable leaders. This has a direct impact on the direction of the organization and hence on its growth. Through this project, our team has the opportunity to delve into social network analysis, a fast-growing research field within analytics.
In this project, our focus is to develop metrics that quantify collaboration between employees, identify the most influential employees, and give managers a high-level view of these statistics so they can maintain a collaborative and efficient workplace. Currently, the company computes these metrics from several data sets collected primarily through pulse surveys. Survey data collection is slow, which makes it difficult for managers to see insights in real time. As an alternative, our team will compute these metrics from email communication data alone. Because this data is already stored in the company's IT systems, an automated data pipeline can compute the metrics and display them on a custom dashboard. Before calculating the metrics, we will also perform feature engineering to build an unbiased email network.
A primary metric that our team will explore and test for value is a hybrid centrality used to calculate an influence score. We are exploring a new equation that combines various centrality measures.
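The equation itself is not given here. Purely as an illustration, the Python sketch below blends normalized degree centrality D(v) and normalized betweenness centrality B(v), the pair the objectives mention as a possible combination, as H(v) = α·D(v) + (1 − α)·B(v). The linear form, the weight α (alpha) and the function names are assumptions, not the team's equation. Betweenness is computed with Brandes' algorithm for an unweighted, undirected graph stored as a dict of neighbour sets.

```python
from collections import deque


def degree_centrality(graph):
    """Normalized degree: number of neighbours divided by (n - 1)."""
    n = len(graph)
    if n < 2:
        return dict.fromkeys(graph, 0.0)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph, normalized to [0, 1]."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, recording shortest-path counts and predecessors.
        stack, queue = [], deque([s])
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(graph)
    # Each unordered pair was visited from both ends, so divide by 2 * (n-1)(n-2)/2.
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}


def hybrid_centrality(graph, alpha=0.5):
    """Hypothetical hybrid score: alpha * degree + (1 - alpha) * betweenness."""
    deg = degree_centrality(graph)
    btw = betweenness_centrality(graph)
    return {v: alpha * deg[v] + (1 - alpha) * btw[v] for v in graph}


# Toy star network: "a" is the only link between b, c and d.
star = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
scores = hybrid_centrality(star)
```

In the toy star network the hub a reaches the maximum score of 1 and each leaf scores 1/6. networkx provides `degree_centrality` and `betweenness_centrality` for real use; the pure-Python version only keeps the sketch dependency-free.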

Objectives

  1. Perform feature engineering to create a new ‘Trust Score’ algorithm. A trust score is an aggregate weight that indicates the strength of the communication tie between two employees in a social network.
  2. Develop a dashboard that displays metrics that quantify collaboration between employees, identify the most influential employees, and give managers a high-level view of these statistics so they can maintain a collaborative and efficient workplace.
  3. Research and validate the potential of a hybrid centrality (potentially a combination of betweenness and degree centrality), calculated from email communication data, as a measure of influence.
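The objectives do not define the trust score formula. Purely to illustrate the idea of an aggregate tie weight, the Python sketch below scores each pair of employees by its total message volume multiplied by a reciprocity ratio, min(a→b, b→a) / max(a→b, b→a), so that one-way traffic scores zero. The formula and the name `trust_scores` are assumptions, not the team's algorithm.

```python
from collections import Counter


def trust_scores(messages):
    """Hypothetical trust score for each pair of employees.

    messages: iterable of (sender, recipient) tuples, one per email.
    Score = total volume between the pair * reciprocity, where reciprocity
    = min(a->b, b->a) / max(a->b, b->a). One-way traffic (e.g. broadcasts)
    therefore scores 0.
    """
    counts = Counter(messages)  # directed message counts
    scores = {}
    for a, b in list(counts):
        pair = frozenset((a, b))
        if a == b or pair in scores:
            continue  # skip self-mail and pairs already scored
        ab, ba = counts[(a, b)], counts[(b, a)]  # Counter returns 0 for unseen keys
        scores[pair] = (ab + ba) * min(ab, ba) / max(ab, ba)
    return scores


# Made-up log for illustration only.
log = [("alice", "bob")] * 4 + [("bob", "alice")] * 2 + [("news", "alice")] * 10
trust = trust_scores(log)
```

In the toy log, alice and bob exchanged 6 messages at a 4:2 ratio and score 3.0, while the one-way newsletter sender scores 0.0. Multiplying by reciprocity is one simple way to discount bulk or automated mail; other factors, such as recency or the log's Size column, could be layered on in the same way.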

Data

We were provided with an Excel sheet containing a large log of email exchanges through the TrustSphere domain. The data consists of 14 columns, described below:

Column | Explanation
Date | Timestamp of the email
Remote IP | If the email exchange is external, this column shows the external person's email
Remote | The TrustSphere employee who is receiving or sending the email
Remote Domain | Always TrustSphere
Local | Email address of the person sending the email
Local Domain | Domain of the person sending the email
Originator | Inbound, outbound or internal (whether you are receiving the email, sending it, or the email is between two TrustSphere employees)
Direction | Always TrustSphere in this case
Domain Group | Email header (subject line)
Subject | Type of message: email / im (instant messaging) / voice / sms
Inbound Count | Number of emails received
Outbound Count | Number of emails sent
Size | Size of the message (number of characters)
Msgid | Encoded message ID
Data Statistics
Statistic | Value
Number of rows | 121,154
Date range | 11/26/2016 8:00 am to 02/01/2017 12:00 am
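A first step of the data pipeline is turning this log into a weighted email network. The minimal Python sketch below assumes the Excel sheet has been exported to CSV with the column headers exactly as listed above, and that rows between two TrustSphere employees carry the Originator value "internal". The function name `load_internal_edges` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def load_internal_edges(csv_file):
    """Hypothetical loader: sum message counts per unordered employee pair.

    Assumes the sheet was exported to CSV with the header names listed in the
    column table, and that employee-to-employee rows have Originator "internal".
    """
    weights = Counter()
    for row in csv.DictReader(csv_file):
        if row["Originator"].strip().lower() != "internal":
            continue  # drop inbound/outbound traffic with external parties
        a = row["Local"].strip().lower()
        b = row["Remote"].strip().lower()
        if a == b:
            continue  # ignore self-addressed mail
        n = int(row["Inbound Count"] or 0) + int(row["Outbound Count"] or 0)
        weights[frozenset((a, b))] += n
    return weights


# Made-up rows for illustration only (not taken from the TrustSphere log).
SAMPLE = """Date,Local,Remote,Originator,Inbound Count,Outbound Count
11/26/2016 8:00,alice@example.com,bob@example.com,internal,0,1
11/26/2016 9:15,Bob@example.com,alice@example.com,internal,1,0
11/27/2016 10:30,carol@partner.com,alice@example.com,inbound,1,0
"""
edges = load_internal_edges(io.StringIO(SAMPLE))
```

Summing both count columns per unordered pair is a deliberate simplification: the column descriptions leave the direction of each row ambiguous, so the sketch treats ties as undirected. Addresses are lower-cased so that "Bob@example.com" and "bob@example.com" map to the same employee.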


Methodology

[Figure: Method1.JPG]