YSR Project Overview
Scope
Access to TrustSphere’s datasets will allow the team to build a system from scratch using previously unused raw data to better understand turnover and attrition rules.
At a minimum, we would like to address the following research points:
- Understand the number of relationships an employee has at different periods of his or her working life
- Measure the rate at which employee relationships grow within a company
- Identify correlations between the sizes of employees’ internal and external networks
- Through social network analysis, calculate the likelihood of an employee in an informal group leaving a company upon the exit of another closely tied employee
- Identify metrics that can help predict the likelihood of an employee leaving
It is important to note that the scope of this project is fluid and can be extended to address additional questions TrustSphere might have regarding the dataset.
Motivation and Objectives
In addition to our initial scope of research, we expanded the project to cover the following:
a) New employee immersion: Onboarding, also known as organizational socialization, refers to the mechanism through which new employees acquire the necessary knowledge, skills, and behaviors to become effective organizational members and insiders. Effective onboarding allows an employee to integrate into a company more quickly and therefore to become an asset faster. A key metric for understanding organizational immersion is the number of internal relationships the employee has formed at different points in time. By benchmarking the average speed of internal relationship growth, a company can assess the effectiveness of its onboarding programs for particular employees.
b) Influencer identification for the adoption of new enterprise-level initiatives: When new enterprise-level initiatives are launched, a key concern is employee-level adoption of these strategic changes. Identifying key influencers in an organization and enrolling them as champions for enterprise-level change is one way to increase adoption. Through SNA, these nodes of activity can be readily discovered.
c) Contextual employee performance levels: The number-one predictive element of an individual’s success in an organization is the number, quality, and depth of social capital, that is, the personal relationships among those they do business with. By creating metrics (through insights gained from SNA), geographic and department-level averages can be established to identify employees who are underperforming or overperforming.
d) Levels of collaboration: Organisations that encourage employee productivity through collaboration across networks, rather than simple individual task completion, will need to actively monitor collaboration silos.
Data
The dataset given to us was pulled from the Outlook database (Outlook being the mail server used at TrustSphere). The data is essentially a log of email exchanges. It consists of 890,000 email exchanges and 13 variables, of which we found the following 8 to be relevant:
- Date: includes the date and time of a particular email being sent
- Originator: identifies the originator of an email thread
- Direction: indicates the direction of the email sent
- Domain Group: identifies the company to which an email address belongs
- Inbound Count: number of emails being received from a particular address
- Outbound Count: number of emails being sent to a particular address
- Size: the size of email in bytes
- MsgID: unique identifier of a particular email thread
Review of Previous Work
Our preliminary course of action had us importing the cleaned communication log data file into Tableau. The following describes the areas covered in our initial analysis:
a) Understanding the number of active employees in the organization on a monthly basis
We defined an active employee as one attending work in a given month. It is important to note that an employee on leave for that month would be counted as a non-active employee. As a proxy, we assumed that an active employee would send at least one email per month. The insights gained from these graphs will allow managers to understand how to increase efficiency (for example, by planning work around trends in workforce participation across the months of a year).
This was done on a monthly time axis from May 2014 to December 2015. The term "active" refers specifically to email activity, as that is what is relevant in finding social ties, based on our hypothesis that a higher number of emails signifies a stronger network in an organization.
As we see below, the number of active employees increases steadily from around 40 to as many as 45 in a given month.
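The counting rule above (an employee is active in a month if he or she sent at least one email) can be sketched in a few lines of Python. This is only an illustration of the rule, not the Tableau workflow we used; the sample records, the timestamp format and the function name active_employees_per_month are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample of (timestamp, sender) pairs; the real log has more fields
# and its timestamp format may differ.
emails = [
    ("2014-05-03 09:15", "alice"),
    ("2014-05-17 14:02", "bob"),
    ("2014-05-20 08:45", "alice"),
    ("2014-06-01 11:30", "alice"),
]

def active_employees_per_month(records):
    """Map 'YYYY-MM' to the number of distinct senders with at least one email."""
    senders_by_month = defaultdict(set)
    for timestamp, sender in records:
        month = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").strftime("%Y-%m")
        senders_by_month[month].add(sender)
    return {month: len(senders) for month, senders in sorted(senders_by_month.items())}

monthly_active = active_employees_per_month(emails)
print(monthly_active)
```

Using a set per month means an employee who sends many emails is still counted once, which matches the "at least one email" threshold.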
b) Mapping the absolute number of emails sent per person per month through the use of a Heatmap
The absolute volume of emails sent and received by an employee per month is one of many indicators of the workload of an employee. For illustration purposes, we used a heatmap to describe workload per month for each employee. This information can help:
- Managers allocate work across employees in a single department better
- Understand the responsiveness of employees to collaboration: an employee who receives far more emails than he or she sends could be assumed not to collaborate well. However, these insights will only flag potential issues for a manager; to get to the root cause, more information will be needed.
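The table behind such a heatmap can be sketched as follows: one row per employee, one column per month, each cell holding the number of emails sent plus received. This is a minimal sketch under assumed inputs; the (month, sender, receiver) record layout and the function names are hypothetical and do not reflect the actual column names in the dataset.

```python
from collections import Counter

# Hypothetical (month, sender, receiver) records used only for illustration.
emails = [
    ("2014-05", "alice", "bob"),
    ("2014-05", "alice", "carol"),
    ("2014-05", "bob", "alice"),
    ("2014-06", "carol", "alice"),
]

def monthly_volume(records):
    """Count emails sent and received per (employee, month)."""
    sent, received = Counter(), Counter()
    for month, sender, receiver in records:
        sent[(sender, month)] += 1
        received[(receiver, month)] += 1
    return sent, received

def heatmap_table(records):
    """Return (months, rows) where rows maps employee -> total emails per month."""
    sent, received = monthly_volume(records)
    keys = set(sent) | set(received)
    months = sorted({month for _, month in keys})
    employees = sorted({employee for employee, _ in keys})
    rows = {
        e: [sent[(e, m)] + received[(e, m)] for m in months]  # Counter yields 0 if absent
        for e in employees
    }
    return months, rows

months, rows = heatmap_table(emails)
for employee, cells in rows.items():
    print(employee, dict(zip(months, cells)))
```

Keeping the sent and received counters separate also makes it straightforward to compare the two volumes per employee, as suggested in the second point above.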
c) Measuring growth or decline of the size of an employee’s network
The number of relationships an employee has at different points of time can provide an array of insights for a manager. We have assumed that a single communication flow (either forward or backward) will constitute a relationship in an organization. For the purpose of this project we focus on two insights:
- For a new employee, the number of relationships at different points in time is a suitable estimate of his or her immersion into the organization. Slow growth, or a decline, in a new employee’s network could indicate a failure to integrate well into the company.
- Number of relationships can be used as a base for comparison of employees on the same hierarchical level. For example, Sahil (a marketing executive with 50+ relationships) can be said to be doing better compared to Ananya (a marketing executive with 30+ relationships).
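Under the assumption stated above (a single email in either direction constitutes a relationship), the size of an employee's network over time can be sketched as a running count of distinct contacts. The sample records and the function name network_size_by_month are hypothetical, and the cumulative count is one possible reading of "network size", not necessarily the one used in our Tableau analysis.

```python
from collections import defaultdict

# Hypothetical (month, sender, receiver) records used only for illustration.
emails = [
    ("2014-05", "alice", "bob"),
    ("2014-05", "carol", "alice"),
    ("2014-06", "alice", "bob"),
    ("2014-06", "alice", "dave"),
    ("2014-07", "bob", "carol"),
]

def network_size_by_month(records, employee):
    """Cumulative number of distinct contacts of `employee` at the end of each month."""
    contacts_by_month = defaultdict(set)
    for month, sender, receiver in records:
        if sender == receiver:
            continue  # an email to oneself is not a relationship
        if sender == employee:
            contacts_by_month[month].add(receiver)
        elif receiver == employee:
            contacts_by_month[month].add(sender)

    sizes, known_contacts = {}, set()
    for month in sorted({m for m, _, _ in records}):
        known_contacts |= contacts_by_month.get(month, set())
        sizes[month] = len(known_contacts)
    return sizes

alice_growth = network_size_by_month(emails, "alice")
print(alice_growth)
```

A cumulative count can only grow or stay flat. To capture a decline in network size, one option would be to count only the contacts seen within a recent window of months instead.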
d) Understanding all-organization network performers
From an initial look at network centrality measures for our client organization as a whole, we drew the following insights:
- Betweenness, a measure of influence in the organization, indicated that the client company had three highly influential individuals. These three individuals were C-suite level employees. According to research, the “perceived power distance” between executives at the top of the organization and employees on the front lines is one reason behind poor adoption of corporate-level initiatives.
- Degree, an overall measure of the number of relationships employees have, suggested that a majority of the employees are well immersed in the organization.
Methodology
Over the course of the last few weeks, our client expressed interest in generating actionable insights as opposed to creating dashboards. His rationale lay in the fact that interactive dashboards have not been the best tool for creating action in the realm of people analytics. Hence, our methodology has taken a shift as we have started focusing on conducting in-depth Social Network Analysis to identify insights that could potentially help the client in developing his product further.
We conducted secondary research to learn more about terms and processes involved in social network analysis and other advancements in the field. Through our research, we identified certain core measures, such as:
- Degree
Degree is the simplest of the node centrality measures, as it uses only the local structure around a node. In a binary network, the degree is the number of ties a node has. In a directed network, a node may have a different number of outgoing and incoming ties, and therefore degree is split into outdegree and indegree, respectively.
- Betweenness centrality
Betweenness centrality is an indicator of a node's centrality in a network. It is equal to the number of shortest paths from all vertices to all others that pass through that node. A node with high betweenness has great influence over what does and does not flow through the network.
- Closeness
Closeness is defined as the inverse of farness, which in turn, is the sum of distances to all other nodes (Freeman, 1978). The intent behind this measure was to identify the nodes which could reach others quickly. A main limitation of closeness is the lack of applicability to networks with disconnected components: two nodes that belong to different components do not have a finite distance between them. Thus, closeness is generally restricted to nodes within the largest component of a network.
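To make the three definitions above concrete, the following is a minimal sketch in plain Python for a small undirected, unweighted network. It is not the Alteryx workflow we used: the tie list is made up, the function names are our own, and betweenness is computed with Brandes' algorithm, which we chose here only because it is a standard way to count shortest paths. When several shortest paths connect the same pair, each path contributes a fraction.

```python
from collections import deque

# Hypothetical undirected ties between employees, for illustration only.
ties = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("bob", "dave")]

def build_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def degree(graph):
    """Number of ties per node (binary, undirected network)."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def closeness(graph):
    """Inverse of farness; assumes a connected graph (e.g. the largest component)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest path lengths
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        farness = sum(dist.values())
        result[source] = 1 / farness if farness else 0.0
    return result

def betweenness(graph):
    """Shortest paths passing through each node (Brandes' algorithm, unnormalised)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        order = []                              # nodes in order of discovery
        preds = {node: [] for node in graph}    # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)         # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):               # accumulate dependencies back to source
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # In an undirected graph every pair is visited from both ends, so halve.
    return {node: value / 2 for node, value in score.items()}

graph = build_graph(ties)
deg, clo, btw = degree(graph), closeness(graph), betweenness(graph)
print(deg, clo, btw, sep="\n")
```

In this toy network, bob has the highest value on all three measures, since every path between alice and the other employees runs through him.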
We have been able to generate these values for all the subjects in our network using Alteryx. Our next step will be to run further analyses in JMP Pro and Tableau to find actionable insights for the client. Our final step will be to display our results using Gephi, to visualise the social network and highlight our findings.
Data Preparation for Social Network Analysis
Following from our methodology, the next logical step is to try and manipulate the data to visualize and analyze the social networks that exist through these email exchanges.
A quick search online showed us that most of the available social network analysis tools require the data to be in one of these two forms:
- An adjacency matrix: a square matrix in which the row headers are the sender names and the column headers are the receiver names. A snapshot of our adjacency matrix can be seen in Figure x. For example, Alistair in row 3 has sent a total of 648 emails to Adesh.
- Nodes and edges lists: Some tools required us to use nodes and edges.
  - A nodes list defines all the nodes (employee names in our case) in the network.
  - An edges list defines the connections (emails sent from and to in our case) in the network.
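The two input forms described above can be derived from the same email log. The sketch below builds a weighted edges list, a nodes list and a sender-by-receiver adjacency matrix in plain Python. The (sender, receiver) record layout, the function names and the email counts are assumptions for illustration; they are not taken from the actual dataset.

```python
from collections import Counter

# Hypothetical (sender, receiver) pairs; the counts here are made up.
emails = [
    ("alistair", "adesh"),
    ("alistair", "adesh"),
    ("adesh", "alistair"),
    ("adesh", "maria"),
]

def build_edge_list(records):
    """One (sender, receiver, email_count) row per directed pair."""
    counts = Counter(records)
    return sorted((sender, receiver, n) for (sender, receiver), n in counts.items())

def build_node_list(edges):
    """Every name that appears as a sender or a receiver."""
    return sorted({name for sender, receiver, _ in edges for name in (sender, receiver)})

def build_adjacency_matrix(nodes, edges):
    """Square matrix: rows are senders, columns are receivers."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for sender, receiver, n in edges:
        matrix[index[sender]][index[receiver]] = n
    return matrix

edges = build_edge_list(emails)
nodes = build_node_list(edges)
matrix = build_adjacency_matrix(nodes, edges)

print("nodes:", nodes)
print("edges:", edges)
for name, row in zip(nodes, matrix):
    print(name, row)
```

The matrix is generally not symmetric, because the number of emails sent from one employee to another need not equal the number sent in the opposite direction.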
To analyze social networks, we considered using the following tools:
- R (library: igraph): Used to create routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more. We considered using this because our analysis requirements fit well into what igraph has to offer.
- Alteryx (Predictive Analysis using R): The network analysis tool provides a way to visually interact with all kinds of data. It requires the above discussed Nodes and Edges tables to do the analysis using a workflow UI. Alteryx also gives you access to key network data such as betweenness, degree and closeness. This data can further be used on software such as JMP Pro and Tableau to run regressions to predict how employees behave or will behave based on their network statistics.
- Gephi: We will use Gephi towards the end of our project to visualize our social networks. Like Alteryx, this tool requires nodes and edges tables.
Looking Ahead
In the coming weeks we plan to carry out a deeper analysis of our client’s social network. We aim to add a further layer of context by comparing group-level (geographical, hierarchical and departmental) metrics to understand performance levels across these groups.
Our current analysis has been limited to communication flows within the organization, in line with our client’s requirements. In the coming weeks, however, we plan to include external communication flows in our analysis. It is important to note that the analysis we run on external networks will differ from the analysis run previously. Coupled with further research, we hope to firm up some of the key metrics that are flagged when an employee leaves an organization.
References
http://www.mckinsey.com/insights/organization/power_to_the_new_people_analyticsa
https://en.wikipedia.org/wiki/People_analytics
https://www.crunchbase.com/organization/trustsphere#/entity
https://en.wikipedia.org/wiki/Betweenness_centrality
http://toreopsahl.com/tnet/weighted-networks/node-centrality/