Difference between revisions of "YSR Final Progress"

From Analytics Practicum

Revision as of 22:13, 17 April 2016





Introduction

TrustSphere is the widely recognized market leader in Relationship Analytics. TrustSphere enables forward-thinking organizations to unlock the inherent value of their own networks using its proprietary next-generation technology. Its unique selling point lies in its ability to calculate relationship strengths between two people based solely on their communication flows. Its solutions provide real-time intelligence and insights which help clients across the globe improve salesforce effectiveness, enterprise-wide sales insights and corporate governance. Its offerings are currently limited to Sales and Risk. TrustSphere is now developing new technology that delivers insights into an organization’s most valuable asset, its people. To set the ball rolling, TrustSphere engaged our team, YSR, to help deliver preliminary insights on its own employees. With our insights and the methodologies used to derive them, TrustSphere can move forward in the construction of its own product offering.

The complexity of this project lay in two areas. First, the insights we were looking to deliver had to be matched with contextual social science in order for them to be relevant and significant. Second, we had severe limitations in the availability of data: we were provided with only communication metadata.

Motivation: The Importance of People Analytics

Understanding and managing the core of an organization, its employees, has become imperative to business decision-makers across the world. In a global survey conducted by PwC, 34% of CEOs said they were “extremely concerned” about the management of talent. Workforce diversity has stalled, high turnover trends are continuing, labor productivity has fallen since last year and high-potential promotion numbers have dropped.

Key decision makers have begun to increase investment in HR technologies to help combat these worrying trends in the workplace. HR costs per employee have risen from $1,610 in 2011 to $2,015 in 2015.

Our Approach

Our understanding of employee structures and the data provided helped define our approach to this project. Relationships and networks created and maintained by employees are a rich source of data which can be used to support better HR decision making. Until now, this data has been invisible to the stakeholders who could benefit most.

By performing Social Network Analysis (SNA) and other exploratory data analysis, we seek to support decision-making in TrustSphere’s Human Resource department via insights generated from employee communications, relationships and networks.


Literature Review

As People Analytics is a relatively new space, few organisations are delving into understanding relationships through data-generated insights. Additionally, firms are limited in terms of the three V’s (Volume, Variety and Velocity) of the data and the insights it offers. A report published by PwC on trends in People Analytics states that advancement in People Analytics is on the rise, with the emergence of a refined data governance approach, the building of comparison data into analytical tools and the use of predictive data to take action. What is lacking are integrated and comprehensive data collection methods within organisations to provide for a robust analysis. Few organisations are acting upon their findings to close the gaps by addressing root causes, dealing with employees on an individual level, integrating the results into existing technologies and updating the data frequently. In the pursuit of flatter organisational structures that promote efficiency and flexibility, informal employee networks, rather than formal structures, have been on the increase. Hence, the authors conclude that there is a greater need for employee social network analysis, which reveals the level of engagement of key employees who can affect the morale and productivity of the environment, either positively or negatively.

An article published by McKinsey urges organisations to tap into the power of influencers to better harness their positive traits and increase the odds of success for the organisation. An informal influencer can be broadly defined as someone who is respected and approached for advice or information. Influencers in a work environment could prove useful for implementing changes within the organisation, from planning and direction through to execution.

Further research revealed that, until recently, it had been hard to test whether the performance of organisations could depend on information flow between employees. Sources such as e-mail logs, web-based calendars and online work collaborations make it possible to track information flows within firms and assess the performance of teams, departments and the organisation as a whole. However, the question remained as to what could be done with the limited data that we were able to obtain from our client. Duncan Watts and his team had a more comprehensive data set, with employee surveys to supplement their e-mail data and organisation charts. It is important to note that e-mail data allows for more timely, near-real-time feedback than employee surveys. While our data was lacking in terms of employee feedback, we set out to define our research parameters based on what we had access to, and with the takeaway that “computational social science works well only when powerful computation is matched with careful social science”.

There are a number of key centrality measures that can be gathered from Social Network Analysis data to help us better understand interactions. The following are a few of them:

  • Degree

Degree is the simplest of the node centrality measures, as it uses only the local structure around a node. In a binary network, the degree is the number of ties a node has. In a directed network, a node may have a different number of outgoing and incoming ties, and therefore degree is split into outdegree and indegree, respectively.

  • Betweenness centrality

Betweenness centrality is an indicator of a node's centrality in a network. It is equal to the number of shortest paths from all vertices to all others that pass through that node. A node with high betweenness has great influence over what flows -- and does not -- in the network.

  • Closeness

Closeness is defined as the inverse of farness, which in turn, is the sum of distances to all other nodes (Freeman, 1978). The intent behind this measure was to identify the nodes which could reach others quickly. A main limitation of closeness is the lack of applicability to networks with disconnected components: two nodes that belong to different components do not have a finite distance between them. Thus, closeness is generally restricted to nodes within the largest component of a network.

  • Eigenvector centrality

Eigenvector centrality measures how well connected an individual is to others who are themselves highly connected.
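The measures above can be illustrated on a toy network. The following is a minimal sketch in plain Python, assuming an unweighted, undirected and connected graph; the five-node graph and the function names are hypothetical and are not taken from the project data. Betweenness is computed with Brandes' algorithm, which credits each node with the fraction of shortest paths between a pair that pass through it, and closeness is left unnormalised as the inverse of farness.

```python
from collections import deque

# Hypothetical undirected network: path A - B - C - D, plus a leaf E attached to B.
graph = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}

def degree(g):
    """Number of ties each node has."""
    return {node: len(neighbours) for node, neighbours in g.items()}

def bfs_distances(g, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in g[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist

def closeness(g):
    """Inverse of farness (sum of distances); assumes a connected graph."""
    result = {}
    for node in g:
        farness = sum(bfs_distances(g, node).values())
        result[node] = 1.0 / farness if farness else 0.0
    return result

def betweenness(g):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {node: 0.0 for node in g}
    for s in g:
        stack = []
        preds = {v: [] for v in g}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in g}       # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in g}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both ends, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

deg = degree(graph)
clo = closeness(graph)
btw = betweenness(graph)
```

In this toy graph, B has the most ties and also lies on the most shortest paths, which is the pattern the influencer analysis later looks for.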

Methodology - Iteration 1

Initial Data Cube and Data cleaning

The initial dataset given to us was pulled from the TrustSphere Outlook server. It contained 890,000 email exchanges and 13 variables, namely:

  1. Date: Timestamp of when it was sent
  2. IP Address: IP Address used
  3. Local: Server used for external communications
  4. Remote: Receiver email address
  5. Local Domain: Sender email address
  6. Remote Domain: Nature - Internal or External
  7. Originator: Too many missing values to make any sense of the column
  8. Direction: Too many missing values to make any sense of the column
  9. Domain Group: Too many missing values to make any sense of the column
  10. Inbound Count: Too many missing values to make any sense of the column
  11. Outbound Count: Too many missing values to make any sense of the column
  12. Size: Size of the email, but irrelevant to analysis
  13. Msg ID: Too many missing values to make any sense of the column

The data included both internal and external communications, denoted by the variable “Remote Domain”. We chose to move forward with only internal interactions, because getting context for external data would be extremely difficult; external data was therefore removed from our scope. We then dropped variables 6 to 13, most of which had too many missing values to be usable. We also deleted IP Address and Local because they would not add any value to the analysis.
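The cleaning step described above could look roughly like the sketch below. It assumes the export is available as a list of dictionaries keyed by the column names listed earlier, and that “Remote Domain” holds the literal values "Internal" and "External"; the sample rows, addresses and the `clean` function are hypothetical.

```python
# Hypothetical sample rows; the real export has 13 columns and about 890,000 rows.
raw_rows = [
    {"Date": "2015-06-01 09:15", "IP Address": "10.0.0.1", "Local": "mail01",
     "Remote": "bob@example.com", "Local Domain": "alice@example.com",
     "Remote Domain": "Internal", "Size": "2048"},
    {"Date": "2015-06-01 09:20", "IP Address": "10.0.0.1", "Local": "mail01",
     "Remote": "client@other.example", "Local Domain": "alice@example.com",
     "Remote Domain": "External", "Size": "512"},
    {"Date": "2015-06-02 14:02", "IP Address": "10.0.0.2", "Local": "mail01",
     "Remote": "alice@example.com", "Local Domain": "bob@example.com",
     "Remote Domain": "Internal", "Size": "1024"},
]

# Columns retained for the analysis; everything else is discarded.
KEEP = ("Date", "Remote", "Local Domain")

def clean(rows):
    """Keep internal emails only, then keep only the columns used later."""
    internal = [
        row for row in rows
        if row.get("Remote Domain", "").strip().lower() == "internal"
    ]
    return [{column: row[column] for column in KEEP} for row in internal]

cleaned = clean(raw_rows)
```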

Additional Data

We then asked our client for a staff list so that we could populate the sender and receiver positions and departments, and ended up with the following variables:

  1. Date: Timestamp of when it was sent
  2. Remote (Renamed to Receiver): Receiver email address
  3. Local Domain (Renamed to Sender): Sender email address
  4. Sender FName: first name extracted from email address
  5. Receiver FName: first name extracted from email address
  6. Sender Position
  7. Receiver Position
  8. Sender Group
  9. Receiver Group
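One way to arrive at these nine variables is sketched below. It assumes the staff list can be looked up by email address and that addresses follow a first.last@domain pattern, which the source does not confirm; the names, positions, groups and helper functions are hypothetical.

```python
# Hypothetical staff list keyed by email address.
staff_list = {
    "alice.tan@example.com": {"position": "Manager", "group": "Sales"},
    "bob.lee@example.com": {"position": "Associate", "group": "Engineering"},
}

def first_name(email):
    """Extract a first name, assuming a first.last@domain address format."""
    local_part = email.split("@", 1)[0]
    return local_part.split(".", 1)[0].capitalize()

def enrich(row, staff):
    """Rename the address columns and attach position and group for both parties."""
    sender, receiver = row["Local Domain"], row["Remote"]
    unknown = {"position": None, "group": None}  # address missing from the staff list
    sender_info = staff.get(sender, unknown)
    receiver_info = staff.get(receiver, unknown)
    return {
        "Date": row["Date"],
        "Receiver": receiver,
        "Sender": sender,
        "Sender FName": first_name(sender),
        "Receiver FName": first_name(receiver),
        "Sender Position": sender_info["position"],
        "Receiver Position": receiver_info["position"],
        "Sender Group": sender_info["group"],
        "Receiver Group": receiver_info["group"],
    }

example = enrich(
    {"Date": "2015-06-01 09:15",
     "Remote": "bob.lee@example.com",
     "Local Domain": "alice.tan@example.com"},
    staff_list,
)
```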

Social Network Analysis

To analyze social networks, we considered using the following tools:

  • R (library: igraph): Used to create routines for simple graphs and network analysis. It handles large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality measures and much more. We considered it because our analysis requirements fit well with what igraph has to offer.

  • Alteryx (Predictive Analysis using R): The network analysis tool provides a way to visually interact with all kinds of data. It requires nodes and edges tables to do the analysis using a workflow UI. Alteryx also gives access to key network statistics such as betweenness, degree and closeness. These statistics can then be used in software such as JMP Pro and Tableau to run regressions that predict how employees behave or will behave based on their network statistics.

  • Gephi: We used Gephi towards the end of our project to visualize our social networks. Like Alteryx, this tool requires nodes and edges tables.

Use Of Alteryx

Alteryx workflow.png

We used the Alteryx Network Analysis tool to generate our preliminary Social Network Graph and to output key network statistics such as Betweenness, Degree Centrality and Closeness Centrality, which were used in further analysis. The data required was in the form of two separate lists, one of nodes and one of edges.
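As a rough illustration of the two lists, the sketch below derives a nodes table and a weighted, directed edges table from sender and receiver pairs. The column names (Id, Label, Source, Target, Weight) and the sample pairs are illustrative assumptions, not the exact schema that Alteryx expects.

```python
from collections import Counter

# Hypothetical (sender, receiver) pairs, one per email.
emails = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
]

def build_tables(pairs):
    """Return (nodes, edges): one node per person, one edge per directed pair."""
    people = sorted({person for pair in pairs for person in pair})
    ids = {name: i for i, name in enumerate(people)}
    nodes = [{"Id": ids[name], "Label": name} for name in people]
    counts = Counter(pairs)  # number of emails per directed pair
    edges = [
        {"Source": ids[sender], "Target": ids[receiver], "Weight": n}
        for (sender, receiver), n in sorted(counts.items())
    ]
    return nodes, edges

nodes, edges = build_tables(emails)
```

Aggregating repeated emails into a single weighted edge keeps the edges table small, which matters when the raw data runs to hundreds of thousands of rows.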

Alteryx SNA.png

We found that the graph produced was insightful, but the real value of the tool to us was the centrality measures.

We then used these network statistics to compute what we called our influencer score. The following formula was used to scale Betweenness and Degree:

\[
\text{Scaled Value} = \frac{\text{new max} - \text{new min}}{\text{old max} - \text{old min}} \times (\text{value} - \text{old max}) + \text{new max}
\]

These scaled values (on scales of 0 - 100) were then summed up to calculate our Influencer Score on a total scale of 200.
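The scaling and summation can be sketched as follows. This assumes that "old min" and "old max" are the smallest and largest observed values of each measure across all employees, which the text does not state explicitly; the function names and the sample values are hypothetical.

```python
def rescale(value, old_min, old_max, new_min=0.0, new_max=100.0):
    """Linearly map value from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        # Assumption: with no spread in the data, everyone gets the bottom of the scale.
        return new_min
    return (new_max - new_min) / (old_max - old_min) * (value - old_max) + new_max

def influencer_scores(betweenness, degree):
    """Sum of scaled betweenness and scaled degree, giving a score from 0 to 200."""
    b_min, b_max = min(betweenness.values()), max(betweenness.values())
    d_min, d_max = min(degree.values()), max(degree.values())
    return {
        name: rescale(betweenness[name], b_min, b_max)
        + rescale(degree[name], d_min, d_max)
        for name in betweenness
    }

# Hypothetical centrality values for three employees.
betweenness = {"alice": 12.0, "bob": 0.0, "carol": 3.0}
degree = {"alice": 8, "bob": 2, "carol": 5}
scores = influencer_scores(betweenness, degree)
```

With this mapping, the employee holding the maximum on both measures scores 200 and the one holding the minimum on both scores 0.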

Methodology - Iteration 2

Iteration 1 was limited to static analysis of communication flows within the organization. Even though this was in line with our client’s requirements, we realized that bar graphs and scatter plots were not enough and that we could do more. We took our supervisor’s feedback on board and went on to create an interactive dashboard from scratch using d3.js and other supporting libraries.

Additional Data

The next step was to calculate the following metrics, namely:

Relationship score: Based on an internally calculated metric by our client, the relationship score is weighted based on:

  1. Volume of email exchange
  2. Age of relationship
  3. Response reciprocity
  4. Frequency of exchange
  5. Recency of last communication

Influencer Score: Calculated as a sum of scaled betweenness and degree of each employee in the network on a scale of 0 - 200.

Designing Visualizations

At this point we delved into the wonderful world of D3.js. We carefully considered which charts, graphs and diagrams would help us best represent the employee network.

The Chord Diagram

A] Rationale:

We chose to use a Chord Diagram to visualize our employee email data. A chord diagram is a graphical method of displaying the inter-relationships between data in a matrix. The data is arranged radially around a circle, with the relationships between the points typically drawn as arcs connecting the data together. We found it to be a compact and aesthetically pleasing way of visualizing directed relations among entities in a space-constrained area.

B] Data, Process & Interactivity:

The data required for our chord diagram was a square matrix giving the number of emails sent from each employee to every other employee, along with an edges list giving information about each connection, such as labels and the total number of emails. Ancillary information, shown on clicking a name, was displayed on a card on the right, highlighting details such as closest contact, influencer score and position in the company.


The employees are coloured by department which the legend at the bottom helps identify. A hover over an employee's name shows the number of emails sent over the time period of the data. Hovering over the arcs will give you the number of emails sent and received from the employee that arc represents.
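The square matrix behind the chord diagram could be assembled along these lines. This is a sketch that assumes row i, column j holds the number of emails sent from employee i to employee j; the names, counts and the `chord_matrix` function are hypothetical.

```python
def chord_matrix(names, edges):
    """Build a square matrix where matrix[i][j] counts emails from names[i] to names[j]."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for sender, receiver, count in edges:
        matrix[index[sender]][index[receiver]] += count
    return matrix

# Hypothetical employees and directed email counts.
names = ["alice", "bob", "carol"]
edges = [("alice", "bob", 2), ("bob", "alice", 1), ("carol", "alice", 4)]
matrix = chord_matrix(names, edges)

# Row sums give emails sent per employee, the figure shown on hover.
sent = {name: sum(matrix[i]) for i, name in enumerate(names)}
```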

Data-Linked Bar Chart and Chord Diagram

A] Rationale:

This visualization was chosen to perform a deeper analysis of within-department interactions. The bar chart shows the different departments in the organization and the number of emails sent by each department.


B] Data, Process & Interactivity:

Once a bar is clicked, a drill down is activated to show the employees in that department and the number of emails sent by each person. The chord also changes accordingly to help the user understand exactly the interaction in each department.

The data required for this visualization was in the form of a key value file where departments served as keys and members as values.
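A key-value structure of this kind might be produced as in the sketch below. JSON is assumed as the file format, since the source does not name one; the staff records and the function name are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical (employee, department) records.
staff = [("alice", "Sales"), ("bob", "Engineering"), ("carol", "Sales")]

def members_by_department(records):
    """Group employee names under their department, with stable ordering."""
    groups = defaultdict(list)
    for name, department in records:
        groups[department].append(name)
    return {
        department: sorted(members)
        for department, members in sorted(groups.items())
    }

departments = members_by_department(staff)
as_json = json.dumps(departments, indent=2)  # text that could be written to a file
```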


Social Network Graph

A social network graph using the Force Atlas 2 layout was chosen to visualize betweenness and closeness amongst our employees. The data required for this was a nodes and edges list as input to Gephi, a social network visualization tool. The resulting graph was exported as a .gexf file and fed as input to SigmaJS, a JavaScript library dedicated to graph drawing on the web. The resulting graph showed employees coloured by department, with distance representing closeness centrality.


Bullet Graph

This graph was made in order to interactively visualize employee relationship scores across different Levels in the organization. We created levels for departments and coded them as follows:

  • A: C-Suite
  • B: Directors
  • C: Managers
  • D: Associates
  • E: Interns
  • F: Admin

We then mapped each employee to their respective receivers, broken down by level and gender. This gives an overview of each employee's relationship scores relative to the other employees ticked in the filter and to a constant value of 30, which the client set as a good relationship score, with additional marks at 60% and 80% of 30 (18 and 24).
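The reference marks on the bullet graph follow directly from the target of 30. The sketch below computes them and places a score into a band; the band labels and the `band` function are hypothetical and only illustrate how the marks partition the scale.

```python
TARGET = 30  # "good" relationship score set by the client

def reference_marks(target=TARGET, percentages=(60, 80, 100)):
    """Return the bullet-graph marks as {percentage: value}."""
    return {pct: target * pct / 100 for pct in percentages}

def band(score, target=TARGET):
    """Place a relationship score relative to the 60%, 80% and 100% marks."""
    marks = reference_marks(target)
    if score >= marks[100]:
        return "at or above target"
    if score >= marks[80]:
        return "80% to target"
    if score >= marks[60]:
        return "60% to 80%"
    return "below 60%"

marks = reference_marks()
```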

We then noticed that males generally had higher average relationship scores than females and ran an analysis of variance to test whether the difference was significant. The results will be discussed in the next section.
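For reference, the F statistic of a one-way analysis of variance can be computed as in the sketch below. The two groups of scores are invented purely to exercise the function and say nothing about the project's actual results; turning F into a p-value additionally requires the F distribution, which is left to a statistics package.

```python
def mean(values):
    return sum(values) / len(values)

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across the groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented scores for two groups, for illustration only.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
f_stat, df_between, df_within = one_way_anova_f(group_a, group_b)
```

A larger F means the gap between group means is large relative to the spread within each group; with only two groups the test is equivalent to a two-sample t-test with equal variances.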


Employee Influencer Rankings

The final visualization was a graph comparing the influencer scores for the entire organization. Employees were mapped on a chart from 0 to 200 and broken down by department, to distinguish employees with higher influencer scores, our so-called gatekeepers and highly connected individuals.

Influencer Ratings.png

Methodology

Over the course of the last few weeks, our client expressed interest in generating actionable insights as opposed to creating dashboards. His rationale lay in the fact that interactive dashboards have not been the best tool for creating action in the realm of people analytics. Hence, our methodology has taken a shift as we have started focusing on conducting in-depth Social Network Analysis to identify insights that could potentially help the client in developing his product further.

We conducted secondary research to learn more about terms and processes involved in social network analysis and other advancements in the field. Through our research, we identified certain core measures, such as:


We have been able to generate these values for all the subjects in our network using Alteryx. Our next step will involve running further analyses in JMP Pro and Tableau to find actionable insights for the client. Our final step will be to display our results using Gephi, to visualise the social network and highlight our findings.


References

http://www.mckinsey.com/insights/organization/power_to_the_new_people_analyticsa

https://en.wikipedia.org/wiki/People_analytics

https://www.crunchbase.com/organization/trustsphere#/entity

https://en.wikipedia.org/wiki/Betweenness_centrality

http://toreopsahl.com/tnet/weighted-networks/node-centrality/