Motivation And Project Overview
Human resource analytics is the use of data in an organizational context to understand factors about employees, such as their degree of collaboration and influence. As researcher Rob Cross has said, “Organizational Network Analysis provides a powerful means of making invisible patterns of information flow and collaboration, visible.” These factors are generally computed from data collected primarily through pulse surveys. This collection process is slow because pulse surveys must be distributed at regular intervals to obtain updated insights. The process is also repetitive and makes it difficult for managers to view real-time insights, so relying on surveys alone is not a viable option.
This study investigates whether the subject lines and frequency of emails exchanged between employees can serve as a representative resource for analyzing organizational networks, specifically the work network. We define the work network as the network of employees with whom one interacts on a daily basis for work purposes.
Objective
The objective of this study is to understand whether the subject lines and frequency of emails exchanged between employees are a representative resource for analyzing organizational networks, specifically the work network.
At the company, these factors are computed from data collected primarily through pulse surveys. The survey data collection process is slow and makes it difficult for managers to view real-time insights. As an alternative, our team wants to check whether these factors can be computed from email communication data alone. Through feature engineering, an unbiased email network is created and compared against the work network derived from a survey.
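As a rough illustration of how email frequency might be turned into a network, the sketch below counts the emails exchanged by each pair of employees and treats the count as an edge weight. The function name `build_email_network`, the `(sender, recipient)` tuple layout, and the sample names are assumptions made for this example; it is not the feature-engineering pipeline used in the study.

```python
from collections import Counter


def build_email_network(exchanges):
    """Count emails per unordered employee pair.

    `exchanges` is an iterable of (sender, recipient) tuples (a hypothetical
    layout). Returns a dict mapping a sorted pair to its email count.
    """
    weights = Counter()
    for sender, recipient in exchanges:
        if sender == recipient:
            continue  # ignore self-addressed mail
        pair = tuple(sorted((sender, recipient)))  # direction is discarded
        weights[pair] += 1
    return dict(weights)


# Hypothetical sample log, for illustration only
sample_exchanges = [
    ("ann", "bob"),
    ("bob", "ann"),
    ("ann", "cara"),
    ("cara", "cara"),
]
email_network = build_email_network(sample_exchanges)
```

Treating each pair as unordered keeps the network undirected, which is one simple choice when comparing against a survey that asks only whom a person works with; a directed variant would keep `(sender, recipient)` as the key instead.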
Data
We are provided with an Excel sheet containing a large log of email exchanges through the TrustSphere domain. The data consists of 14 columns, described below:
Column Explanations
| Column | Description |
| --- | --- |
| Date | Timestamp of the email |
| Remote IP | If the email exchange is external then this column shows the external person's email |
| Remote | The TrustSphere employee who is receiving or sending the email |
| Remote Domain | Always TrustSphere |
| Local | Email address of the person sending the email |
| Local Domain | Domain of the person who is sending the email |
| Originator | Inbound, outbound or internal (if you’re receiving the email, sending it or if the email is between 2 TrustSphere employees) |
| Direction | Always TrustSphere in this case |
| Domain Group | Email Header (Subject Line) |
| Subject | Type of message: email/im (instant messaging)/voice/sms |
| Inbound Count | Number of emails received |
| Outbound Count | Number of emails sent |
| Size | Size of the message (number of characters) |
| Msgid | Encoded Message ID |
Data Statistics
| Statistic | Value |
| --- | --- |
| Number of rows | 121,154 |
| Date Range | 11/26/2016 8:00 am to 02/01/2017 12:00 am |
Literature Review
Rob Cross and Karen Stephenson have studied organizational networks for a long time, and their findings and theories are central to the objective of this study. Their work provides a basis for evaluating the patterns and outcomes found in this study of organizational email networks.
Rob Cross has done extensive work on organizational network analysis, looking for methods to improve collaboration between company units and to break down silos. Cross (2004) states: “Organizational Network Analysis (ONA) can provide an x-ray into the inner workings of an organization – a powerful means of making invisible patterns of information flow and collaboration in strategically important groups visible.”
According to Cross’s (2000) research, the level of collaboration can affect employee stress, the distribution of collaborative work within teams, and more. He found that employees who are willing to help beyond their scope gradually develop a reputation for being resourceful and are therefore included in higher-impact projects. However, Cross noted, such employees eventually become bottlenecks, either because other employees become over-reliant on them and make no substantial progress without their facilitation, or because they are overloaded with work and unable to deliver at the same level.
Karen Stephenson, also known as “The Organization Woman”, recognized six main types of knowledge networks in an organization:
- Daily network: whom we see day-to-day
- Wider social network: whom we actively stay in touch with
- Innovation network: with whom we test out new ideas
- Expert network: to whom we go for expertise and knowledge
- Strategic network: to whom we go for guidance and advice
- Learning network: who help us move from what we know to new knowledge and expertise
Her book Quantum Theory of Trust (2006) shows that manipulating networks of trust can control the flow of information in an organization, because networks of trust are linked to tacit knowledge. Her research aimed at observing communication patterns, identifying bottlenecks and silos, and understanding team dysfunctions.
This study involves term analysis of employee email subject lines, which can be treated as short text documents. A relevant paper by Mika Timonen (2013) on term weighting in short documents describes a variety of methods for determining the importance of terms and extracting keywords from a document. The term-weighting method most appropriate for our study is inverse document frequency (IDF).
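To make the idea concrete, here is a minimal sketch of IDF over subject lines, using the common textbook form idf(t) = log(N / df(t)), where N is the number of subject lines and df(t) is the number of lines containing term t. The whitespace tokenization, the function name, and the sample subject lines are assumptions for illustration; the study may have used a different IDF variant or preprocessing.

```python
import math
from collections import Counter


def inverse_document_frequency(subject_lines):
    """Return {term: log(N / df)} treating each subject line as one document."""
    n_docs = len(subject_lines)
    doc_freq = Counter()
    for line in subject_lines:
        # A set ensures each term is counted at most once per subject line
        doc_freq.update(set(line.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical subject lines, for illustration only
sample_subjects = ["RE: budget review", "RE: team lunch", "budget forecast"]
idf = inverse_document_frequency(sample_subjects)
```

Terms that appear in many subject lines, such as a reply prefix, receive a low weight, while terms that appear in only one line receive the highest weight. That is why IDF suits short documents, where within-document term counts are almost always 1 and carry little information.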
Methodology
Click on METHODOLOGY for more details.
Conclusion
Email data is not a strong representation of the work network when relationship strength is divided into multiple segments. However, it is a good representation of the work network when relationship strength is divided into only two segments.
By using only email data to predict work relationship strengths, companies can save large amounts of time, manpower, and money spent on survey analysis to retrieve similar data. Further, it is possible to update the analysis in real time by changing the dates of the email data.
Limitation
The entire analysis is based on email subject lines and the distribution of emails. However, the highest-quality information is in the body of the emails. We did not have access to the email bodies, which is the biggest limitation of the study. Future work should focus on deeper text mining of email bodies to obtain a more accurate representation of work-related emails. This could be a major factor in differentiating work relationships between any two employees.