Difference between revisions of "APA Project Overview"

From Analytics Practicum
 
(24 intermediate revisions by 3 users not shown)
Line 5: Line 5:
 
<font face="Century Gothic">
 
<font face="Century Gothic">
 
{| style="background-color:#FFFFFF; color:#66ffcc padding: 5px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" |
 
{| style="background-color:#FFFFFF; color:#66ffcc padding: 5px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" |
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; border-left:2px solid #66ffcc; background:#66ffcc; text-align:center;" width="20%" |  
+
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; border-left:2px solid #66ffcc; background:#66ffcc; text-align:center;" width="8%" |  
 
[https://wiki.smu.edu.sg/ANLY482/ANLY482_AY2016-17_T2_Group17 <font face ="Century Gothic" color="#000000"><strong>HOME</strong></font>]
 
[https://wiki.smu.edu.sg/ANLY482/ANLY482_AY2016-17_T2_Group17 <font face ="Century Gothic" color="#000000"><strong>HOME</strong></font>]
 
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
 
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#FFFFFF; text-align:center;" width="20%" |   
+
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#ffffff; text-align:center;" width="15%" |   
 
[[APA_Project Overview|<font face ="Century Gothic" color="#66ffcc"><strong> PROJECT OVERVIEW</strong></font>]]
 
[[APA_Project Overview|<font face ="Century Gothic" color="#66ffcc"><strong> PROJECT OVERVIEW</strong></font>]]
 
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
 
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc; text-align:center;" width="20%" |
+
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc; text-align:center;" width="12%" |
[[APA_Final_Progress|<font face ="Century Gothic" color="#000000"><strong> FINAL PROGRESS</strong></font>]]
+
[[APA_Final_Progress|<font face ="Century Gothic" color="#000000"><strong> METHODOLOGY</strong></font>]]
 
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
 
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc; text-align:center;" width="20%" |
+
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc; text-align:center;" width="18%" |
[[APA_Project_Management|<font  face ="Century Gothic" color="#000000"><strong>PROJECT MANAGEMENT </strong></font>]]
+
[[APA_Feature Engineering|<font  face ="Century Gothic" color="#000000"><strong> FEATURE ENGINEERING</strong></font>]]
 
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
 
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc; text-align:center;" width="20%" |
+
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:4px solid #66ffcc; border-top:4px solid #66ffcc; background:#66ffcc; text-align:center;" width="18%" |
 +
[[APA_Project_Management|<font  face ="Century Gothic" color="#000000"><strong>CLASSIFICATION MODELLING </strong></font>]]
 +
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
 +
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc; text-align:center;" width="12%" |
 
[[APA_Documentation|<font  face ="Century Gothic" color="#000000"><strong> DOCUMENTATION</strong></font>]]
 
[[APA_Documentation|<font  face ="Century Gothic" color="#000000"><strong> DOCUMENTATION</strong></font>]]
 
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
 
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc; text-align:center;" width="20%" |
+
| style="padding:0.3em; font-family:Helvetica; font-size:120%; border-bottom:2px solid #45c198; border-top:2px solid #45c198; background:#66ffcc; text-align:center;" width="17%" |
 +
[[ANLY482_AY2016-17_Term_2|<font  face ="Century Gothic" color="#000000"><strong>OTHER PROJECTS</strong></font>]]
 +
| style="border-bottom:2px solid #66ffcc; border-top:2px solid #66ffcc; background:#66ffcc;" width="1%" | &nbsp;
 +
| style="font-family:Helvetica; font-size:120%; border-bottom:4px solid #45c198; border-top:4px solid #45c198; background:#66ffcc; text-align:center;" width="20%" |
 
|}
 
|}
  
 
{|style="width:100%;vertical-align:top;margin-top:20px;"
 
{|style="width:100%;vertical-align:top;margin-top:20px;"
 
|-
 
|-
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Motivation</font></div><br/>
+
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Motivation And Project Overview</font></div><br/>
 
<p>
 
<p>
People Analytics has been rated as the second-biggest overall capability gap in organizations by Deloitte University Press. Through people analytics, companies are able to find better hires, improve retention, and find more suitable leaders. This has a direct impact on the direction of the organization and hence its growth. In this project, we will focus on four main categories for the analysis, answering several questions and developing several metrics under these categories. Given below are the four categories, with some of the questions we aim to answer. Over the course of the project, we will include more relevant questions in the project scope. Business leaders are interested in keeping a check on this area of the firm and ask various questions such as: <br>
+
Human Resource Analytics is the practice of using data in an organizational context to understand factors about employees, such as their degree of collaboration and influence. As researcher Rob Cross puts it, “Organizational Network Analysis provides a powerful means of making invisible patterns of information flow and collaboration, visible.” These factors are generally computed from data that are primarily collected via pulse surveys. Data collection is slow because pulse surveys must be distributed at regular intervals to obtain updated insights. This is not a viable option: the process is repetitive and makes it difficult for managers to view real-time insights.
<b> Network Strengths </b> <br>
+
<br>
* Find the number of relationships internally and externally (distilled by strength and date – later dates indicate there has been recent communication) of all employees. This insight will provide an understanding of which employees, departments and locations are best at building and nurturing a large number of relationships internally and externally.
+
This study investigates whether subject lines and the frequency of emails exchanged between employees can be used as a representative resource for analyzing organizational networks, specifically the work network. We define the work network as the network of employees with whom one interacts on a daily basis for work purposes.
 
 
<b> Influence </b> <br>
 
* Identify top 10 employees who influence information flow within the organization. This insight can help identify agents for change as well as pinpoint employees that are overly relied on in the organization’s structure.
 
* Find the social networks of junior employees with colleagues in managerial positions. This insight will give an idea of which employees managers are likely to turn to for advice or trust with sensitive issues.
 
 
 
<b> Collaboration </b> <br>
 
* Interaction within and between departments/employees, geographies. It will also highlight individual employees that collaborate well within the organization.
 
* Quantify the value an employee has on the internal and external network. It can be used to preempt the impact a departing employee will have on the network (the number of relationships that will be lost as a result of the employee leaving).
 
* Quantify a manager’s effectiveness at building relationships with his or her team and the whole organization. This behavior then can be benchmarked against ideal performers in the organization.
 
* Identify potential leaders within the organization. Research  has shown that employees that build strong relationships with an organization’s various departments often possess leadership potential.
 
* Find employees who like to work in silos.
 
 
 
<b> Email Analysis </b> <br>
 
* Find the average number of emails sent and received by an employee on a daily or weekly basis. How connected are these employees? Calculate a departmental average to understand collaboration between departments as well.
 
Answers for such questions usually rely more on qualitative observations as not many standard metrics can be directly used to track these details. Due to its general derivation, the results are often considered unreliable and hence non-actionable. The main motivation of our project is to assist TrustSphere to derive reliable and actionable insights with our data-driven approach.
 
 
 
 
</p>
 
</p>
  
Line 52: Line 42:
 
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Objective</font></div><br/>
 
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Objective</font></div><br/>
 
<p>
 
<p>
The primary objective of our project is to use social network analysis theories to create a hybrid centrality scoring method, along with other metrics to assess networks, influence, collaboration and email activity of employees. Additionally, we will build a comprehensive dashboard to provide TrustSphere with a clear and structured platform to view the generated metrics.  
+
The objective of this study is to understand whether subject lines and the frequency of emails exchanged between employees are a representative resource for analyzing organizational networks, specifically the work network.
 +
At the company, factors such as collaboration and influence are computed from data that are primarily collected via pulse surveys. The survey data collection process is slow and makes it difficult for managers to view real-time insights. As an alternative, our team wants to check whether these factors can be computed from email communication data alone. Through feature engineering, an unbiased email network is created and compared against the work network derived from a survey.
 
</p>
 
</p>
  
Line 59: Line 50:
 
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Data</font></div><br/>
 
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Data</font></div><br/>
 
<p>
 
<p>
We are provided with an excel sheet containing a huge set of email exchange log via the TrustSphere domain. The data provided is clean (Screenshot of the data is shown below).<br>
+
We were provided with an Excel sheet containing a large log of emails exchanged via the TrustSphere domain. The data consists of 14 columns, described below:
We will be collecting more data through a survey sent out to all employees of TrustSphere.
+
{| class="wikitable"
 +
|+Column Explanations
 +
|-
 +
|Date
 +
|Timestamp of the email
 +
|-
 +
|Remote IP
 +
|If the email exchange is external then this column shows the external person's email
 +
|-
 +
|Remote
 +
|The TrustSphere employee who is receiving or sending the email
 +
|-
 +
|Remote Domain
 +
|Always TrustSphere
 +
|-
 +
|Local
 +
|Email address of the person sending the email
 +
|-
 +
|Local Domain
 +
|Domain of the person who is sending the email
 +
|-
 +
|Originator
 +
|Inbound, outbound or internal (if you’re receiving the email, sending it or if the email is between 2 TrustSphere employees)
 +
|-
 +
|Direction
 +
|Always TrustSphere in this case
 +
|-
 +
|Domain Group
 +
|Email Header (Subject Line)
 +
|-
 +
|Subject
 +
|Type of message: email/im (instant messaging)/voice/sms
 +
|-
 +
|Inbound Count
 +
|Number of emails received
 +
|-
 +
|Outbound Count
 +
|Number of emails sent
 +
|-
 +
|Size
 +
|Size of the message (number of characters)
 +
|-
 +
|Msgid
 +
|Encoded Message ID
 +
|}
  
<br><br>
+
{| class="wikitable"
[[Image:Data sample.png|800px]]
+
|+Data Statistics
<br><br>
+
|-
 
+
|'''Number of rows'''
The data consists of the following attributes: <br>
+
|121,154
&emsp; 1. <b> Date</b>: date the email was sent/received <br>
+
|-
&emsp; 2. <b>Originator address</b>: email address of the sender <br>
+
|'''Date Range'''
&emsp; 3. <b>Recipient address</b>: email address of the recipient <br>
+
|11/26/2016 8:00 am to 02/01/2017 00:00 am
&emsp; 4. <b>Direction</b>: <br>
+
|}
&emsp; &emsp; a. ‘Inbound’ – email received by an employee of TrustSphere from an external sender <br>
+
<br>
&emsp; &emsp; b. ‘Outbound’ – email sent by an employee of TrustSphere to external recipient <br>
 
&emsp; &emsp; c. ‘Internal’ – email exchanged within TrustSphere employees <br>
 
&emsp; 5. <b>Type</b>: <br>
 
&emsp; &emsp; a. ‘em’ – message sent via email <br>
 
&emsp; &emsp; b. ‘im’ – message sent via instant messaging <br>
 
&emsp; 6. <b>Size</b>: number of characters in the message <br>
 
&emsp; 7. <b>Msg ID</b>: unique ID given to every emailing chain <br>
 
&emsp; 8. <b>Email Subject</b>: subject of the email <br>
 
 
</p>
 
</p>
  
 
{|style="width:100%;vertical-align:top;margin-top:20px;"
 
{|style="width:100%;vertical-align:top;margin-top:20px;"
 
|-
 
|-
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>METHODOLOGY</font></div><br/>
+
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Literature Review</font></div><br/>
 
<p>
 
<p>
 
+
Rob Cross and Karen Stephenson have studied organizational networks for a long time, and their findings and theories are essential to the objective of this study. Their discoveries are important to evaluate the patterns and outcomes found from this study of organizational email data networks.
We plan to take a social network analysis (SNA) approach to analyze the data since the goal is to analyze different attributes (preference to work in silos, importance, popularity) of the actors (employees) and the relationship between them (collaboration via email). This view of the data makes it an ideal social networks data when each email address would represent a node and every email would be the relationship between both the nodes.
+
<br><br>
 +
Rob Cross has done extensive work on organizational network analysis, looking for methods to improve collaboration between company units and to break silos. Cross (2004) states – “Organizational Network Analysis (ONA) can provide an x-ray into the inner workings of an organization – a powerful means of making invisible patterns of information flow and collaboration in strategically important groups visible.”
 
<br><br>
 
<br><br>
<center>
+
According to Cross’s (2000) research, the level of collaboration can affect employee stress, the distribution of collaborative work within teams and more. He found that employees who are willing to help beyond their scope gradually develop a reputation for being resourceful, and hence are included in projects of higher impact. However, Cross said, such employees eventually become bottlenecks, either because other employees become over-reliant on them and make no substantial progress without their facilitation, or because they are overloaded with work and unable to deliver at the same level.
[[Image:SN diagram.png|300px]]
 
</center>
 
 
<br><br>
 
<br><br>
There are various measures of SNA proposed over the years that help determine the role and importance of a node in the network. The following are a few examples: <br><br>
+
Karen Stephenson, also known as “The Organization Woman”, recognized six main types of knowledge network in an organization –
• Degree centrality <br>
+
<ul><li>Daily network: whom we see day-to-day</li>
• Closeness centrality <br>
+
<li>Wider social network: whom we actively stay in touch with</li>
• Betweenness centrality <br>
+
<li>Innovation network: with whom we test out new ideas</li>
• Eigenvector centrality<br>
+
<li>Expert network: to whom we go for expertise and knowledge</li>
 +
<li>Strategic network: to whom we go for guidance and advice</li>
 +
<li>Learning network: who help us move from what we know to new knowledge and expertise </li></ul>
 +
<br>
 +
Her well-known book, Quantum Theory of Trust (2006), shows that manipulating networks of trust can control the flow of information in an organization, because networks of trust are linked to tacit knowledge. Her research aimed to observe communication patterns, identify bottlenecks and silos, and understand team dysfunctions.
 
<br>
 
<br>
Our first step would be to explore the network using these centralities, identifying hubs, brokers and groups as well as delving into other SNA concepts discussed by various academic researchers. We would be using software such as Gephi, and R modelling packages such as weighted network packages, for this analysis.<br><br>
+
<br>
Our goal is to come up with our own hybrid centrality score that would quantify an overall importance of each node in the network. Using the insights from step one, we will be creating multiple surveys (source of additional data) for the employees at TrustSphere to find influences (for the hybrid centrality). During the process, we will be referencing to the work of Karen Stephenson and Rob Cross, both of whom specialize in the field of organizational social networks.<br><br>
+
This study involves term analysis of employee email subject lines, which can be treated as short text documents. A relevant paper by Mika Timonen (2013) on term weighting in short documents describes a variety of methods for determining the importance of terms and extracting keywords from a document. For term weighting in our study, the most appropriate method is inverse document frequency (IDF).
In the end, along with the hybrid centrality score algorithm, we will be delivering a comprehensive dynamic dashboard visualizing the most relevant measures that we identify during our project. <br>
 
 
</p>
 
</p>
  
 
{|style="width:100%;vertical-align:top;margin-top:20px;"
 
{|style="width:100%;vertical-align:top;margin-top:20px;"
 
|-
 
|-
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>SCOPE OF WORK</font></div><br/>
+
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Methodology</font></div><br/>
 +
 
 
<p>
 
<p>
1. Create a hybrid centrality score as an overall comprehensive measure of the network <br>
+
<center>
2. Identify Silos<br>
+
Click on [[APA_Final_Progress|<font face ="Century Gothic" color="#00C5CD"><strong><i>METHODOLOGY</i></strong></font>]] for more details.<br><br>
3. Assess Collaboration<br>
+
[[File:MethodNew.jpg|800px]]
&emsp; a. Within departments<br>
+
</center>
&emsp; b. Within different geographical regions<br>
 
&emsp; c. Within projects<br>
 
4. Assess Influence, Network Strength and Email collaboration <br>
 
5. Develop a dynamic dashboard to visualize relevant measures<br>
 
6. The Scope is fluid and will become more specific as the project progresses <br>
 
 
</p>
 
</p>
 +
 +
{|style="width:100%;vertical-align:top;margin-top:20px;"
 +
|-
 +
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Conclusion</font></div><br/>
 +
 +
Email data is not a strong representation of the Work Network when relationships are divided into multiple strength segments. However, it is a good representation of the Work Network when relationships are divided into only two strength segments.
 +
By using only email data to predict work relationship strengths, companies can save large amounts of time, manpower and money spent on survey analysis to retrieve similar data. Further, it is possible to update the analysis in real time by changing the dates of the email data.
 +
<br>
 +
 +
{|style="width:100%;vertical-align:top;margin-top:20px;"
 +
|-
 +
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Limitation</font></div><br/>
 +
 +
The whole analysis was done on email subject lines and the distribution of emails. However, the highest-quality information is in the body of the emails. We did not have access to the email bodies, which is the biggest limitation of the study. Future work should focus on deeper text mining of the email body to build a more accurate representation of work-related emails. This could be a major factor in differentiating work relationships between any two employees.

Latest revision as of 23:27, 23 April 2017

Motivation And Project Overview

Human Resource Analytics is the practice of using data in an organizational context to understand factors about employees, such as their degree of collaboration and influence. As researcher Rob Cross puts it, “Organizational Network Analysis provides a powerful means of making invisible patterns of information flow and collaboration, visible.” These factors are generally computed from data that are primarily collected via pulse surveys. Data collection is slow because pulse surveys must be distributed at regular intervals to obtain updated insights. This is not a viable option: the process is repetitive and makes it difficult for managers to view real-time insights.
This study investigates whether subject lines and the frequency of emails exchanged between employees can be used as a representative resource for analyzing organizational networks, specifically the work network. We define the work network as the network of employees with whom one interacts on a daily basis for work purposes.

Objective

The objective of this study is to understand whether subject lines and the frequency of emails exchanged between employees are a representative resource for analyzing organizational networks, specifically the work network. At the company, factors such as collaboration and influence are computed from data that are primarily collected via pulse surveys. The survey data collection process is slow and makes it difficult for managers to view real-time insights. As an alternative, our team wants to check whether these factors can be computed from email communication data alone. Through feature engineering, an unbiased email network is created and compared against the work network derived from a survey.

Data

We were provided with an Excel sheet containing a large log of emails exchanged via the TrustSphere domain. The data consists of 14 columns, described below:

{| class="wikitable"
|+Column Explanations
|-
|Date
|Timestamp of the email
|-
|Remote IP
|If the email exchange is external then this column shows the external person's email
|-
|Remote
|The TrustSphere employee who is receiving or sending the email
|-
|Remote Domain
|Always TrustSphere
|-
|Local
|Email address of the person sending the email
|-
|Local Domain
|Domain of the person who is sending the email
|-
|Originator
|Inbound, outbound or internal (if you’re receiving the email, sending it or if the email is between 2 TrustSphere employees)
|-
|Direction
|Always TrustSphere in this case
|-
|Domain Group
|Email Header (Subject Line)
|-
|Subject
|Type of message: email/im (instant messaging)/voice/sms
|-
|Inbound Count
|Number of emails received
|-
|Outbound Count
|Number of emails sent
|-
|Size
|Size of the message (number of characters)
|-
|Msgid
|Encoded Message ID
|}

{| class="wikitable"
|+Data Statistics
|-
|'''Number of rows'''
|121,154
|-
|'''Date Range'''
|11/26/2016 8:00 am to 02/01/2017 00:00 am
|}
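
To illustrate how a log with these columns could be turned into a weighted email network, here is a minimal sketch. It is not the team's actual pipeline: the sample rows and addresses are made up, the function name `build_edge_weights` is hypothetical, and it assumes that `Local` and `Remote` hold the two parties of each exchange and that `Date` uses a 24-hour `MM/DD/YYYY HH:MM` format. The date window is a parameter, so the network can be recomputed for any period.

```python
from collections import Counter
from datetime import datetime

# Made-up rows shaped like the email log (only the fields this sketch needs).
rows = [
    {"Date": "11/28/2016 09:15", "Local": "alice@example.com", "Remote": "bob@example.com"},
    {"Date": "11/29/2016 14:02", "Local": "bob@example.com", "Remote": "alice@example.com"},
    {"Date": "12/05/2016 08:30", "Local": "alice@example.com", "Remote": "carol@example.com"},
    {"Date": "01/15/2017 10:00", "Local": "carol@example.com", "Remote": "bob@example.com"},
]


def build_edge_weights(rows, start, end, fmt="%m/%d/%Y %H:%M"):
    """Count emails per unordered pair of addresses with start <= Date < end."""
    weights = Counter()
    for row in rows:
        timestamp = datetime.strptime(row["Date"], fmt)
        if not (start <= timestamp < end):
            continue  # outside the requested date window
        a, b = row["Local"].lower(), row["Remote"].lower()
        if a == b:
            continue  # ignore self-addressed messages
        # Sort the pair so A->B and B->A contribute to the same edge.
        weights[tuple(sorted((a, b)))] += 1
    return weights


edges = build_edge_weights(rows, datetime(2016, 11, 26), datetime(2017, 1, 1))
for pair, count in sorted(edges.items()):
    print(pair, count)
```

Treating the network as undirected is a design choice of this sketch; a directed variant would keep `(sender, recipient)` as the key instead of sorting the pair.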


Literature Review

Rob Cross and Karen Stephenson have studied organizational networks for a long time, and their findings and theories are essential to the objective of this study. Their discoveries are important for evaluating the patterns and outcomes found in this study of organizational email data networks.

Rob Cross has done extensive work on organizational network analysis, looking for methods to improve collaboration between company units and to break silos. Cross (2004) states – “Organizational Network Analysis (ONA) can provide an x-ray into the inner workings of an organization – a powerful means of making invisible patterns of information flow and collaboration in strategically important groups visible.”

According to Cross’s (2000) research, the level of collaboration can affect employee stress, the distribution of collaborative work within teams and more. He found that employees who are willing to help beyond their scope gradually develop a reputation for being resourceful, and hence are included in projects of higher impact. However, Cross said, such employees eventually become bottlenecks, either because other employees become over-reliant on them and make no substantial progress without their facilitation, or because they are overloaded with work and unable to deliver at the same level.

Karen Stephenson, also known as “The Organization Woman”, recognized six main types of knowledge networks in an organization:

  • Daily network: whom we see day-to-day
  • Wider social network: whom we actively stay in touch with
  • Innovation network: with whom we test out new ideas
  • Expert network: to whom we go for expertise and knowledge
  • Strategic network: to whom we go for guidance and advice
  • Learning network: who help us move from what we know to new knowledge and expertise


Her well-known book, Quantum Theory of Trust (2006), shows that manipulating networks of trust can control the flow of information in an organization, because networks of trust are linked to tacit knowledge. Her research aimed to observe communication patterns, identify bottlenecks and silos, and understand team dysfunctions.

This study involves term analysis of employee email subject lines, which can be treated as short text documents. A relevant paper by Mika Timonen (2013) on term weighting in short documents describes a variety of methods for determining the importance of terms and extracting keywords from a document. For term weighting in our study, the most appropriate method is inverse document frequency (IDF).
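
As a rough illustration of IDF on subject lines, the sketch below treats each subject as one short document and uses the common textbook form idf(t) = log(N / df(t)), where N is the number of subjects and df(t) is the number of subjects containing term t. The exact variant, tokenization and any stop-word handling used in the study are not stated here, so the tokenizer, the function names and the sample subjects are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(subject):
    """Lowercase a subject line and return its set of distinct alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", subject.lower()))


def inverse_document_frequency(subjects):
    """Return {term: log(N / df)}, where df is the number of subjects containing the term."""
    n_docs = len(subjects)
    doc_freq = Counter()
    for subject in subjects:
        doc_freq.update(tokenize(subject))  # each term counted once per subject
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Made-up subject lines for illustration only.
subjects = [
    "Re: Q1 budget review",
    "Re: lunch on Friday",
    "Q1 budget forecast",
    "Re: budget approval",
]

idf = inverse_document_frequency(subjects)
for term in sorted(idf, key=idf.get):
    print(f"{term:10s} {idf[term]:.3f}")
```

Terms that appear in most subjects, such as "re" or "budget" in this toy sample, get a low weight, while terms that appear in a single subject get the highest weight.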

Methodology

Click on METHODOLOGY for more details.

MethodNew.jpg

Conclusion

Email data is not a strong representation of the Work Network when relationships are divided into multiple strength segments. However, it is a good representation of the Work Network when relationships are divided into only two strength segments. By using only email data to predict work relationship strengths, companies can save large amounts of time, manpower and money spent on survey analysis to retrieve similar data. Further, it is possible to update the analysis in real time by changing the dates of the email data.
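
The difference between many segments and two segments can be pictured with a toy comparison. The sketch below is not the study's classification model and the numbers are invented, not results: it assumes relationship strength is scored on a hypothetical 1 to 4 scale in both the email-derived and the survey-derived network, and it measures how often the two sources agree before and after collapsing the scale into "weak" and "strong".

```python
def to_two_segments(level, threshold=3):
    """Collapse a multi-level strength score into 'strong' or 'weak'."""
    return "strong" if level >= threshold else "weak"


def agreement(email, survey, transform=lambda level: level):
    """Share of employee pairs present in both sources whose (transformed) labels match."""
    shared = email.keys() & survey.keys()
    if not shared:
        return 0.0
    hits = sum(transform(email[pair]) == transform(survey[pair]) for pair in shared)
    return hits / len(shared)


# Invented strength levels (1 = weakest, 4 = strongest) for four employee pairs.
email_strength = {("a", "b"): 4, ("a", "c"): 2, ("b", "c"): 1, ("b", "d"): 3}
survey_strength = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2, ("b", "d"): 4}

exact = agreement(email_strength, survey_strength)
coarse = agreement(email_strength, survey_strength, to_two_segments)
print(f"agreement on four segments: {exact:.2f}")
print(f"agreement on two segments:  {coarse:.2f}")
```

In this toy data the two sources never assign the same fine-grained level, yet they always agree on whether a relationship is weak or strong, which is the kind of pattern the conclusion describes.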

Limitation

The whole analysis was done on email subject lines and the distribution of emails. However, the highest-quality information is in the body of the emails. We did not have access to the email bodies, which is the biggest limitation of the study. Future work should focus on deeper text mining of the email body to build a more accurate representation of work-related emails. This could be a major factor in differentiating work relationships between any two employees.