APA Final Progress

Preliminary analysis

Before Cleaning
Our data consists of 14 columns and 121,154 rows, described below:

Column Explanations
Date	Timestamp of the email
Remote IP	If the email exchange is external, this column shows the external person's email address
Remote	The TrustSphere employee who is receiving or sending the email
Remote Domain	Always TrustSphere
Local	Email address of the person sending the email
Local Domain	Domain of the person who is sending the email
Originator	Inbound, outbound or internal (whether the employee is receiving the email, sending it, or the email is between two TrustSphere employees)
Direction	Always TrustSphere in this case
Domain Group	Email Header (Subject Line)
Subject	Type of message: email/im (instant messaging)/voice/sms
Inbound Count	Number of emails received
Outbound Count	Number of emails sent
Size	Size of the message (number of characters)
Msgid	Encoded Message ID

Network exploration: we filtered the network to internal (employee-to-employee) emails only.
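A minimal sketch of the internal-only filter, assuming the log is read as one dict per row keyed by the column names above, that the Originator column holds the inbound/outbound/internal label (as the column table describes), and that Local is the sender and Remote the other party. The rows and addresses are hypothetical, and `internal_edges` is an illustrative helper, not part of any library.

```python
from collections import Counter

# Hypothetical rows using column names from the table above; values are made up.
ROWS = [
    {"Local": "alice@corp.example", "Remote": "bob@corp.example", "Originator": "internal"},
    {"Local": "alice@corp.example", "Remote": "carol@corp.example", "Originator": "Internal"},
    {"Local": "bob@corp.example", "Remote": "alice@corp.example", "Originator": "internal"},
    {"Local": "dave@partner.example", "Remote": "alice@corp.example", "Originator": "inbound"},
    {"Local": "alice@corp.example", "Remote": "bob@corp.example", "Originator": "internal"},
]


def internal_edges(rows):
    """Keep only internal emails and count directed (sender, recipient) ties.

    Assumes 'Local' is the sender and 'Remote' the other TrustSphere party;
    this direction convention is an assumption, not confirmed by the data.
    """
    edges = Counter()
    for row in rows:
        if row["Originator"].strip().lower() != "internal":
            continue  # drop inbound/outbound emails that involve outsiders
        edges[(row["Local"], row["Remote"])] += 1
    return edges


edges = internal_edges(ROWS)
print(edges)
```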
Message size: we looked for trends based on message size but found no correlation.
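The notes do not say which variable message size was compared against. As a hedged illustration, the sketch below computes a Pearson correlation coefficient between Size and a hypothetical second per-message series; `pearson` is written out by hand and all values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances are left unnormalised: the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: message sizes (characters) and a second per-message measure.
sizes = [120, 4500, 860, 2300, 310, 1500]
other = [3, 1, 4, 1, 5, 9]
print(round(pearson(sizes, other), 3))
```

A coefficient near 0 means no linear relationship. Pearson's r would not pick up a non-linear trend, so a rank-based measure such as Spearman's could be a useful cross-check.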
Eigenvector centrality analysis: the results were biased. Although the generated network showed certain employees to be highly influential, when we presented our results to the client, they told us that those individuals are not actually that influential. We concluded that this was because all ties were given equal weight.
We therefore need to weight the ties differently, for example by subject line, reply rate, whether the email is a reply, forward or CC, and the hierarchy of the senders or recipients.
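To illustrate why equal weighting can mislead, here is a minimal sketch of eigenvector centrality computed by power iteration, once with every email counted as 1 and once with per-type tie weights. The values in `TYPE_WEIGHTS`, the toy email list and the helper names are hypothetical assumptions, not the weighting the team chose; ties are treated as undirected for simplicity.

```python
import math
from collections import defaultdict

# Hypothetical weights per interaction type. The notes name replies, forwards
# and CCs as candidate signals but give no values, so these are placeholders.
TYPE_WEIGHTS = {"direct": 1.0, "reply": 2.0, "forward": 0.5, "cc": 0.25}


def build_graph(emails, weights=None):
    """Undirected weighted adjacency map from (sender, recipient, kind) tuples.

    With weights=None every email adds 1 to its tie, i.e. equal weighting.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for sender, recipient, kind in emails:
        w = 1.0 if weights is None else weights[kind]
        adj[sender][recipient] += w
        adj[recipient][sender] += w  # treat ties as undirected
    return adj


def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Eigenvector centrality by power iteration on (A + I).

    Adding the identity keeps the same leading eigenvector but stops the
    iteration from oscillating on bipartite graphs such as stars.
    """
    nodes = sorted(adj)
    if not nodes:
        return {}
    x = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        nxt = {n: x[n] + sum(w * x[m] for m, w in adj[n].items()) for n in nodes}
        norm = math.sqrt(sum(v * v for v in nxt.values()))
        nxt = {n: v / norm for n, v in nxt.items()}  # rescale to unit length
        if sum(abs(nxt[n] - x[n]) for n in nodes) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical toy data: carol CCs many people, alice sends bob one reply.
EMAILS = [
    ("carol", "bob", "cc"),
    ("carol", "dave", "cc"),
    ("carol", "erin", "cc"),
    ("carol", "frank", "cc"),
    ("carol", "grace", "cc"),
    ("carol", "heidi", "cc"),
    ("alice", "bob", "reply"),
]

equal = eigenvector_centrality(build_graph(EMAILS))
weighted = eigenvector_centrality(build_graph(EMAILS, TYPE_WEIGHTS))
print(max(equal, key=equal.get))        # carol
print(max(weighted, key=weighted.get))  # bob
```

In the toy data, carol tops the equal-weight ranking only because she CCs many people; once a reply counts for more than a CC, bob and alice move ahead of her. On the real graph, a library routine such as NetworkX's `eigenvector_centrality` with its `weight` parameter would do the same job.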

Figure legend: blue = high eigenvector centrality; white = medium; red = low; node size = out-degree.
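A hedged sketch of how the figure's encodings could be derived. It assumes node size is the number of distinct recipients a sender emailed and that the blue/white/red bands are equal thirds by centrality rank; the notes give neither the out-degree definition nor the cut points, and both helpers are hypothetical.

```python
def out_degree(edges):
    """Directed out-degree: number of distinct recipients each sender emailed.

    Nodes that never send an email do not appear in the result.
    """
    recipients = {}
    for sender, recipient in edges:
        recipients.setdefault(sender, set()).add(recipient)
    return {sender: len(r) for sender, r in recipients.items()}


def color_bands(scores):
    """Colour nodes by centrality rank: top third blue, middle third white, bottom third red."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    colors = {}
    for i, node in enumerate(ranked):
        if i < n / 3:
            colors[node] = "blue"
        elif i < 2 * n / 3:
            colors[node] = "white"
        else:
            colors[node] = "red"
    return colors


# Hypothetical example data.
print(out_degree([("a", "b"), ("a", "b"), ("a", "c"), ("b", "a")]))  # {'a': 2, 'b': 1}
print(color_bands({"a": 0.9, "b": 0.5, "c": 0.1}))  # {'a': 'blue', 'b': 'white', 'c': 'red'}
```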
