Difference between revisions of "APA Final Progress"

From Analytics Practicum
(Add Column information)
Line 30: Line 30:
 
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Preliminary analysis</font></div><br/>
 
|style="vertical-align:top;width:30%;" | <div style="background: #10d0e5; padding: 13px; font-weight: bold; text-align:center; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"> <font color= #ffffff>Preliminary analysis</font></div><br/>
 
<p>
 
<p>
 +
 +
'''Before Cleaning'''<br>
 +
Our data consists of 14 columns and 121,154 rows, as described below:
 +
{| class="wikitable"
 +
|+Column Explanations
 +
|-
 +
|Date
 +
|Timestamp of the email
 +
|-
 +
|Remote IP
 +
|If the email exchange is external then this column shows the external person's email
 +
|-
 +
|Remote
 +
|The TrustSphere employee who is receiving or sending the email
 +
|-
 +
|Remote Domain
 +
|Always TrustSphere
 +
|-
 +
|Local
 +
|Email address of the person sending the email
 +
|-
 +
|Local Domain
 +
|Domain of the person who is sending the email
 +
|-
 +
|Originator
 +
|Inbound, outbound or internal (if you’re receiving the email, sending it or if the email is between 2 TrustSphere employees)
 +
|-
 +
|Direction
 +
|Always TrustSphere in this case
 +
|-
 +
|Domain Group
 +
|Email Header (Subject Line)
 +
|-
 +
|Subject
 +
|Type of message: email/im (instant messaging)/voice/sms
 +
|-
 +
|Inbound Count
 +
|Number of emails received
 +
|-
 +
|Outbound Count
 +
|Number of emails sent
 +
|-
 +
|Size
 +
|Size of the message (number of characters)
 +
|-
 +
|Msgid
 +
|Encoded Message ID
 +
|}
 +
 
<ul>
 
<ul>
 
<li> Exploration of network : filtered for internal employees only</li>
 
<li> Exploration of network : filtered for internal employees only</li>

Revision as of 21:33, 22 February 2017

Preliminary analysis

Before Cleaning
Our data consists of 14 columns and 121,154 rows, as described below:

{| class="wikitable"
|+Column Explanations
|-
|Date
|Timestamp of the email
|-
|Remote IP
|If the email exchange is external then this column shows the external person's email
|-
|Remote
|The TrustSphere employee who is receiving or sending the email
|-
|Remote Domain
|Always TrustSphere
|-
|Local
|Email address of the person sending the email
|-
|Local Domain
|Domain of the person who is sending the email
|-
|Originator
|Inbound, outbound or internal (if you’re receiving the email, sending it or if the email is between 2 TrustSphere employees)
|-
|Direction
|Always TrustSphere in this case
|-
|Domain Group
|Email Header (Subject Line)
|-
|Subject
|Type of message: email/im (instant messaging)/voice/sms
|-
|Inbound Count
|Number of emails received
|-
|Outbound Count
|Number of emails sent
|-
|Size
|Size of the message (number of characters)
|-
|Msgid
|Encoded Message ID
|}
  • Exploration of the network: filtered for internal employees only
  • Looked for trends based on message size: no correlation found
  • Eigenvector centrality analysis: the results were biased. The generated network showed certain employees as highly influential, but when we showed our results to the client, they said those individuals are not actually that influential. We concluded that this was because all ties were given equal weight.
  • We must therefore weight the ties differently, using factors such as subject-line weighting, reply rate, whether the email is a reply, forward or cc, and the hierarchy of email senders or recipients.
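
To illustrate why tie weights matter, here is a minimal sketch of eigenvector centrality computed by power iteration on a weighted, directed email network. It is not the tool or method used in this project. The function name, the edge-list format and the example weights are all hypothetical, and how a weight would actually be derived from reply rate, subject line or hierarchy is left open.

```python
from collections import defaultdict


def weighted_eigenvector_centrality(edges, iterations=100, tol=1e-9):
    """Power iteration on a weighted, directed graph.

    edges: list of (sender, recipient, weight) tuples. The weights are
    hypothetical; the source does not define a weighting formula.
    Returns a dict mapping each node to its centrality score.
    """
    nodes = sorted({n for s, r, _ in edges for n in (s, r)})
    incoming = defaultdict(list)
    for sender, recipient, weight in edges:
        incoming[recipient].append((sender, weight))

    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Each node keeps its own score and adds the weighted scores of
        # the nodes that email it (iterating on A + I instead of A helps
        # the iteration converge and leaves the eigenvectors unchanged).
        new = {
            n: score[n] + sum(w * score[s] for s, w in incoming[n])
            for n in nodes
        }
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score


# Hypothetical example: "a" exchanges emails with "b" and "c", but the
# ties with "c" are given five times the weight (for instance because
# they are replies rather than cc's; this is an assumed rule).
example_edges = [
    ("a", "b", 1.0), ("b", "a", 1.0),
    ("a", "c", 5.0), ("c", "a", 5.0),
]
scores = weighted_eigenvector_centrality(example_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

With all weights set to 1.0, "b" and "c" receive identical scores, which mirrors the equal-weight bias described above. Once the ties are weighted, the ranking separates them.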


Prel analysis 1.jpg


Blue = high eigenvector centrality; White = mid; Red = low; Size of node = out-degree