Difference between revisions of "ISSS608 2017-18 T3 Assign LUO Haoran Data Preparation"

From Visual Analytics and Applications

Revision as of 23:57, 7 July 2018

MC3 2018.jpg

VAST Challenge 2018 MC3:
Catch the Unsub – Who hurts Eurasian Pipit?


Raw Data Overview

1. Event Data Tables Overview
1. raw data overview 1.jpg

  • In total there are 9 event data tables, recording calls, emails, purchases and meetings across the company.
  • The insider has already flagged the suspicious calls, emails, purchases and meetings and provided them as separate data tables.
  • None of the 9 data tables has column names, but they all share the same four-column format:
    • 1. Source: contains the company ID# for the person who called, sent an email, purchased something, or invited people to a meeting
    • 2. Etype: contains a number designating what kind of connection is made.
      • a. 0 is for calls
      • b. 1 is for emails
      • c. 2 is for purchases
      • d. 3 is for meetings
    • 3. Destination: contains company ID# for the person who is receiving a call, receiving an email, selling something to a buyer, or being invited to a meeting.
    • 4. Time stamp: the number of seconds elapsed since May 11, 2015 at 14:00.


2. Company Index Overview
1. raw data overview 2.jpg

  • The company index shows the name of everyone in the company and their associated ID#. There are 642,631 individuals in the index.

3. Letter from Insider
Letter from insider.png

  • The insider also provided a letter that shares a list of suspicious employees.

Data Aggregation

1. Add Column Name
1. add column name.jpg

  • Applied the same command to all 9 event data tables to add column names.
  • Corresponding Column Name: Source, Etype, Destination, TimeStamp
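The screenshot above shows the command that was actually used. As a rough equivalent, the following Python sketch attaches the four column names to a headerless table. It assumes the tables are comma-separated and that every field is an integer; the function name `read_events` and the sample rows are made up for illustration.

```python
import csv
import io

# Column names listed in the step above.
COLUMNS = ["Source", "Etype", "Destination", "TimeStamp"]

def read_events(handle):
    """Read a headerless four-column event table into a list of dicts.

    Assumes comma-separated rows whose fields are all integers.
    """
    reader = csv.DictReader(handle, fieldnames=COLUMNS)
    return [{name: int(value) for name, value in row.items()} for row in reader]

# Two made-up rows standing in for one of the 9 event tables.
sample = io.StringIO("101,0,202,60\n202,1,303,3600\n")
events = read_events(sample)
print(events[0])
```

The same function can be applied to each of the 9 files in turn by opening the file and passing the handle.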


2. Convert TimeInterval to TimeStamp
2. convert to time date.jpg

  • Converted the TimeInterval values (seconds) in the original data tables into TimeStamp values in dd/mm/yyyy hh:mm:ss format.
  • Initial time is 11/05/2015 14:00:00.
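The conversion amounts to adding the elapsed seconds to the initial time. A minimal Python sketch of that arithmetic follows; it is not the command from the screenshot, and the names `START` and `to_timestamp` are illustrative only.

```python
from datetime import datetime, timedelta

# Initial time stated above: 11/05/2015 14:00:00.
START = datetime(2015, 5, 11, 14, 0, 0)

def to_timestamp(seconds):
    """Convert seconds elapsed since START into a dd/mm/yyyy hh:mm:ss string."""
    return (START + timedelta(seconds=seconds)).strftime("%d/%m/%Y %H:%M:%S")

print(to_timestamp(0))      # 11/05/2015 14:00:00
print(to_timestamp(3600))   # 11/05/2015 15:00:00
print(to_timestamp(86400))  # 12/05/2015 14:00:00
```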


3. Save New Data Tables
3. save new data table.jpg

  • Saved the new data tables, which now include the TimeStamp column, to the output path.

4. Data Modification in JMP
Data in JMP.png

  • Relabeled the values in the Etype column.
  • Added source and target names by looking up SourceID and TargetID in the company index.
  • Created new columns for different time dimensions.
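These steps were carried out in JMP. For readers without JMP, the sketch below shows roughly the same three operations in Python. The label text, the new column names (`SourceName`, `TargetName`, `Date`, `Hour`, `Weekday`), the choice of time dimensions and the sample names are all assumptions for illustration; the original page does not list them.

```python
from datetime import datetime

# Etype codes from the raw data description; the label wording is an assumption.
ETYPE_LABELS = {0: "Call", 1: "Email", 2: "Purchase", 3: "Meeting"}
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def enrich(event, names):
    """Relabel Etype, attach names from the company index, add time columns."""
    ts = datetime.strptime(event["TimeStamp"], "%d/%m/%Y %H:%M:%S")
    return {
        **event,
        "Etype": ETYPE_LABELS[event["Etype"]],
        # IDs missing from the index are marked rather than dropped.
        "SourceName": names.get(event["Source"], "Unknown"),
        "TargetName": names.get(event["Destination"], "Unknown"),
        # Example time dimensions; the ones built in JMP may differ.
        "Date": ts.date().isoformat(),
        "Hour": ts.hour,
        "Weekday": WEEKDAYS[ts.weekday()],
    }

# Made-up company index entries and a single made-up event.
company_index = {101: "Alice Example", 202: "Bob Example"}
event = {"Source": 101, "Etype": 1, "Destination": 202,
         "TimeStamp": "11/05/2015 14:01:00"}
row = enrich(event, company_index)
print(row["Etype"], row["SourceName"], row["TargetName"], row["Weekday"])
```

Keeping the index as a dictionary keyed by ID makes each lookup constant-time, which matters when the index holds several hundred thousand individuals.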