ISSS608 2017-18 T3 Assign LUO Haoran Data Preparation

From Visual Analytics and Applications
 

Latest revision as of 14:35, 8 July 2018


VAST Challenge 2018 MC3:
Catch the Unsub – Who hurts Eurasian Pipit?



Raw Data Overview

1. Event Data Tables Overview
[Figure: Raw data overview 1, event data tables]

  • There are 9 event data tables in total, recording calls, emails, purchases and meetings across the company.
  • The insider has already flagged suspicious calls, emails, purchases and meetings and provided them as separate data tables.
  • None of the 9 data tables has column names; however, all of them share the same four-column format:
    • 1. Source: contains the company ID# for the person who called, sent an email, purchased something, or invited people to a meeting
    • 2. Etype: contains a number designating what kind of connection is made.
      • a. 0 is for calls
      • b. 1 is for emails
      • c. 2 is for purchases
      • d. 3 is for meetings
    • 3. Destination: contains company ID# for the person who is receiving a call, receiving an email, selling something to a buyer, or being invited to a meeting.
    • 4. Time stamp: the number of seconds elapsed since May 11, 2015, 14:00.
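To make the record layout above concrete, here is a minimal sketch that splits one headerless row into the four named fields and decodes the Etype number. It assumes the values are comma-separated integers; the helper name `parse_event` and the sample IDs are hypothetical.

```python
# Etype codes as listed in the data description
ETYPE_LABELS = {0: "call", 1: "email", 2: "purchase", 3: "meeting"}


def parse_event(line):
    """Split one headerless row (Source,Etype,Destination,seconds) into named fields.

    Assumes comma-separated integer values; parse_event is a hypothetical helper.
    """
    source, etype, destination, seconds = (int(v) for v in line.strip().split(","))
    return {
        "Source": source,
        "Etype": ETYPE_LABELS[etype],
        "Destination": destination,
        "Seconds": seconds,
    }


if __name__ == "__main__":
    # Hypothetical row: employee 123 emails employee 456 one hour after the start time
    print(parse_event("123,1,456,3600"))
    # {'Source': 123, 'Etype': 'email', 'Destination': 456, 'Seconds': 3600}
```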


2. Company Index Overview
[Figure: Raw data overview 2, company index]

  • The company index shows the name of everyone in the company and their associated ID#. There are 642,631 individuals in the index.
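The company index works as a lookup table from ID# to name. A minimal sketch, assuming each index row holds a name followed by an ID# with no header line (the real column layout may differ); the entries in the example are hypothetical.

```python
import csv
import io


def load_company_index(text):
    """Build an ID# -> name dictionary from index rows of the form name,ID.

    The two-column, headerless layout is an assumption; adjust the unpacking
    to match the real index file.
    """
    index = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, emp_id = row[0].strip(), int(row[1])
        index[emp_id] = name
    return index


if __name__ == "__main__":
    sample = "Alice Example,101\nBob Example,102\n"  # hypothetical entries
    index = load_company_index(sample)
    print(len(index), index[102])  # 2 Bob Example
```

With roughly 640,000 entries, a dictionary keeps each ID lookup constant-time when names are later attached to events.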

3. Letter from Insider
[Figure: Letter from insider]

  • The insider's letter shares a list of suspicious employees.

Data Aggregation

1. Add Column Names
[Figure: Add column names]

  • Applied the same command to all 9 event data tables to add column names.
  • Column names: Source, Etype, Destination, TimeStamp
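The original screenshot does not show which tool was used, so the following is only a Python sketch of the same idea: prepend one header row to every headerless table in a folder. The directory names and the helpers `add_header` / `add_headers_to_all` are hypothetical, and the tables are assumed to be `.csv` files.

```python
import csv
from pathlib import Path

COLUMNS = ["Source", "Etype", "Destination", "TimeStamp"]


def add_header(in_path, out_path, columns=COLUMNS):
    """Copy a headerless event table to out_path, writing the column names first."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(columns)
        for row in csv.reader(src):
            if row:  # skip blank lines
                writer.writerow(row)


def add_headers_to_all(in_dir, out_dir):
    """Apply add_header to every .csv file in in_dir (e.g. the 9 event tables)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(in_dir).glob("*.csv")):
        target = out_dir / path.name
        add_header(path, target)
        written.append(target)
    return written
```

For example, `add_headers_to_all("raw", "with_headers")` (hypothetical folders) would write one headed copy per input table, leaving the raw files untouched.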


2. Convert TimeInterval to TimeStamp
[Figure: Convert to time and date]

  • Converted the time interval (in seconds) in the original data tables into a TimeStamp in dd/mm/yyyy hh:mm:ss format.
  • The starting point (interval 0) is 11/05/2015 14:00:00.
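The conversion is simply the start time plus the elapsed seconds, formatted as dd/mm/yyyy hh:mm:ss. A minimal Python sketch (the name `to_timestamp` is hypothetical):

```python
from datetime import datetime, timedelta

# Time interval 0 corresponds to 11 May 2015, 14:00:00 (per the data description)
START = datetime(2015, 5, 11, 14, 0, 0)


def to_timestamp(seconds):
    """Convert seconds elapsed since START to a dd/mm/yyyy hh:mm:ss string."""
    return (START + timedelta(seconds=seconds)).strftime("%d/%m/%Y %H:%M:%S")


if __name__ == "__main__":
    print(to_timestamp(0))      # 11/05/2015 14:00:00
    print(to_timestamp(36000))  # 12/05/2015 00:00:00
```

Using `timedelta` rather than manual arithmetic keeps day and month rollovers correct.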


3. Save New Data Tables
[Figure: Save new data tables]

  • Saved the new data tables, now including the TimeStamp column, to the specified path.
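Combining the column-naming and time-conversion steps, one input table can be rewritten and saved in a single pass. This is a Python sketch under stated assumptions, not the author's actual script: the input is a headerless CSV whose fourth field is seconds since the start time, and the function name and paths are hypothetical.

```python
import csv
from datetime import datetime, timedelta
from pathlib import Path

START = datetime(2015, 5, 11, 14, 0, 0)
COLUMNS = ["Source", "Etype", "Destination", "TimeStamp"]


def save_with_timestamp(in_path, out_path):
    """Read a headerless event table, convert the seconds column into a
    dd/mm/yyyy hh:mm:ss TimeStamp, and save the result with column names."""
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(COLUMNS)
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines
            source, etype, destination, seconds = row
            stamp = START + timedelta(seconds=float(seconds))
            writer.writerow([source, etype, destination,
                             stamp.strftime("%d/%m/%Y %H:%M:%S")])
```

For example, `save_with_timestamp("raw/calls.csv", "processed/calls.csv")` (hypothetical paths) would produce a headed copy with readable timestamps.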


4. Data Modification in JMP
[Figure: Data in JMP]

  • Relabeled the numeric values in the Etype column as call, email, purchase and meeting.
  • Added names for sources and targets by looking up SourceID and TargetID in the company index.
  • Created new columns for different time dimensions.
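These modifications were done in JMP; outside JMP they could be approximated as in the Python sketch below, which is an equivalent illustration, not the author's workflow. It assumes each event is a dict with an integer Etype code and a `datetime` TimeStamp, and that the company index has been loaded into an ID-to-name dictionary. The time dimensions chosen here (date, weekday, hour) are only examples, since the original does not list which columns were created.

```python
from datetime import datetime

ETYPE_LABELS = {0: "call", 1: "email", 2: "purchase", 3: "meeting"}


def enrich(event, names):
    """Return a copy of event with a relabeled Etype, source/target names and
    extra time-dimension columns (date, weekday, hour are illustrative choices)."""
    ts = event["TimeStamp"]
    return {
        **event,
        "Etype": ETYPE_LABELS[event["Etype"]],
        "SourceName": names.get(event["Source"], "Unknown"),
        "TargetName": names.get(event["Destination"], "Unknown"),
        "Date": ts.date().isoformat(),
        "Weekday": ts.strftime("%A"),
        "Hour": ts.hour,
    }


if __name__ == "__main__":
    names = {101: "Alice Example", 102: "Bob Example"}  # hypothetical index entries
    event = {"Source": 101, "Etype": 0, "Destination": 102,
             "TimeStamp": datetime(2015, 5, 11, 14, 30)}
    print(enrich(event, names))
```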


5. Create Nodes and Edges Data Using the Suspicious Employee List

5.1 Edges_Suspicious Event Data

[Figure: Edges_SusEvent data table]

  • Selected all events initiated or received by the suspicious employees.
  • Concatenated the resulting data tables into a single merged table, Edges_SusEvent.
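The select-and-concatenate step can be sketched in Python as below, assuming each event table is a list of dicts with Source and Destination ID#s and that the insider's names have already been resolved to ID#s. The helper `suspicious_edges` is hypothetical. It concatenates first and filters in the same pass, which yields the same rows as filtering each table and then concatenating.

```python
from itertools import chain


def suspicious_edges(tables, suspects):
    """Concatenate several event tables and keep only events whose Source or
    Destination is one of the suspicious employee IDs."""
    suspects = set(suspects)
    return [
        event
        for event in chain.from_iterable(tables)
        if event["Source"] in suspects or event["Destination"] in suspects
    ]


if __name__ == "__main__":
    calls = [{"Source": 1, "Destination": 2}, {"Source": 3, "Destination": 4}]
    emails = [{"Source": 5, "Destination": 1}]
    print(suspicious_edges([calls, emails], [1]))  # hypothetical IDs
```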



5.2 Nodes_Suspicious Employee Data

[Figure: Nodes_SusEvent data table]

  • Using the Edges_SusEvent table, all employees involved in the suspicious events can be extracted.
  • Besides the 20 employees named by the insider, more people may be involved in the wrongdoing.
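The node list follows from the edge table: every ID# that appears as a Source or Destination in Edges_SusEvent. A minimal Python sketch, assuming edges are dicts and the insider's list has been resolved to ID#s; the function name `suspicious_nodes` is hypothetical. Splitting the result shows which involved employees were not on the insider's list.

```python
def suspicious_nodes(edges, insider_list):
    """Collect every employee ID appearing in the suspicious edges and split
    them into IDs named by the insider and IDs newly surfaced by the edges."""
    involved = set()
    for event in edges:
        involved.add(event["Source"])
        involved.add(event["Destination"])
    named = involved & set(insider_list)
    others = involved - set(insider_list)
    return sorted(named), sorted(others)


if __name__ == "__main__":
    edges = [{"Source": 1, "Destination": 2}, {"Source": 5, "Destination": 1}]
    print(suspicious_nodes(edges, [1, 9]))  # hypothetical IDs
```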