Difference between revisions of "ISSS608 2017-18 T3 Assign LUO Haoran Data Preparation"

From Visual Analytics and Applications
Jump to navigation Jump to search
(Replaced content with "kk")
Line 1: Line 1:
<div style=background:#343031 border:#E55322>
+
kk
[[Image:MC3 2018.jpg|left|250px]]
 
<font size = 6; font face: "Optima";font color = #FFFFF0>VAST Challenge 2018 MC3: <br>Catch the Unsub – Who hurts Eurasian Pipit?</font>   
 
</div>
 
<!--Navigation-->
 
{| style="background-color: :#1B338F;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
| style="font-family: Optima; font-size:130%; solid #000000; background: #343031; text-align: center;" width="25%" |
 
;
 
[[ISSS608_2017-18_T3_Assign_Luo_Haoran|<font color="#FFFFF0">Overview</font>]]
 
| style="font-family: Optima; font-size:130%; solid #000000; background: #343031; text-align:center;" width="25%" |
 
;
 
[[ISSS608_2017-18_T3_Assign_LUO_Haoran_Data Preparation|<font color="#FFFFF0">Data Preparation</font>]]
 
| style="font-family: Optima; font-size:130%; solid #000000; background: #343031; text-align: center;" width="25%" |
 
;
 
[[ISSS608_2017-18_T3_Assign_LUO_Haoran_Methodology|<font color="#FFFFF0">Methodology</font>]]
 
| style="font-family: Optima; font-size:130%; solid #000000; background: #343031; text-align: center;" width="25%" |
 
;
 
[[ISSS608_2017-18_T3_Assign_LUO_Haoran_Insights|<font color="#FFFFF0">Insights</font>]]
 
|}
 
 
 
 
 
<!-- Body -->
 
==<div style="background: #ffffff; padding: 17px; line-height: 0.1em;  text-indent: 10px; font-size:17px; font-family: Helvetica;  border-left:8px solid #0091b3"><font color= #343031><strong>Raw Data Overview</strong></font></div>==
 
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left">
 
 
 
<b>1. Event Data Tables Overview</b><br>
 
[[File:1. raw data overview 1.jpg|600px]]
 
*In total there are 9 event data tables, recording calls, emails, purchases and meetings across the company.
 
*Insider has already marked out suspicious calls, emails, purchases and meetings and made them isolated data tables.
 
*No column name for all 9 data tables. However, the format of them are the same.
 
**1. Source: contains the company ID# for the person who called, sent an email, purchased something, or invited people to a meeting
 
** 2. Etype: contains a number designating what kind of connection is made.
 
***a. 0 is for calls
 
***b. 1 is for emails
 
***c. 2 is for purchases
 
***d. 3 is for meetings
 
**3. Destination: contains company ID# for the person who is receiving a call, receiving an email, selling something to a buyer, or being invited to a meeting.
 
**4. Time stamp: is in seconds starting on May 11, 2015 at 14:00.
 
<br>
 
 
 
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left">
 
<b>2. Company Index Overview</b><br>
 
[[File:1. raw data overview 2.jpg|600px]]
 
* The company index shows the name of everyone in the company and their associated ID#. There are 642,631 individuals in the index.
 
 
 
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left">
 
<b>3. Letter from Insider</b><br>
 
[[File:Letter from insider.png|600px]]
 
* There is a letter from the insider in which he/she shares a suspicious employee list.
 
</div>
 
 
 
</div>
 
</div>
 
 
 
==<div style="background: #ffffff; padding: 17px; line-height: 0.1em;  text-indent: 10px; font-size:17px; font-family: Helvetica;  border-left:8px solid #0091b3"><font color= #000000><strong>Data Aggregation</strong></font></div>==
 
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left">
 
<b>1. Add Column Name</b><br>
 
[[File:1. add column name.jpg|750px]]
 
*Applied the same command to 9 event data tables to add column names.
 
*Corresponding Column Name: Source, Etype, Destination, TimeStamp
 
<br>
 
 
 
<b>2. Convert TimeInterval to TimeStamp</b><br>
 
[[File:2. convert to time date.jpg|750px]]
 
*Changed TimeInterval in original data tables into TimeStamp in format of dd/mm/yyyy hh:mm:ss.
 
*Initial time is 11/05/2015 14:00:00.
 
<br>
 
 
 
<b>3. Save New Data Tables</b><br>
 
[[File:3. save new data table.jpg|750px]]
 
*Saved new data tables with TimeStamp to the path.
 
<br>
 
 
 
<b>4. Data Modification in JMP</b><br>
 
[[File:Data in JMP.png|750px]]
 
*Relabeled values in Etype column.
 
*Added names for sources and targets according to SourceID (company index) and TargetID (company index).
 
*created new columns indicating different time dimensions.
 
<br>
 
 
 
<b>5. Create Nodes and Edges Data Using Suspicious Employee List</b><br>
 
:5.1 Edges_Suspicious Event Data
 
[[File:Edges susevent.png|750px]]
 
 
 
:*Select out all events initiated or accepted by the suspicious employees.
 
:*Concatenate data tables together and get a merged data table as Edges_SusEvent.
 
<br>
 
 
 
 
 
:5.2 Nodes_Suspicious Employee Data
 
[[File:Nodes susevent.png|450px]]
 
 
 
:*By using the Edges table created, all the employees involved in the suspicious behavior can be filtered out.
 
:*Except for 20 employees the insider points out, there might be more people doing bad things.
 
 
 
</div>
 

Revision as of 01:47, 8 July 2018

kk