Difference between revisions of "ISSS608 2017-18 T3 Assign LUO Haoran Data Preparation"
Jump to navigation
Jump to search
Line 18: | Line 18: | ||
[[ISSS608_2017-18_T3_Assign_LUO_Haoran_Insights|<font color="#FFFFF0">Insights</font>]] | [[ISSS608_2017-18_T3_Assign_LUO_Haoran_Insights|<font color="#FFFFF0">Insights</font>]] | ||
|} | |} | ||
+ | |||
+ | |||
+ | <!-- Body --> | ||
+ | ==<div style="background: #ffffff; padding: 17px; line-height: 0.1em; text-indent: 10px; font-size:17px; font-family: Helvetica; border-left:8px solid #0091b3"><font color= #343031><strong>Raw Data Overview</strong></font></div>== | ||
+ | <div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left"> | ||
+ | |||
+ | <b>1. Event Data Tables Overview</b><br> | ||
+ | [[File:1. raw data overview 1.jpg|600px]] | ||
+ | *In total there are 9 event data tables, recording calls, emails, purchases and meetings across the company. | ||
+ | *Insider has already marked out suspicious calls, emails, purchases and meetings and made them isolated data tables. | ||
+ | *No column name for all 9 data tables. However, the format of them are the same. | ||
+ | **1. Source: contains the company ID# for the person who called, sent an email, purchased something, or invited people to a meeting | ||
+ | ** 2. Etype: contains a number designating what kind of connection is made. | ||
+ | ***a. 0 is for calls | ||
+ | ***b. 1 is for emails | ||
+ | ***c. 2 is for purchases | ||
+ | ***d. 3 is for meetings | ||
+ | **3. Destination: contains company ID# for the person who is receiving a call, receiving an email, selling something to a buyer, or being invited to a meeting. | ||
+ | **4. Time stamp: is in seconds starting on May 11, 2015 at 14:00. | ||
+ | <br> | ||
+ | |||
+ | <div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left"> | ||
+ | <b>2. Company Index Overview</b><br> | ||
+ | [[File:1. raw data overview 2.jpg|600px]] | ||
+ | * The company index shows the name of everyone in the company and their associated ID#. There are 642,631 individuals in the index. | ||
+ | |||
+ | <div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left"> | ||
+ | <b>3. Letter from Insider</b><br> | ||
+ | [[File:Letter from insider.png|600px]] | ||
+ | * There is a letter from the insider in which he/she shares a suspicious employee list. | ||
+ | </div> | ||
+ | |||
+ | </div> | ||
+ | </div> | ||
+ | |||
+ | ==<div style="background: #ffffff; padding: 17px; line-height: 0.1em; text-indent: 10px; font-size:17px; font-family: Helvetica; border-left:8px solid #0091b3"><font color= #000000><strong>Data Aggregation</strong></font></div>== | ||
+ | <div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left"> | ||
+ | <b>1. Add Column Name</b><br> | ||
+ | [[File:1. add column name.jpg|750px]] | ||
+ | *Applied the same command to 9 event data tables to add column names. | ||
+ | *Corresponding Column Name: Source, Etype, Destination, TimeStamp | ||
+ | <br> | ||
+ | |||
+ | <b>2. Convert TimeInterval to TimeStamp</b><br> | ||
+ | [[File:2. convert to time date.jpg|750px]] | ||
+ | *Changed TimeInterval in original data tables into TimeStamp in format of dd/mm/yyyy hh:mm:ss. | ||
+ | *Initial time is 11/05/2015 14:00:00. | ||
+ | <br> | ||
+ | |||
+ | <b>3. Save New Data Tables</b><br> | ||
+ | [[File:3. save new data table.jpg|750px]] | ||
+ | *Saved new data tables with TimeStamp to the path. | ||
+ | <br> | ||
+ | |||
+ | <b>4. Data Modification in JMP</b><br> | ||
+ | [[File:Data in JMP.png|750px]] | ||
+ | *Relabeled values in Etype column. | ||
+ | *Added names for sources and targets according to SourceID (company index) and TargetID (company index). | ||
+ | *created new columns indicating different time dimensions. | ||
+ | <br> | ||
+ | |||
+ | <b>5. Create Nodes and Edges Data Using Suspicious Employee List</b><br> | ||
+ | :5.1 Edges_Suspicious Event Data | ||
+ | [[File:Edges susevent.png|750px]] | ||
+ | |||
+ | :*Select out all events initiated or accepted by the suspicious employees. | ||
+ | :*Concatenate data tables together and get a merged data table as Edges_SusEvent. | ||
+ | <br> | ||
+ | |||
+ | |||
+ | :5.2 Nodes_Suspicious Employee Data | ||
+ | [[File:Nodes susevent.png|450px]] | ||
+ | |||
+ | :*By using the Edges table created, all the employees involved in the suspicious behavior can be filtered out. | ||
+ | :*Except for 20 employees the insider points out, there might be more people doing bad things. | ||
+ | |||
+ | </div> |
Latest revision as of 14:35, 8 July 2018
|
|
|
|
Raw Data Overview
- In total there are 9 event data tables, recording calls, emails, purchases and meetings across the company.
- Insider has already marked out suspicious calls, emails, purchases and meetings and made them isolated data tables.
- No column name for all 9 data tables. However, the format of them are the same.
- 1. Source: contains the company ID# for the person who called, sent an email, purchased something, or invited people to a meeting
- 2. Etype: contains a number designating what kind of connection is made.
- a. 0 is for calls
- b. 1 is for emails
- c. 2 is for purchases
- d. 3 is for meetings
- 3. Destination: contains company ID# for the person who is receiving a call, receiving an email, selling something to a buyer, or being invited to a meeting.
- 4. Time stamp: is in seconds starting on May 11, 2015 at 14:00.
Data Aggregation
- Applied the same command to 9 event data tables to add column names.
- Corresponding Column Name: Source, Etype, Destination, TimeStamp
2. Convert TimeInterval to TimeStamp
- Changed TimeInterval in original data tables into TimeStamp in format of dd/mm/yyyy hh:mm:ss.
- Initial time is 11/05/2015 14:00:00.
- Saved new data tables with TimeStamp to the path.
- Relabeled values in Etype column.
- Added names for sources and targets according to SourceID (company index) and TargetID (company index).
- created new columns indicating different time dimensions.
5. Create Nodes and Edges Data Using Suspicious Employee List
- 5.1 Edges_Suspicious Event Data
- Select out all events initiated or accepted by the suspicious employees.
- Concatenate data tables together and get a merged data table as Edges_SusEvent.
- 5.2 Nodes_Suspicious Employee Data
- By using the Edges table created, all the employees involved in the suspicious behavior can be filtered out.
- Except for 20 employees the insider points out, there might be more people doing bad things.