Difference between revisions of "ISSS608 2017-18 T3 Assign Jyoti Bukkapatil Data Preparation"

From Visual Analytics and Applications
Jump to navigation Jump to search
 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
<div style=background:#2B3856 border:#A3BFB1>
 +
[[Image:MC3_2018.jpg |200px]]
 +
<b><font size = 5; color="#FFFFFF">  VAST Challenge 2018 MC3 </font></b>
 +
</div>
 +
<!--MAIN HEADER -->
 +
{|style="background-color:#1B338F;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 +
| style="font-family:Century Gothic; font-size:100%; solid #000000; background:#2B3856; text-align:center;" width="20%" |
 +
;
 +
[[ISSS608_2017-18_T3_Assign_Jyoti_Bukkapatil| <font color="#FFFFFF">Background</font>]]
 +
 +
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3856; text-align:center;" width="20%" |
 +
;
 +
[[ISSS608_2017-18_T3_Assign_Jyoti_Bukkapatil_Data_Preparation| <font color="#FFFFFF">Data Preparation</font>]]
 +
 +
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3856; text-align:center;" width="20%" |
 +
;
 +
[[ISSS608_2017-18_T3_Assign_Jyoti_Bukkapatil_Methodology_&_Dashboard_Design| <font color="#FFFFFF">Methodology & Dashboard Design </font> ]]
 +
 +
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3856; text-align:center;" width="20%" |
 +
;
 +
[[ISSS608_2017-18_T3_Assign_Jyoti_Bukkapatil_Observations & Insights| <font color="#FFFFFF">Observations & Insights</font> ]]
 +
 +
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3856; text-align:center;" width="20%" |
 +
;
 +
[[ISSS608_2017-18_T3_Assign_Jyoti_Bukkapatil_Conclusion| <font color="#FFFFFF">Conclusion</font> ]]
 +
 +
|style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3856; text-align:center;" width="20%" |
 +
;
 +
[[Assignment_Dropbox_G1| <font color="#FFFFFF">Back to Dropbox</font> ]]
 +
 +
|  &nbsp;
 +
|}
 +
 +
 
== <big>Data for Visualisation</big> ==
 
== <big>Data for Visualisation</big> ==
  
Line 17: Line 51:
 
|5||Company employee ID and Name list ||CompanyIndex.csv||  642631
 
|5||Company employee ID and Name list ||CompanyIndex.csv||  642631
 
|-
 
|-
|6||Suspicious call records of suspecious group in company ,starting from 11 May 2015 14:00:00 hrs ||Suspicious_calls.csv||  70
+
|6||Suspicious call records of suspicious group in company ,starting from 11 May 2015 14:00:00 hrs ||Suspicious_calls.csv||  70
 
|-
 
|-
|7||Suspicious email records of suspecious group in company ,starting from 11 May 2015 14:00:00 hrs ||Suspicious_emails.csv|| 61
+
|7||Suspicious email records of suspicious group in company ,starting from 11 May 2015 14:00:00 hrs ||Suspicious_emails.csv|| 61
 
|-
 
|-
|8||Suspicious meeting records of suspecious group in company ,starting from 11 May 2015 14:00:00 hrs ||Suspicious_purchases.csv|| 5
+
|8||Suspicious meeting records of suspicious group in company ,starting from 11 May 2015 14:00:00 hrs ||Suspicious_purchases.csv|| 5
 
|-
 
|-
|9||Suspicious purchases records of suspecious group in company ,starting from 11 May 2015 14:00:00 hrs ||Suspicious_meetings.csv|| 1
+
|9||Suspicious purchases records of suspicious group in company ,starting from 11 May 2015 14:00:00 hrs ||Suspicious_meetings.csv|| 1
 
|-
 
|-
|10||Suspecious 7 purchases records||Other_suspicious_purchases.csv || 7
+
|10||Suspicious 7 purchases records||Other_suspicious_purchases.csv || 7
 
|-
 
|-
 
|}
 
|}
Line 91: Line 125:
  
  
==Tools Used ==
+
==Data Preparation for Other suspicious groups ==
I have used below four tools for data analysis and visualization.
+
*An insider from Kasios International has provided the list of other suspicious purchases in the organization. List of these 8 employees who were involved in suspicious purchases was used to extract interactions of these 8 employees with rest of the employees from the company. This was done by using join function JMP pro. Similar to steps followed in Data preparation for suspicious records. This data was used to plot the network graph. Below are data files created for analysis of this group.
# JMP Pro 13
+
#Other Suspicious Network.csv
# Tableau 2018.1
+
#Other Suspicious Network nodes.csv
# Gephi 0.9.2
+
#Other_Supicious_Purchase nodes.csv
# Microsoft Excel
+
*Time Stamp was calculated in the same way as described in "Time Stamp Calculation " section.
 +
*Start Date was derived from Time Stamp.

Latest revision as of 14:32, 13 July 2018

MC3 2018.jpg VAST Challenge 2018 MC3

Background

Data Preparation

Methodology & Dashboard Design

Observations & Insights

Conclusion

Back to Dropbox

 


Data for Visualisation

Data provided by Kasios International insider has been 10 different CSV files. These are mainly different call records, email records, Meeting records and Purchase records from 11th May 2015 14:00:00 hours onwards. All files contain Source, destination, connection details and time in seconds. Below table shows details of different files providing The Kasios Insider.

Index Data File Name Number of Records
1 Call records for the whole company company starting from 11 May 2015 14:00:00 hrs calls.csv 10606835
2 Email records for the whole company starting from 11 May 2015 14:00:00 hrs email.csv 14550085
3 Meetings records for whole company starting from 11 May 2015 14:00:00 hrs meeting.csv 127351
4 Purchases records for whole company starting from 11 May 2015 14:00:00 hrs purchaces.csv 762200
5 Company employee ID and Name list CompanyIndex.csv 642631
6 Suspicious call records of suspicious group in company ,starting from 11 May 2015 14:00:00 hrs Suspicious_calls.csv 70
7 Suspicious email records of suspicious group in company ,starting from 11 May 2015 14:00:00 hrs Suspicious_emails.csv 61
8 Suspicious meeting records of suspicious group in company ,starting from 11 May 2015 14:00:00 hrs Suspicious_purchases.csv 5
9 Suspicious purchases records of suspicious group in company ,starting from 11 May 2015 14:00:00 hrs Suspicious_meetings.csv 1
10 Suspicious 7 purchases records Other_suspicious_purchases.csv 7
Table 1

All data files contain only four columns:

  1. Source ID: Company ID of the person who has initiated connection i.e. either Called someone, sent email, invited someone for meeting or purchases something
  2. Etype: Connection details i.e. 0 – Calls, 1 – Emails, 2 – Purchases and 3- Meetings
  3. Target ID: Company ID of destination person for connection
  4. Time Stamp: Time in Seconds starting from 11th May 2015 at 14:00

Data Preparation

Column names were changed to make it easy to understand and analyze data.

  1. "Source ID" was changed to "Source".
  2. "Target ID" was changed to "Target".
  3. "Etype" was changed to "Communication Mode".
  4. "Time Stamp" was changed to "Time in Sec".

Time Stamp Calculation:

JMP Pro 13 is used for data exploration and preparation. Converted Time in Sec to Date and Time By below formula in JMP.

"MDYHMS (: Time in Sec + Date DMY (11, 5, 2015) + In Hours (14))"

Date time format was changed to "m/d/y h:m:s". After changing column names and calculating the date-time value, the final data table is as below.

Final Data template.png
Figure 1

Data Distribution

It was observed from the data distribution that a number of meetings in 2015 were very less compared with the rest of the two years. There was no way that we would be able to find whether data is missing, or it was the fact that fewer meetings took place during 2015. For better visualization of meetings pattern, this data was excluded. Similarly, for Calls, Emails, and Purchases, Number of records from May 2015 till Sep 2015 were less compared with the number of records for the rest of the time. So these were also excluded for further data visualization.

Data Distribution for Meetings & Calls.jpg

Data Distribution For Emails and Purchases.jpg


Figure 2 Data Distribution for Meetings,Calls,Emails & Purchases

Data Preparation for Suspicious Records

JMP Join Table1.png
Figure 3

Below four datafiles were combined to create a single file for all suspicious activities and names as "Suspicious_all.csv"

  1. Suspecious_calls.csv
  2. Suspecious_emails.csv
  3. Suspecious_meeting.csv
  4. Suspecious_purcahses.csv

Timestamp was calculated with the same formula as described in Time Stamp calculation section. List of the 20 suspicious employees was used to extract interactions of these 20 employees with rest of the employees from the company. This was done by using join function JMP pro. Figure 3,4 & 5 shows the steps involved in this data preparation. The same process was followed for all four connection modes and all records were combined in one CSV file named as “Suspicious Associations Total.csv”. Table 4 shows summary for Suspicious employee’s connection records i.e. the number of records for every years and type of connections.


JMP Join Table2.png

Figure 4
JMP Join Table3.png
Figure 5


Data Preparation for Other suspicious groups

  • An insider from Kasios International has provided the list of other suspicious purchases in the organization. List of these 8 employees who were involved in suspicious purchases was used to extract interactions of these 8 employees with rest of the employees from the company. This was done by using join function JMP pro. Similar to steps followed in Data preparation for suspicious records. This data was used to plot the network graph. Below are data files created for analysis of this group.
  1. Other Suspicious Network.csv
  2. Other Suspicious Network nodes.csv
  3. Other_Supicious_Purchase nodes.csv
  • Time Stamp was calculated in the same way as described in "Time Stamp Calculation " section.
  • Start Date was derived from Time Stamp.