ISSS608 2017-18 T3 Assign Minami Yusuke

From Visual Analytics and Applications
Jump to navigation Jump to search


Author

Yusuke MINAMI

https://www.linkedin.com/in/yusukeminami


Background

 

This page is my submission for ISSS608 Visual Analytics and Applications course assignment based on VAST Challenge 2018 Mini-Challenge 3.

The datasets were downloaded from:

VAST Challenge 2018 Mini-Challenge 3

http://vacommunity.org/VAST+Challenge+2018+MC3



Tasks


1. Elucidate changes of the company over time

2. Locate the suspicious group, elucidate the employees closely associated with the suspicious group, and highlight the employees who are making suspicious purchases

3. Elucidate the interactions within the suspicious group over time

4. Elucidate the Potentially illicit purchases


Data Preparation


Data was prepared by Python in Jupyter notebook environment taking advantage of Pandas library.


1.

 

Read the following 4 CSV files including connections and concatenate.
calls.csv
emails.csv
meetings.csv
purchases.csv

Convert the seconds since May 11, 2015 at 14:00 to Timestamp data type.

Calculate the hour-level timestamp for summarization later.

Convert the Etype code (0-3) to call, email, meeting, or purchase.

Change the long format to wide format with summarizing the counts for each connection type (call, email, meeting, or purchase) in hour-level time slots.

Save as CSV file "wide_df.to_csv('VAST_Summary_Wide.csv".



2. & 3.

 

Read the following 4 CSV files including the suspicious connections and concatenate.
Suspicious_calls.csv
Suspicious _emails.csv
Suspicious _meetings.csv
Suspicious _purchases.csv

Extract the suspicious employee list.

Add Source Suspicious flag (whether the source node is in the suspicious list), Target Suspicious flag (whether the target node is in the suspicious list), and Employee Suspicious flag (whether either the source or target is in the suspicious list).

Extract connections with Employee Suspicious flag = True.

Add the employee name by joining "CompanyIndex.csv" on employee IDs.

Save as "Emp_Suspicious.csv".

Change the long format to wide format with summarizing the counts for each connection type (call, email, meeting, or purchase) in hour-level time slots.

Save as "Emp_Suspicious_Wide.csv".

 

Calculate the month-level timestamp.

Save as "Emp_Suspicious_Wide_YM.csv"

Extract the unique suspicious employee IDs.

Add the employee name by joining "CompanyIndex.csv" on employee IDs.

Save as "Suspicious_Nodes.csv".

 

4.

Read "Other_suspicious_purchases.csv"

Extract the list of employees who made potentially illicit purchases.

Add Source Potentially Illicit flag to the larger dataset, and extract only the Source Potentially Illicit flag = True.

Add Potentially Illicit flag.

Save as “Potentially_Illicit.csv”.






Design of Visualization


Tableau Desktop Professional Edition 2018.1.0 and Gephi 0.9.2 were used to visualize the datasets.

1.

Counts for each of the 4 connection types (calls, emails, meetings, and purchases) in each month were visualized. Colors were used to differentiate the weekday, hour, and day to explore any patterns.

 

2. & 3.

In Graph Theory, Adjacency Matrix is used to describe the existence of edges between nodes. Weighted Adjacency Matrix (or simply Adjacency Matrix in a broad sense) is used to indicate the weight of each edges. The employees were treated as nodes, the connections between them as edges, and the counted numbers as weight of edges. Since the source nodes and target nodes are distinguished, the given dataset can be treated as a directed graph. Source nodes are the employees who called, sent an email, purchased something, or invited people to a meeting. Each of the 4 connection types (calls, emails, meetings, and purchases) that form directed graphs can be described by Adjacency Matrix.  It is also possible to include 4 weight values for each edge type in each cell of an Adjacency Matrix of a multigraph, but multiple numeric values in each cell sacrifices the interpretability.

In this assignment, a new visualization methodology “Treemap Adjacency Matrix” is proposed. A Treemap Adjacency Matrix indicates multiple weights in each cell by treemap utilizing the rectangle area for weight and color for edge type. Treemap Adjacency Matrices were generated for source nodes listed as suspicious and target nodes listed as suspicious according to the anonymous insider. For source nodes listed as suspicious, the edges representing connections (calls, emails, meetings, and purchases) initiated by the 20 suspicious source nodes were extracted. Both source node and target nodes were sorted in the descending order of connection counts. Since there are too many target nodes not listed as suspicious, only the target nodes with connections more than a threshold value 7. Treemap Adjacency Matrices were generated likewise for targets nodes listed as suspicious with the threshold value 9.

To characterize the structure, communication within the group, and change over time, Gephi was used. For layout, Fruchterman Reingold was used to avoid overlapping of nodes. For nodes, only the 20 suspicious employees were highlighted by red color and the labels were added. As the labels, the ID numbers instead of employee names were used as names occupy more space that causes overlapping each other. For edges, calls were highlighted by blue color (the lighter blue, the more calls). The other connections were colored gray.

 

4.

The other connections on the days the 4 employees made potentially illicit purchases were visualized.


Insights




  1. YMinami 1 Counts of Connections over Months with Colors Representing Weekdays.png


    YMinami 1 Counts of Connections over Months with Colors Representing Hours.png


    YMinami 1 Counts of Connections over Months with Colors Representing Days.png

    Numbers of calls, emails, and purchases had been almost constant. Number of meetings had been increasing rapidly. The distributions in weekdays (Sunday - Saturday), hours (0 - 23), and days (1-31) were almost equally distributed over the whole period, and no change was observed. The pattern indicates that the datasets were artificially generated.






  2. YMinami 2 Treemap Adjacency Matrix for Suspicious Sources.png

    The 20 suspicious employees are listed in the columns. Employee ID 2040565 (Gail Feindt), ID 981554 (Sherrell Biebel), ID 786361 (Sheilah Stachniw), ID 944354 (Ferne Hards), and ID 175354 (Madeline Nindorf) received connections (calls, emails, meetings, and/or purchases) from the 20 suspicious employees and appear to be closely associated with the suspicions employees. Employee ID 2040565 (Gail Feindt) received the dominant number of purchase transactions; 15 by Employee ID 857138 (Richard Fox), 4 by Employee ID 1690582 (Meryl Pastuch), 4 by Employee ID 969089 (Tobi Gatin), and 1 by ID 623027 (Lizbeth Jinbra).


 

YMinami 2 Treemap Adjacency Matrix for Suspicious Targets.png

The 20 suspicious employees are listed in the columns. Employee ID 175354 (Madeline Nindorf), ID 786361 (Sheilah Stachniew), ID 981554 (Sherrel Biebel) made connections to the 20 suspicious employees and appear to be closely associated with the suspicious employees. The phone calls between ID 1847246 (Rosalia Larroque) and ID 728286 (Kerstin Belveal) are notable; 8 from ID 1847246 and 7 from ID 728286.


YMinami 2 Connections 20170920.png

The connections with suspicious employees involved on 20th September 2017, on which suspicious purchase was recorded, were plotted. Suspicious phone calls between ID 1847246 (Rosalia Larroque) and ID 728286 (Kerstin Belveal) and purchase by ID 1847246 (Rosalia Larroque) were recorded at the same time (3:38:53AM). Later in the afternoon on the same day, ID 1847246 (Rosalia Larroque) sent an email to ID 728286 (Kerstin Belveal), and ID 728286 (Kerstin Belveal) called , ID 1847246 (Rosalia Larroque), and then ID 1847246 (Rosalia Larroque) made second purchase, which was recorded as suspicious.

  1.  

 
YMinami 3c Counts of Connections with Suspicious Employees Involved Over Months.png

The fiture above describes the counts of connections (calls, Emails, meetings, andn purchases) involved with suspicious employees (as source and/or target) including the ones listed as suspicious. There were 2 notable months; November 2015 and September 2017.

 In Novmeber 2015 and September 2017,  comparatively more suspicous connections, especially calls were recorded. In addition, there were compartively more meetings in November 2015. September 2017 was the month the suspicious purhchase was recorded as described earlier.





YMinami 3a Whole.png

Whole period (2015-2017)

The figure above shows the connections between employees during the whole period (May 2015 - Dec 2017). The ID of the 20 suspicious employees are labelled. As discussed earlier, ID 1847246 (Rosalia Larroque) who made a suspicious purchase and ID 728286 (Kerstin Belveal) made phone calls to each other frequently thus
connected by light blue edge. The other suspicious employees are mostly connected with each other as well by blue edges which means fewer phone calls. ID 857138 (Richard Fox) is connected with the most nodes (has the largest degree), thus appears to be relatively higher position such as a manager. It is assumed that most of the suspicious employees, including ID 1847246 (Rosalia Larroque) and ID 728286 (Kerstin Belveal), report to ID 857138 (Richard Fox).

 

 

The figures that show connections in each month follow.


YMinami 3b 2015-10.png

Oct 2015



YMinami 3b 2015-11.png
Nov 2015



YMinami 3b 2015-12.png
Dec 2015





YMinami 3b 2017-08.png
August 2017



YMinami 3b 2017-09.png
Sep 2017



YMinami 3b 2017-10.png
Oct 2017

There were numerous connections between the suspicious employees in November 2015, followed by September 2017, while not so much in the other months.




  1. YMinami 4 Connections by Employees Who Made Potentially Illicit Purchases.png

    The figure above shows the connections made by the 4 employees who made potentially illicit purchases. The potential illicit purchases were highlighted by orange color above.


    YMinami 4 Connections by Employees on the Days They Made Potentially Illicit Purchases.png

    The figure above zoomed in the days on which the employees made potentially illicit purchases. 4 out of 7 potentially illicit purchases occurred together with phone calls