Difference between revisions of "Fu Yi - Visualization and Insights"

From Visual Analytics and Applications

Revision as of 22:45, 8 July 2018

Covermn3.gif VAST MINI CHALLENGE 3 - Find out the suspiciousness

Introduction

Preparation

Visualization & Insights

References and Feedback

 


Question 1

Is the company growing?

I used Tableau to visualize the overall picture of the company's growth.

First, I bring in four data sources (Calls, Emails, Meetings and Purchases) and set the variable types accordingly. Because the data runs from July 2015 to December 2017, it is appropriate to show the monthly changes within the company across the years. I then create one sheet per category, using a cycle plot to compare each month across the different years. Months have different numbers of days (February, for example, is naturally shorter than the others), so to eliminate that bias each month is compared only with itself to see how its pattern changes. I also add a reference line showing the average value for each month, so that the changes can be read against it.
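The data behind one cycle-plot panel can also be sketched outside Tableau. The snippet below is a minimal illustration, not the Tableau workbook itself: the function name `cycle_series` and the sample dates are hypothetical, and it assumes each record has already been reduced to a date.

```python
from collections import Counter, defaultdict
from datetime import date


def cycle_series(event_dates):
    """Group event counts by calendar month, then by year.

    Returns {month: {"by_year": {year: count}, "average": float}},
    i.e. the points of one cycle-plot panel plus its reference line.
    """
    counts = Counter((d.month, d.year) for d in event_dates)
    panels = defaultdict(dict)
    for (month, year), n in counts.items():
        panels[month][year] = n
    result = {}
    for month, by_year in panels.items():
        result[month] = {
            "by_year": dict(sorted(by_year.items())),
            # Average over the years observed for this month only,
            # so a month is always compared with itself.
            "average": sum(by_year.values()) / len(by_year),
        }
    return result


# Hypothetical sample: 3 July events in 2015, 5 in 2016, 4 in 2017.
sample = [date(2015, 7, 1)] * 3 + [date(2016, 7, 2)] * 5 + [date(2017, 7, 3)] * 4
july = cycle_series(sample)[7]
```

Here `july["by_year"]` holds the three yearly counts for July and `july["average"]` is the value the reference line would sit at.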

For Calls and Emails, the changes follow a similar pattern. Although the data for the first 6 months of 2015 is missing, we can still tell that communication volume was increasing in 2015: every month from July to December shows a steep increasing trend, and there is no reason for the first 6 months to show the opposite trend if the company was operating under the same conditions. From 2016 to 2017, however, the scenario changes. In general the increasing trend flattens, and some months, such as February and December, even show decreases.

Visq1.png

The number of purchases is a strong indicator of whether a company's business is doing well. In the Purchases cycle plot, from 2015 to 2016 the 6 months from July to December exceed the average number of purchases, with most of them clearly above the reference line, so we can assume that purchases were also increasing in the first 6 months and that the company was doing good business in 2015 to 2016. The situation changed dramatically from 2016 to 2017, however. Every increasing trend flattened, except in July and September, and many months show a sharply decreasing trend line, which indicates that the business was in decline compared with the previous year.

Visq11.png

For Meetings, the number of meetings is increasing, but this alone cannot indicate that the company was doing well operationally.

Visq111.png

The four types of activity together give the full picture of the company's operations. Based on the changes in Purchases, Calls and Emails, the company did good business in 2015 to 2016 but went into a downturn in 2016 to 2017. Click here to view Tableau of the full picture


Question 2

What is the insider trying to tell us?

After applying the Fruchterman Reingold layout to the suspicious dataset with Area = 10000 and Gravity = 15, I ran Betweenness Centrality and Eigenvector Centrality and ranked node size and colour by these 2 statistics. Betweenness Centrality is a measure of centrality in a graph based on shortest paths; it represents the degree to which a node stands between other nodes. Eigenvector Centrality represents the importance of a node: a high eigenvector score means that a node is connected to many nodes that themselves have high scores. The larger the circle, the higher the Betweenness Centrality value; the darker the colour, the higher the Eigenvector Centrality value.
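To make the shortest-path definition of betweenness concrete, here is a minimal sketch of Brandes' algorithm for an unweighted, directed graph. It is an illustration only and not necessarily how the graph tool used for the figures computes the statistic; the function name `betweenness` and the three-node toy graph are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for an unweighted directed graph.

    adj maps every node to a list of its successors (every node must be a key).
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Toy chain A -> B -> C: only B lies between two other nodes.
toy = {"A": ["B"], "B": ["C"], "C": []}
scores = betweenness(toy)
```

In the toy chain, `B` is the only node that sits on a shortest path between two others, so it is the only node with a non-zero score.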

The first picture shows all connections among the suspicious nodes provided by the insider. The second is filtered to show only the purchase. We can see that Rosalia requested a purchase from Jenice.

   Visq2.png      Visq22.gif


Does anyone else appear to be closely associated with this group?

Visq2table.png

The suspicious people above (Group A) are marked with “SS_” in front of their names in the larger dataset to make them easier to identify. Joining them with the IDs that have an association with this group gives 1,640 IDs in total (including Group A); this is Group B.

The rule for other potentially suspicious IDs is that an ID must connect to more than 1 other ID: an ID that connects to only one other ID cannot be the messenger or the head of the mafia who spreads information about a suspicious event. Based on that rule, I ran Average Degree to get each node's degree, filtered for “Degree = 1”, then highlighted and removed those IDs. That leaves 66 Nodes/IDs, but some of them also connect to only 1 ID while having a degree of 2 (in and out degree), so I selected those Nodes manually and removed them too. After that, I filtered for “In Degree = 0”: those people do not take in any information from the suspicious group, their Betweenness and Closeness values are very low, and any information they send out comes from outside the suspicious group, so they fall outside the scope of the investigation.
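The two filtering rules can be written down as a short sketch. This is an illustration under stated assumptions rather than the steps performed in the graph tool: the function name `filter_candidates` and the sample edges are hypothetical, edges are plain (source, target) pairs without self-loops, and in-degree is counted on the full edge list before any node is removed.

```python
def filter_candidates(edges):
    """Return the nodes that pass both rules.

    Rule 1: the node is linked to more than one distinct ID.
    Rule 2: the node receives at least one edge (in-degree > 0).
    """
    neighbours = {}
    in_degree = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)
        in_degree[dst] = in_degree.get(dst, 0) + 1
        in_degree.setdefault(src, 0)
    # Counting distinct neighbours also catches nodes with degree 2
    # (one in, one out) that still talk to a single ID.
    kept = {n for n, nb in neighbours.items() if len(nb) > 1}
    return {n for n in kept if in_degree[n] > 0}


# Hypothetical edges: "a" has degree 2 but one neighbour, "e" has degree 1,
# "f" has two neighbours but never receives anything.
sample_edges = [
    ("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"),
    ("d", "b"), ("e", "b"), ("f", "b"), ("f", "c"),
]
remaining = filter_candidates(sample_edges)
```

Counting distinct neighbours instead of raw degree covers the case that had to be handled manually in the text, where a node has both an in and an out edge to the same single ID.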

Visq2giff.gif

Excluding the IDs from Group A (the suspicious people from the insider), 35 people remain; I name them Group C.

Group C

Visq2v9.png


Question 3


What is the full picture of the organizational structure within the group?

For the bad actors identified in the last question, I will describe the organizational structure within the group using three centrality measures: Betweenness, Closeness and Eigenvector centrality. The purpose of centrality is to find highly connected individuals, popular individuals, individuals who are likely to hold the most information, or individuals who can quickly connect with the wider network. Betweenness centrality measures which nodes act as ‘bridges’ between other nodes in a network; it finds the individuals who influence the flow around a system. People with high betweenness might hold authority over, or control collaboration between, disparate clusters in a network. Closeness centrality finds the individuals who are best placed to influence the entire network most quickly; people with high closeness are good ‘broadcasters’, and the measure is useful for finding influencers within a single cluster. Eigenvector centrality measures how well connected a node is and how many links its connections have through the network.
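As a small worked example of why high closeness marks a good ‘broadcaster’, the sketch below computes closeness as the number of reachable nodes divided by the sum of shortest-path distances to them. The function name `closeness` and the star-shaped toy graph are hypothetical, and this is one common definition rather than necessarily the exact formula used by the graph tool.

```python
from collections import deque


def closeness(adj, node):
    """Closeness of `node`: reachable nodes / total shortest-path distance.

    adj maps every node to a list of its neighbours.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:                       # breadth-first search gives hop counts
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Toy star: "c" reaches everyone in one hop, each leaf needs two hops
# to reach the other leaves.
star = {"c": ["a", "b", "d"], "a": ["c"], "b": ["c"], "d": ["c"]}
centre = closeness(star, "c")
leaf = closeness(star, "a")
```

The centre of the star scores higher than any leaf because it reaches every other node in a single hop.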

In this case, I size the Nodes by Betweenness value, colour them by Closeness value, and size the Labels by Eigenvector value. The initial layout is arranged by Modularity class, with each separate cluster forming one class. I used the Force Atlas 2 layout and then manually moved the clusters closer together so that the graph looks clearer.

Rules:

The bigger the circle, the higher the Betweenness value; the darker the green, the higher the Closeness value; the bigger the name, the higher the Eigenvector value.


Overall:

Overallq3.png

The figure above shows the overall organizational structure and communication channels. Richard Fox has the highest value for all three measures, so I assume he is the Head of the company.


Filter for highest Betweenness centrality. The highlighted people are the “Bridges”: the rest of the suspicious people cannot connect to each other without them.

High Betweenness:

Highbet.png


Filter for highest Closeness centrality. The highlighted people are the “big mouths”: they can spread information most efficiently.

High Closeness:

Highclo.png

Does the group composition change during the course of their activities?

To capture how interactions change over time, I used In Degree and Out Degree and recorded the monthly changes for each. This makes it possible to compare how communication flows between one person and another. Red circles show In-Degree and green circles show Out-Degree. The lines between dots indicate the type of activity.
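The monthly In Degree and Out Degree counts can be sketched as follows. This is a minimal illustration: the function name `monthly_degrees`, the (source, target, timestamp) record layout and the sample records are assumptions, not the actual dataset schema.

```python
from collections import Counter
from datetime import datetime


def monthly_degrees(records):
    """Count, per (year, month) and person, edges received and edges sent.

    records: iterable of (source, target, timestamp) tuples.
    Returns (in_degree, out_degree) Counters keyed by ((year, month), person).
    """
    in_deg, out_deg = Counter(), Counter()
    for src, dst, ts in records:
        month = (ts.year, ts.month)
        out_deg[(month, src)] += 1
        in_deg[(month, dst)] += 1
    return in_deg, out_deg


# Hypothetical records with placeholder IDs.
sample_records = [
    ("A", "B", datetime(2015, 11, 3)),
    ("A", "C", datetime(2015, 11, 9)),
    ("B", "A", datetime(2015, 12, 1)),
]
in_deg, out_deg = monthly_degrees(sample_records)
```

Keying the counters by month and person gives one value per dot in each monthly frame of the animation.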

Overall, there were 4 purchase orders in these 3 years, all made to Gail Feint, which suggests that Gail Feint is the main person or supplier in charge of purchases.

In Degree Out Degree Summary
2015 inD.gif
2015 outD.gif

The 2015 gifs show that, in general, Richard communicated most with the Tobi and Lindsy groups, as most dots appear within those 2 groups. In November, cross-group communication was very frequent, especially through Emails and Calls.

2016 inD.gif
2016 outD.gif

In 2016, Richard Fox consistently sent messages to and received messages from others, but he almost never met anyone in person and communicated through Emails and Calls; the exception is a meeting with Kerstin in May. Tobi, on the other hand, made frequent purchases from Gail Feint of Meryl’s group in November, as shown by the strong activity line; apart from that, those 2 people never had any other communication. In general, the most common communication activities are Emails and Calls.

2017 com.gif
2017 outD.gif

In 2017 there were more meetings and purchases, but overall communication through regular emails and calls decreased. Most of the intensive communication happened in August and September. There was a high volume of meetings between Meryl and Richard in August. In September, not only was the number of meetings between Meryl and Rosalia very large, but so was the volume of email between Rosalia and Kerstin; in addition, Kerstin and Sherrel called each other frequently. There were 2 purchases this year: one in May, when Meryl made a purchase from Gail Feint, and another in November, when Richard also made a purchase from Gail Feint.


Question 4

What are other suspicious purchases? What is the pattern and structure?

From the communication pattern of the 20 suspicious people, we can see that before Rosalia made the final purchase, 2 information flows came in. One is from Meryl: Richard Fox first met Meryl, and Meryl then met Rosalia. The other is from Kerstin: Lindsy first met Kerstin; 1 month later Kerstin emailed Rosalia, Rosalia emailed back, and Rosalia then made the purchase. Because the content of the communications was not shared, we cannot tell exactly who told Rosalia about the purchase; it could have been Richard, Kerstin, Lindsy, or all of them, including Rosalia herself.

Gif q2 v1.gif


However, in the large data table, 90% of the suspicious purchases happened in 2017 (including Rosalia's purchase from the insider’s data), which clearly shows that 2017 is when the illicit events happened. As cross-checked earlier, Richard is the only one of the 20 who has a connection with the other suspicious purchasers, and there is only one such connection: a call with Laure Pelkey in March 2016, long before the period of intensive suspicious activity.

Visq4v1.png


Therefore, I will keep narrowing down by removing unrelated people and focus on “the most suspicious period” in 2017, from June to December. To focus only on the period related to suspicious activity, I keep the months from June to December and remove the Edges and Nodes that do not fall within this range, then sort by activity type to see how densely Purchases are distributed in this period. June and December have the most Purchases, so I narrow down further to those two months.
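The narrowing step can be sketched as a date-range filter followed by a per-month count of purchases. This is an illustration only: the function name `purchases_per_month`, the record layout, the activity label "Purchases" and the sample records with placeholder IDs are assumptions.

```python
from collections import Counter
from datetime import datetime


def purchases_per_month(records, start, end):
    """Count purchases per (year, month) inside [start, end].

    records: iterable of (source, target, activity, timestamp) tuples.
    """
    counts = Counter()
    for _src, _dst, activity, ts in records:
        if activity == "Purchases" and start <= ts <= end:
            counts[(ts.year, ts.month)] += 1
    return counts


# Hypothetical records; only the June and December purchases fall in range.
sample_records = [
    ("A", "B", "Purchases", datetime(2017, 6, 21, 8, 5)),
    ("A", "C", "Emails", datetime(2017, 6, 20, 21, 41)),
    ("D", "E", "Purchases", datetime(2017, 12, 12, 2, 26)),
    ("D", "E", "Purchases", datetime(2017, 12, 12, 2, 26)),
    ("F", "G", "Purchases", datetime(2017, 3, 1, 9, 0)),
]
period_start = datetime(2017, 6, 1)
period_end = datetime(2017, 12, 31, 23, 59, 59)
counts = purchases_per_month(sample_records, period_start, period_end)
```

Sorting `counts` by value would then point to the months with the densest purchase activity.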

Visq4v2.png


In June 2017

The overall communication for the month is laid out with the Yi Fan Hu layout, which keeps the people with high closeness in the middle of their own cluster while keeping the clusters apart from each other.

There are 2 purchases in this month, both within the same cluster and from the same source, Laure Pelkey. One is from the insider’s information (“SSP_” means Suspicious Purchase): Laure made a purchase from Carlos. The other is from Fairy. I will concentrate on the suspicious one and look more closely at the communication structure around it.

Visq4v3.png


On 20/06/2017 at 9:41 PM, Jestine emailed Laure once, and Laure emailed Ethel twice at the same time. One day later, on 21/06/2017 at 08:05 AM, Laure made the suspicious purchase to Carlos. The communication among Jestine, Laure and Ethel is very close in time to the suspicious purchase, and 09:41 PM is outside office hours: why would they email each other after 9 PM, especially Laure, who sent 2 emails?

Visq4v4.png


In December 2017

Applying the same Yi Fan Hu layout after filtering for December shows 3 purchases in total within this month. 2 of the 3 happened between Beth and Gregory, who are in the same cluster, and these are the suspicious purchases. I will examine how the communication activity is associated with them.

Visq4v5.png


The purchase happened on 12/12/2017, after midnight. The email and the purchase fall in the same day, the same hour and the same minute, only 10 seconds apart, and the entire execution was complete. First, Delmy emailed Beth at 2:26:22 AM; 10 seconds later (yes, an interval of only 10 seconds), Beth made 2 purchases to Gregory.
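The timing check described above can be sketched as a search for messages sent shortly before a purchase. The function name `contacts_before` and the one-minute window are hypothetical; the email timestamp is the one quoted in the text, and the purchase time is derived from the stated 10-second gap rather than read from the data.

```python
from datetime import datetime, timedelta


def contacts_before(purchase_time, messages, window):
    """Return (sender, receiver, gap) for messages sent within `window`
    before the purchase.

    messages: iterable of (sender, receiver, sent_at) tuples.
    """
    hits = []
    for sender, receiver, sent_at in messages:
        gap = purchase_time - sent_at
        if timedelta(0) <= gap <= window:
            hits.append((sender, receiver, gap))
    return hits


email_time = datetime(2017, 12, 12, 2, 26, 22)       # Delmy -> Beth, from the text
purchase_time = email_time + timedelta(seconds=10)   # derived from the 10-second gap
hits = contacts_before(
    purchase_time,
    [("Delmy", "Beth", email_time)],
    window=timedelta(minutes=1),                     # assumed threshold
)
```

Running the same check with a wider window would also surface slower patterns, such as an evening email followed by a purchase the next morning.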

Visq4v6.png