Difference between revisions of "ISSS608 2017-18 T3 Assign Pooja Manohar Sawant Data Preparation"

From Visual Analytics and Applications
Jump to navigation Jump to search
Line 76: Line 76:
 
*Tableau – To visualize how communication and purchasing patterns are evolved over a period of 2 and half years in Kasios international
 
*Tableau – To visualize how communication and purchasing patterns are evolved over a period of 2 and half years in Kasios international
 
*Gephi – To visualize how suspected employees of Kasios International are connected with each other and with other employees in the organization
 
*Gephi – To visualize how suspected employees of Kasios International are connected with each other and with other employees in the organization
 +
 +
 +
== Data Preparation ==
 +
=== JMP Pro ===
 +
#For 4 organizational level files, imported files to JMP and renamed 4 columns as Source, Etype, Destination, TimeStamp (as input CSV files did not have column names). Also, changed the column's data types.
 +
#Created 2 new columns -
 +
##a. Start Date – with the single value - May 11, 2015: 14:00
 +
##b. Event Date – calculated with the formula -
 +
  Start Date +  TimeStamp
 +
 +
[[File:Data_Prep_jmp1.JPG|500px|center|]]
 +
 +
<div style="float:center;text-align:center;font-size:12px;"> <u>''transformed Columns in JMP Pro''</u> </div>

Revision as of 12:25, 8 July 2018

MC3 2018.jpg Detecting Suspicious Activities at Kasios International

BACKGROUND

DATA PREPARATION

METHODOLOGY AND ANALYSIS

OBSERVATIONS AND INSIGHTS

Back to Dropbox

 


About the data

Data provided by the insider has 10 CSV files as below.

Index File name Number of Rows Description
1 calls.csv 10,606,835 Call details for entire organization
2 email.csv 14,550,085 Email details for entire organization
3 meeting.csv 127,351 Meeting details for entire organization
4 purchaces.csv 762,200 Purchases details for entire organization
5 CompanyIndex.csv 642,631 Company employee ID and Name list
6 Suspicious_calls.csv 70 Calls details involving suspicious group of people within an organization
7 Suspicious_emails.csv 61 Email details involving suspicious group of people within an organization
8 Suspicious_meetings.csv 1 Meeting details involving suspicious group of people within an organization
9 Suspicious_purchases.csv 5 Purchases details involving suspicious group of people within an organization
10 Other_suspicious_purchases.csv 7 A list of 4 individuals who made 7 suspicious purchases
Table 1 - Details of input files


All above files except CompanyIndex.csv, have below four fields -

  1. Source (contains the company ID# for the person who called, sent an email, purchased something, or invited people to a meeting)
  2. Etype (contains a number designating what kind of connection is made)
    1. 0 is for calls
    2. 1 is for emails
    3. 2 is for purchases
    4. 3 is for meetings
  3. Destination (contains company ID# for the person who is receiving a call, receiving an email, selling something to a buyer, or being
  4. Time stamp – in seconds starting on May 11, 2015 at 14:00.


Tools Used for data preparation and Visualization

  • JMP Pro – For data cleaning and transformation
  • Tableau – To visualize how communication and purchasing patterns are evolved over a period of 2 and half years in Kasios international
  • Gephi – To visualize how suspected employees of Kasios International are connected with each other and with other employees in the organization


Data Preparation

JMP Pro

  1. For 4 organizational level files, imported files to JMP and renamed 4 columns as Source, Etype, Destination, TimeStamp (as input CSV files did not have column names). Also, changed the column's data types.
  2. Created 2 new columns -
    1. a. Start Date – with the single value - May 11, 2015: 14:00
    2. b. Event Date – calculated with the formula -
 Start Date +  TimeStamp
Data Prep jmp1.JPG
transformed Columns in JMP Pro