ISSS608 2017-18 T3 Assign Aakanksha Kumari Data Preparation

From Visual Analytics and Applications
Revision as of 13:10, 8 July 2018 by Aakankshak.2017 (talk | contribs)

Unraveling the Secrets of Kasios: VAST Mini Challenge 3

Data Set

The Kasios Insider has provided data from across the company. There are call records, emails, purchases, and meetings. The data only includes the source of each transaction, the recipient (destination), and the time of the transaction. Contents of emails or phone calls are not available.

Dataset | Description | Size
calls.csv | Information on 10.6 million calls | 251 MB uncompressed
emails.csv | Information on 14.6 million emails | 345 MB uncompressed
purchases.csv | Information on 762 thousand purchases | 18.8 MB uncompressed
meetings.csv | Information on 127 thousand meetings | 3.26 MB uncompressed


There are four data files containing information about individuals the Insider has indicated as suspicious, along with a fifth file of other suspicious purchases used for Question 4:

Dataset | Description | Size
Suspicious_calls.csv | Information on suspicious calls | 1.76 KB uncompressed
Suspicious_emails.csv | Information on suspicious emails | 1.55 KB uncompressed
Suspicious_purchases.csv | Information on suspicious purchases | 27 B uncompressed
Suspicious_meetings.csv | Information on suspicious meetings | 130 B uncompressed
Other_suspicious_purchases.csv | List of 4 individuals who made 7 suspicious purchases (for Question 4) | 378 B uncompressed


All provided data files have the same format. The data are provided in comma-separated format with four columns:

Column Name | Description
Source | Contains the company ID# for the person who called, sent an email, purchased something, or invited people to a meeting
Etype | Contains a number designating what kind of connection is made: 0 for calls, 1 for emails, 2 for purchases, 3 for meetings
Destination | Contains the company ID# for the person who is receiving a call, receiving an email, selling something to a buyer, or being invited to a meeting
Time stamp | In seconds, starting on May 11, 2015 at 14:00
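
The four-column layout described above can be loaded with pandas roughly as follows. This is only a sketch: it assumes the files have no header row, and the column names, the `load_records` helper, the `EtypeLabel` column, and the sample IDs are illustrative choices rather than anything taken from the challenge data.

```python
import io

import pandas as pd

# Assumed column names, following the column descriptions above.
COLUMNS = ["Source", "Etype", "Destination", "Timestamp"]

# Etype codes as listed in the column description.
ETYPE_LABELS = {0: "call", 1: "email", 2: "purchase", 3: "meeting"}


def load_records(csv_source):
    """Read one data file and add a readable label for the Etype code.

    Assumes the file has no header row (an assumption, not confirmed here).
    """
    df = pd.read_csv(csv_source, header=None, names=COLUMNS)
    df["EtypeLabel"] = df["Etype"].map(ETYPE_LABELS)
    return df


# Made-up sample rows standing in for a real file such as calls.csv.
sample = io.StringIO("101,0,202,3600\n303,2,404,86400\n")
records = load_records(sample)
print(records)
```

For the real files, a path such as "calls.csv" can be passed in place of the in-memory sample.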

Tools

  • Python for Data Cleaning
  • Excel for Data Cleaning
  • Gephi for Network Visualization
  • Tableau Desktop for Visualization


Data Cleaning

The time stamps in all the CSV files were converted from seconds to a standard date-time format, using May 11, 2015 at 14:00 as the baseline. Using Python's datetime module and the pandas library, each relative time was converted to an absolute date-time.
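
The conversion could look something like the sketch below. It is not the exact script used for this assignment: the `Timestamp` and `DateTime` column names and the `to_absolute_time` helper are assumed for illustration, and the three offsets are made-up sample values.

```python
import pandas as pd

# Baseline given in the data description: May 11, 2015 at 14:00.
BASELINE = pd.Timestamp("2015-05-11 14:00:00")


def to_absolute_time(seconds):
    """Convert offsets in seconds from the baseline to absolute date-times."""
    return BASELINE + pd.to_timedelta(seconds, unit="s")


# Made-up offsets: the baseline itself, one hour later, one day later.
df = pd.DataFrame({"Timestamp": [0, 3600, 86400]})
df["DateTime"] = to_absolute_time(df["Timestamp"])
print(df)
```

Adding a timedelta to a fixed baseline keeps the conversion vectorized, which matters for files with millions of rows.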