From Visual Analytics and Applications
Unraveling the Secrets of Kasios : VAST Mini Challenge 3
Data Set
The Kasios Insider has provided data from across the company. There are call records, emails, purchases, and meetings. The data only includes the source of each transaction, the recipient (destination), and the time of the transaction. Contents of emails or phone calls are not available.
Dataset | Description | Size
calls.csv | Information on 10.6 million calls | 251 MB uncompressed
emails.csv | Information on 14.6 million emails | 345 MB uncompressed
purchases.csv | Information on 762 thousand purchases | 18.8 MB uncompressed
meetings.csv | Information on 127 thousand meetings | 3.26 MB uncompressed
Four data files contain information about individuals that the Insider has indicated as suspicious, and a fifth file lists additional suspicious purchases for Question 4:
Dataset | Description | Size
Suspicious_calls.csv | Information on suspicious calls | 1.76 KB uncompressed
Suspicious_emails.csv | Information on suspicious emails | 1.55 KB uncompressed
Suspicious_purchases.csv | Information on suspicious purchases | 27 B uncompressed
Suspicious_meetings.csv | Information on suspicious meetings | 130 B uncompressed
Other_suspicious_purchases.csv | List of 4 individuals who made 7 suspicious purchases (for Question 4) | 378 B uncompressed
All provided data files have the same format. The data are provided in comma-separated format with four columns:
Column Name | Description
Source | Contains the company ID# for the person who called, sent an email, purchased something, or invited people to a meeting
Etype | Contains a number designating what kind of connection is made: 0 is for calls, 1 is for emails, 2 is for purchases, 3 is for meetings
Destination | Contains the company ID# for the person who is receiving a call, receiving an email, selling something to a buyer, or being invited to a meeting
Time stamp | In seconds starting on May 11, 2015 at 14:00
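As an illustration of the four-column format described above, the following is a minimal sketch of how one of the files might be read with pandas. It assumes the CSV files have no header row, which the page does not state. The column label `Timestamp`, the helper `load_records`, the `EtypeLabel` column, and the two sample rows are made up for the example and are not taken from the Kasios data.

```python
import io

import pandas as pd

# Column names follow the table above. Assumption: the files have no header row.
COLUMNS = ["Source", "Etype", "Destination", "Timestamp"]

# Etype codes as listed in the column description.
ETYPE_LABELS = {0: "call", 1: "email", 2: "purchase", 3: "meeting"}


def load_records(path_or_buffer):
    """Read one data file and add a readable label for the Etype code."""
    df = pd.read_csv(path_or_buffer, header=None, names=COLUMNS)
    df["EtypeLabel"] = df["Etype"].map(ETYPE_LABELS)
    return df


# Two invented rows standing in for a real file such as calls.csv.
sample = io.StringIO("101,0,202,60\n303,1,404,3600\n")
records = load_records(sample)
print(records)
```

For the real files, the in-memory sample would be replaced by a file path, for example `load_records("calls.csv")`.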
Tools
- Python for Data Cleaning
- Excel for Data Cleaning
- Gephi for Network Visualization
- Tableau Desktop for Visualization
Data Cleaning
The time stamps in all the CSV files were converted from seconds to a standard date-time format, using May 11, 2015 at 14:00 as the baseline.
Using Python's datetime module and the pandas library, each relative time stamp was converted to an absolute date-time.
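The conversion described above can be sketched as follows. This is one possible approach with pandas, not necessarily the exact code used for this analysis. The names `BASELINE` and `to_absolute_time` and the sample offsets are assumptions made for the example.

```python
import pandas as pd

# Baseline given in the data description: May 11, 2015 at 14:00.
BASELINE = pd.Timestamp("2015-05-11 14:00:00")


def to_absolute_time(seconds):
    """Convert offsets in seconds from the baseline into absolute date-times."""
    return BASELINE + pd.to_timedelta(seconds, unit="s")


# Invented offsets: the baseline itself, one minute later, and one day later.
offsets = pd.Series([0, 60, 86400])
absolute = to_absolute_time(offsets)
print(absolute)
```

Applied to a loaded data file, the same function could be called on the time stamp column to add an absolute date-time column alongside the original seconds.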