ISSS608 2017-18 T3 Assign Manu George Mathew Data Preparation

From Visual Analytics and Applications
Revision as of 17:36, 8 July 2018 by Mgmathew.2017

VAST Challenge 2018: Suspense at the Wildlife Preserve

Data Preparation

Given Data Description

We are provided with data covering the past 2.5 years from across the company: call records, emails, purchases, and meetings. The data include only the source of each transaction, the recipient (destination), and the time of the transaction. The contents of emails and phone calls are not available. All of the provided data files have the same format.

The data are provided in comma-separated format with four columns:

  1. Source (contains the company ID# for the person who called, sent an email, purchased something, or invited people to a meeting)
  2. Etype (contains a number designating what kind of connection is made)
    1. 0 is for calls
    2. 1 is for emails
    3. 2 is for purchases
    4. 3 is for meetings
  3. Destination (contains the company ID# for the person who is receiving a call, receiving an email, selling something to a buyer, or being invited to a meeting)
  4. Time stamp (in seconds, counted from May 11, 2015 at 14:00)
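
As an illustration of the four-column format described above, the following minimal Python sketch reads such records and converts the time stamp from seconds into a calendar date and time. It assumes the files have no header row and that every field is an integer; the function name `parse_records` and the sample rows are hypothetical and are not taken from the challenge data.

```python
import csv
import io
from datetime import datetime, timedelta

# Start of the time axis as stated in the data description:
# May 11, 2015 at 14:00.
EPOCH = datetime(2015, 5, 11, 14, 0, 0)

# Etype codes as listed in the data description.
ETYPE_LABELS = {0: "call", 1: "email", 2: "purchase", 3: "meeting"}


def parse_records(handle):
    """Yield (source, etype_label, destination, timestamp) tuples.

    Assumption: four comma-separated integer columns and no header row.
    """
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        source, etype, destination, seconds = (int(value) for value in row)
        yield (
            source,
            ETYPE_LABELS[etype],
            destination,
            EPOCH + timedelta(seconds=seconds),
        )


# Made-up sample rows for illustration only.
sample = io.StringIO("101,0,202,0\n101,1,303,86400\n")
records = list(parse_records(sample))
```

With a real file, the same function could be called on an open file handle, for example `open("calls.csv", newline="")`, instead of the in-memory sample.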

There is a company index that shows the name of everyone in the company and their associated ID#. There are 642,631 individuals in the index.

There are four data files that cover the whole company:

  • calls.csv has information on 10.6 million calls (251 MB uncompressed)
  • emails.csv has information on 14.6 million emails (345 MB uncompressed)
  • purchases.csv has information on 762 thousand purchases (18.8 MB uncompressed)
  • meetings.csv has information on 127 thousand meetings (3.26 MB uncompressed)

There are four data files that contain information about individuals that the Insider has indicated as suspicious:

  • Suspicious_calls.csv (1.76 KB uncompressed)
  • Suspicious_emails.csv (1.55 KB uncompressed)
  • Suspicious_purchases.csv (27 B uncompressed)
  • Suspicious_meetings.csv (130 B uncompressed)


We are also provided with a list of 20 people that the Insider finds suspicious. They are:

Alex Hall, Lizbeth Jindra, Patrick Lane, Richard Fox, Sara Ballard, May Burton, Glen Grant, Dylan Ballard, Meryl Pastuch, Melita Scarpaci, Augusta Sharp, Kerstin Belveal, Rosalia Larroque, Lindsy Henion, Julie Tierno, Jose Ringwald, Ramiro Gault, Tobi Gatlin, Refugio Orrantia, and Jenice Savaria.
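
One possible way to narrow the large company-wide files down to the listed people is to keep only the rows whose Source or Destination is one of their ID#s. The sketch below is not necessarily the method used in this assignment. It assumes the ID#s have already been looked up in the company index, whose file layout is not described here, and that the data files have no header row. The function name `filter_by_ids`, the ID values, and the sample rows are hypothetical.

```python
import csv
import io


def filter_by_ids(handle, ids):
    """Return the rows whose Source or Destination appears in ids.

    Assumption: four comma-separated integer columns and no header row.
    """
    kept = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        source, etype, destination, seconds = (int(value) for value in row)
        if source in ids or destination in ids:
            kept.append((source, etype, destination, seconds))
    return kept


# Hypothetical ID#s and rows, for illustration only.
suspicious_ids = {7, 9}
sample = io.StringIO("7,0,42,100\n1,1,2,200\n3,3,9,300\n")
matches = filter_by_ids(sample, suspicious_ids)
```

Because the reader processes one row at a time, the same loop could be pointed at a multi-million-row file without loading the whole file into memory first; only the matching rows are kept.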

Steps Followed in Data Preparation