Difference between revisions of "ISSS608 2017-18 T3 Assign Aakanksha Kumari Data Preparation"
(25 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | [[File:Classified. | + | [[File:Classified-stamp.png|frameless|center]] |
− | <div style="background:# | + | <div style="background:#DC143C; border:#DC143C; padding-left:15px; text-align:center;"> |
− | <font size = 5; color="#FFFFFF"><span style="font-family:Century Gothic;"> | + | <font size = 5; color="#FFFFFF"><span style="font-family:Century Gothic;">Unraveling the Secrets of Kasios : VAST Mini Challenge 3</span></font> |
</div> | </div> | ||
<!--MAIN HEADER --> | <!--MAIN HEADER --> | ||
− | {|style="background-color:# | + | {|style="background-color:#DC143C;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" | |
− | | style="font-family:Century Gothic; font-size:100%; solid # | + | | style="font-family:Century Gothic; font-size:100%; solid #DC143C; background:#DC143C; text-align:center;" width="14.3%" | |
; | ; | ||
[[ISSS608 2017-18 T3 Assign Aakanksha Kumari| <font color="#FFFFFF">Overview</font>]] | [[ISSS608 2017-18 T3 Assign Aakanksha Kumari| <font color="#FFFFFF">Overview</font>]] | ||
− | | style="font-family:Century Gothic; font-size:100%; solid # | + | | style="font-family:Century Gothic; font-size:100%; solid #DC143C; background:#DC143C; text-align:center;" width="14.3%" | |
; | ; | ||
[[ISSS608 2017-18 T3 Assign Aakanksha Kumari_Data_Preparation| <font color="#FFFFFF">Data Preparation</font>]] | [[ISSS608 2017-18 T3 Assign Aakanksha Kumari_Data_Preparation| <font color="#FFFFFF">Data Preparation</font>]] | ||
− | | style="font-family:Century Gothic; font-size:100%; solid # | + | | style="font-family:Century Gothic; font-size:100%; solid #DC143C; background:#DC143C; text-align:center;" width="14.3%" | |
; | ; | ||
[[ISSS608 2017-18 T3 Assign Aakanksha Kumari_Q1| <font color="#FFFFFF">Question 1</font>]] | [[ISSS608 2017-18 T3 Assign Aakanksha Kumari_Q1| <font color="#FFFFFF">Question 1</font>]] | ||
− | | style="font-family:Century Gothic; font-size:100%; solid # | + | | style="font-family:Century Gothic; font-size:100%; solid #DC143C; background:#DC143C; text-align:center;" width="14.3%" | |
; | ; | ||
[[ISSS608 2017-18 T3 Assign Aakanksha Kumari_Q2| <font color="#FFFFFF">Question 2</font>]] | [[ISSS608 2017-18 T3 Assign Aakanksha Kumari_Q2| <font color="#FFFFFF">Question 2</font>]] | ||
− | | style="font-family:Century Gothic; font-size:100%; solid # | + | | style="font-family:Century Gothic; font-size:100%; solid #DC143C; background:#DC143C; text-align:center;" width="14.3%" | |
; | ; | ||
[[ISSS608 2017-18 T3 Assign Aakanksha Kumari_Q3| <font color="#FFFFFF">Question 3</font>]] | [[ISSS608 2017-18 T3 Assign Aakanksha Kumari_Q3| <font color="#FFFFFF">Question 3</font>]] | ||
− | | style="font-family:Century Gothic; font-size:100%; solid # | + | | style="font-family:Century Gothic; font-size:100%; solid #DC143C; background:#DC143C; text-align:center;" width="14.3%" | |
; | ; | ||
[[ISSS608 2017-18 T3 Assign Aakanksha Kumari_Q4| <font color="#FFFFFF">Question 4</font>]] | [[ISSS608 2017-18 T3 Assign Aakanksha Kumari_Q4| <font color="#FFFFFF">Question 4</font>]] | ||
− | |||
− | |||
− | |||
− | |||
− | | style="font-family:Century Gothic; font-size:100%; solid # | + | | style="font-family:Century Gothic; font-size:100%; solid #DC143C; background:#DC143C; text-align:center;" width="14.3%" | |
; | ; | ||
[[Assignment_Dropbox_G1| <font color="#FFFFFF">Dropbox</font>]] | [[Assignment_Dropbox_G1| <font color="#FFFFFF">Dropbox</font>]] | ||
Line 41: | Line 37: | ||
|} | |} | ||
− | + | == '''Data Set''' == | |
+ | {| class="wikitable" | ||
+ | |- | ||
+ | | <div style="font-family:Palatino Linotype; border-radius: 1px;"> | ||
+ | |||
+ | <big>The Kasios Insider has provided data from across the company. There are call records, emails, purchases, and meetings. The data only includes the source of each transaction, the recipient (destination), and the time of the transaction. Contents of emails or phone calls are not available. | ||
+ | {| class="wikitable" style="margin: auto;" | |
+ | |- | ||
+ | ! Dataset !! Description !! Size | ||
+ | |- | ||
+ | | calls.csv|| Information on 10.6 million calls || 251 MB uncompressed | ||
+ | |- | ||
+ | | emails.csv || Information on 14.6 million emails|| 345 MB uncompressed | ||
+ | |- | ||
+ | | purchases.csv || Information on 762 thousand purchases|| 18.8 MB uncompressed | ||
+ | |- | ||
+ | | meetings.csv|| Information on 127 thousand meetings || 3.26 MB uncompressed | ||
+ | |} | ||
+ | </big> </div> | ||
+ | |||
+ | |||
+ | <div style="font-family:Palatino Linotype; border-radius: 1px "> <big> | ||
+ | There are four data files that contain information about individuals that the Insider has indicated as suspicious: | ||
+ | |||
+ | {| class="wikitable" style="margin: auto;" | |
+ | |- | ||
+ | ! Dataset !! Description !! Size | ||
+ | |- | ||
+ | | Suspicious_calls.csv|| Information on suspicious calls || 1.76 KB uncompressed | ||
+ | |- | ||
+ | | Suspicious_emails.csv || Information on suspicious emails|| 1.55 KB uncompressed | ||
+ | |- | ||
+ | | Suspicious_purchases.csv || Information on suspicious purchases|| 27 B uncompressed | ||
+ | |- | ||
+ | | Suspicious_meetings.csv|| Information on suspicious meetings || 130 B uncompressed | ||
+ | |- | ||
+ | | Other_suspicious_purchases.csv|| List of 4 individuals who made 7 suspicious purchases (for Question 4) || 378 B uncompressed | |
+ | |} | ||
+ | </big></div> | ||
+ | |||
+ | |||
+ | <div style="font-family:Palatino Linotype; border-radius: 1px "> <big> | ||
+ | All provided data files have the same format. The data are provided in comma-separated format with four columns: | ||
+ | |||
+ | {| class="wikitable" style="margin: auto;" | |
+ | |- | ||
+ | ! Column Name!! Description | ||
+ | |- | ||
+ | | Source|| Contains the company ID# for the person who called, sent an email, purchased something, or invited people to a meeting | ||
+ | |- | ||
+ | | Etype || Contains a number designating what kind of connection is made | ||
+ | a. 0 is for calls | ||
+ | b. 1 is for emails | ||
+ | c. 2 is for purchases | ||
+ | d. 3 is for meetings | ||
+ | |||
+ | |- | ||
+ | | Destination || Contains the company ID# for the person who is receiving a call, receiving an email, selling something to a buyer, or being invited to a meeting | |
+ | |- | ||
+ | | Time stamp|| In seconds starting on May 11, 2015 at 14:00. | ||
+ | |} | ||
+ | |||
+ | </big></div> | ||
+ | |||
+ | |} | ||
+ | |||
+ | == '''Tools''' == | ||
+ | |||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | | <div style="font-family:Palatino Linotype; border-radius: 1px "> <big> | ||
+ | * Python for Data Cleaning | ||
+ | * Excel for Data Cleaning | ||
+ | * Gephi for Network Visualization | ||
+ | * Tableau Desktop for Visualization </big> </div> | ||
+ | |} | ||
+ | |||
+ | |||
+ | == '''Data Cleaning''' == | ||
+ | |||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | | <div style="font-family:Palatino Linotype; border-radius: 1px "> <big> | ||
+ | The time stamps in all the CSV files were converted from seconds to a standard date-time format, using May 11, 2015 at 14:00 as the baseline. | |
+ | Using Python's datetime module and the pandas library, each relative time stamp was converted to an absolute date-time. | |
− | + | ||
+ | </big> </div> | ||
+ | |} |
Latest revision as of 13:10, 8 July 2018
Unraveling the Secrets of Kasios : VAST Mini Challenge 3
Data Set
The Kasios Insider has provided data from across the company. There are call records, emails, purchases, and meetings. The data only includes the source of each transaction, the recipient (destination), and the time of the transaction. Contents of emails or phone calls are not available.
There are four data files that contain information about individuals that the Insider has indicated as suspicious:
All provided data files have the same format. The data are provided in comma-separated format with four columns:
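The four columns (Source, Etype, Destination, Time stamp) and the Etype codes (0 for calls, 1 for emails, 2 for purchases, 3 for meetings) can be read with pandas roughly as follows. This is only a minimal sketch: the assumption that the files have no header row, the column name `Timestamp`, the helper `load_records`, the added `EtypeLabel` column, and the sample rows are all hypothetical and not taken from the assignment.

```python
import io

import pandas as pd

# Column names mirror the four-column format described above.
# Assumption: the CSV files carry no header row.
COLUMNS = ["Source", "Etype", "Destination", "Timestamp"]

# Etype codes as documented for the data set.
ETYPE_LABELS = {0: "call", 1: "email", 2: "purchase", 3: "meeting"}


def load_records(path_or_buffer):
    """Read one four-column file and attach a readable connection label."""
    df = pd.read_csv(path_or_buffer, header=None, names=COLUMNS)
    df["EtypeLabel"] = df["Etype"].map(ETYPE_LABELS)
    return df


# Made-up rows for illustration only; real IDs and offsets will differ.
sample = io.StringIO("101,0,202,60\n303,1,404,3600\n")
records = load_records(sample)
```

The same helper could be pointed at any of the provided files, for example `load_records("calls.csv")`, provided the no-header assumption holds.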
Tools
Data Cleaning
The time stamps in all the CSV files were converted from seconds to a standard date-time format, using May 11, 2015 at 14:00 as the baseline. Using Python's datetime module and the pandas library, each relative time stamp was converted to an absolute date-time.
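The conversion described above might look like the sketch below. It is not the exact script used for the assignment: the helper name `to_absolute_time` and the sample offsets are hypothetical, and the baseline is treated as a time-zone-naive timestamp because the data description does not state a time zone.

```python
import pandas as pd

# Baseline stated in the data description: May 11, 2015 at 14:00.
# Assumption: no time zone is attached to the baseline.
BASELINE = pd.Timestamp("2015-05-11 14:00:00")


def to_absolute_time(seconds):
    """Convert second offsets from the baseline into absolute date-times."""
    return BASELINE + pd.to_timedelta(seconds, unit="s")


# Made-up offsets: zero seconds, one minute, and one full day.
offsets = pd.Series([0, 60, 86400])
absolute = to_absolute_time(offsets)
```

Applied to a loaded file, the same helper would be called on the time stamp column, and the resulting date-time column can then be consumed directly by tools such as Tableau or Gephi.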