Fu Yi - Data Preparation

From Visual Analytics and Applications
Revision as of 15:26, 8 July 2018 by Yi.fu.2017

VAST MINI CHALLENGE 3 - Find out the suspiciousness


Methodology & Visualization tools


The questions in Mini Challenge 3 mostly involve network analysis and analysis of the company's operational trends.

For Question 1 (company operational trends), I used Tableau to visualize the scenarios. For Question 2 (network analysis), I used Gephi to detect abnormal patterns.

Other tools: SAS JMP Pro and Excel were used for data cleaning and preparation, as well as for initial exploration of the data.


Viztool.png

Data Preparation Question 1

a) Add titles

Open the 4 large tables (calls, emails, purchases, meetings) in Excel. Add a title to each column (source, eType, target, time) in each of the 4 tables.
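For readers who prefer a script to Excel, the same column titles could be attached while reading a headerless file. This is only a minimal sketch: the helper name `read_with_titles` and the sample rows are hypothetical, and it assumes the raw files are comma-separated with the four fields in the order listed above.

```python
import csv
import io

# Column titles taken from the text; the order is assumed to match the raw files.
COLUMNS = ["source", "eType", "target", "time"]

def read_with_titles(handle):
    """Read a headerless CSV and return a list of dicts keyed by COLUMNS."""
    reader = csv.reader(handle)
    # Skip empty lines so trailing newlines do not produce empty records.
    return [dict(zip(COLUMNS, row)) for row in reader if row]

# Hypothetical sample rows, used only to illustrate the shape of the result.
sample = io.StringIO("101,0,202,3600\n202,1,101,7200\n")
rows = read_with_titles(sample)
print(rows[0])
```

With a real file, `handle` would be the object returned by `open(path, newline="")` instead of the in-memory sample.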


b) Change date

Import the tables into JMP. The raw time values do not show the real date, which should start from 11/05/2015 (11 May 2015), 14:00. I created 2 new columns holding 11/05/2015 and 14:00 respectively, then combined the three columns (old time, date, time of day) to get the correct date.
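The date correction can also be sketched in a few lines of Python. This assumes, beyond what is stated here, that the raw time column holds the number of seconds elapsed since the start point; the function name `to_real_datetime` is hypothetical.

```python
from datetime import datetime, timedelta

# Start of the real timeline, as given in the text: 11/05/2015, 14:00.
START = datetime(2015, 5, 11, 14, 0)

def to_real_datetime(offset_seconds):
    """Convert a raw time offset to a real datetime.

    Assumption: the offset is measured in seconds from START.
    """
    return START + timedelta(seconds=offset_seconds)

print(to_real_datetime(0))      # the start point itself
print(to_real_datetime(86400))  # exactly one day later
```

If the raw values turn out to use another unit, only the `timedelta` argument needs to change.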

Q1prepDate.png

c) No duplication

Check the summary of each table and remove any duplicate rows.
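As a rough equivalent outside JMP, duplicate rows could be removed with a set of already-seen records. The sketch below treats a row as a duplicate only when all of its fields match; the function name and sample rows are hypothetical.

```python
def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # the whole record is the identity of a row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical (source, eType, target, time) records; the second repeats the first.
sample = [(101, 0, 202, 3600), (101, 0, 202, 3600), (202, 1, 101, 7200)]
deduped = drop_duplicates(sample)
print(len(sample) - len(deduped), "duplicate row(s) removed")
```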


d) Clear out incomplete month

The data starts in May 2015; however, the first 2 months have incomplete data. I deleted the first 2 months of data (May + June 2015) so that the dataset contains only complete months. The final 4 tables are described below:

   - Calls table: 10,091,409 rows
   - Emails table: 13,846,639 rows
   - Purchase table: 723,586 rows
   - Meetings table: 127,110 rows
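The removal of the incomplete months amounts to a simple date filter. A minimal sketch, assuming the time column has already been converted to real datetimes; `CUTOFF` and `keep_complete_months` are hypothetical names.

```python
from datetime import datetime

# Records from May and June 2015 are dropped, so the first kept instant is 1 July 2015.
CUTOFF = datetime(2015, 7, 1)

def keep_complete_months(timestamps):
    """Keep only timestamps on or after the cutoff."""
    return [ts for ts in timestamps if ts >= CUTOFF]

# Hypothetical timestamps on either side of the cutoff.
sample = [
    datetime(2015, 5, 11, 14, 0),
    datetime(2015, 6, 30, 23, 59),
    datetime(2015, 7, 1, 0, 0),
]
print(keep_complete_months(sample))
```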

Data Preparation Question 2

a) Prepare Suspicious Node files and Edge files

Edge:

- Add titles, concatenate tables, change to correct date and extract Year/Month/Week/Day

Open the 4 suspicious tables (calls, emails, purchases, meetings) in Excel and add titles to all of them. Then import all the tables into JMP and use the Concatenate method to combine them into one file. Change the time to the correct datetime (same as in Question 1).
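Concatenation here means stacking tables that share the same columns on top of one another. A minimal sketch of that idea, not of JMP's own Concatenate feature; the function name and the one-row tables are hypothetical.

```python
from itertools import chain

def concatenate(*tables):
    """Stack tables (lists of row dicts with the same keys) into one list of rows."""
    return list(chain.from_iterable(tables))

# Hypothetical one-row tables; real tables would hold the suspicious records.
calls = [{"source": 1, "eType": "call", "target": 2, "time": 10}]
emails = [{"source": 2, "eType": "email", "target": 3, "time": 20}]
purchases = [{"source": 3, "eType": "purchase", "target": 4, "time": 30}]
meetings = [{"source": 4, "eType": "meeting", "target": 1, "time": 40}]

edges = concatenate(calls, emails, purchases, meetings)
print(len(edges), "edge rows")
```

Each row keeps its event type, so the combined edge file can still be filtered by type afterwards.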

The Day here is the day of the month. It allows drilling down to see the changes day by day, so as to identify who conducted the suspicious events and how the suspicious purchases were made.
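The Year/Month/Week/Day extraction can be illustrated as follows. This is a sketch under one assumption: Week is taken as the ISO 8601 week number, which may differ from the week definition JMP applies. The function name `date_parts` is hypothetical.

```python
from datetime import datetime

def date_parts(ts):
    """Split a datetime into the Year, Month, Week and Day fields used for drill-down.

    Assumption: Week is the ISO 8601 week number.
    """
    return {
        "Year": ts.year,
        "Month": ts.month,
        "Week": ts.isocalendar()[1],
        "Day": ts.day,  # day of the month
    }

print(date_parts(datetime(2015, 7, 6, 9, 30)))
```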

Q2prep1.png