Fu Yi - Data Preparation

From Visual Analytics and Applications
Revision as of 14:15, 8 July 2018 by Yi.fu.2017 (talk | contribs)

VAST MINI CHALLENGE 3 - Find out the suspiciousness

Data Preparation Question 1

a) Add titles

Open the 4 large tables (calls, emails, purchases, meetings) in Excel and add a title to each column (source, eType, target, time) in each of the 4 tables.
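
The author did this step by hand in Excel. As an illustrative alternative, the sketch below shows how the same four column titles could be written in front of a headerless CSV using only the Python standard library. The function name `add_header` and the sample rows are hypothetical; only the column names come from the text above.

```python
import csv
import io

# Column titles named in the text above.
COLUMNS = ["source", "eType", "target", "time"]

def add_header(src, dst, columns=COLUMNS):
    """Copy headerless CSV rows from src to dst, writing a header row first."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(columns)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}: {row}")
        writer.writerow(row)

# Made-up sample rows for illustration; the real tables are far larger.
sample = io.StringIO("101,0,202,3600\n303,0,404,7200\n")
out = io.StringIO()
add_header(sample, out)
header_demo = out.getvalue()
```

With real data, `src` and `dst` would be file objects opened with `newline=""`, which lets the rows stream through without loading a whole table into memory.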


b) Change date

Import the tables into JMP. Since the real time should start from 11/05/2015, 14:00, I created 2 new columns, one for the date (11/05/2015) and one for the time of day (14:00), and combined the old time, the date, and the time of day to get the correct date.
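
The author performed this conversion with JMP column formulas. A minimal Python sketch of the same idea follows. It rests on two assumptions not stated in the text: that the old time value is an offset in seconds from the start of the timeline, and that 11/05/2015 is read as 11 May 2015 (consistent with the data starting in May). The name `to_real_time` is hypothetical.

```python
from datetime import datetime, timedelta

# Start of the real timeline as given in the text: 11/05/2015 (11 May 2015), 14:00.
START = datetime(2015, 5, 11, 14, 0)

def to_real_time(offset_seconds):
    """Convert a raw time value to a calendar timestamp.

    Assumption: the raw value counts seconds elapsed since START.
    """
    return START + timedelta(seconds=int(offset_seconds))

# One day's worth of seconds lands on the next day at the same time of day.
one_day_later = to_real_time(86400)
```

If the raw values were in a different unit, only the `timedelta` argument would need to change.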



c) No duplication

Check the summary of each table and eliminate any duplicate rows.
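
The text does not say how duplicates were identified in JMP. One plausible reading is that a row counts as a duplicate when all four fields repeat exactly; under that assumption, a minimal sketch of the removal could look like this (the function name and sample rows are hypothetical):

```python
def drop_duplicates(rows):
    """Return rows with exact repeats removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # tuples are hashable, lists are not
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Made-up rows: the third repeats the first exactly.
rows = [
    ["101", "0", "202", "3600"],
    ["303", "0", "404", "7200"],
    ["101", "0", "202", "3600"],
]
deduped = drop_duplicates(rows)
```

Comparing the row count before and after gives the number of duplicates removed, which is the same information a table summary would surface.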


d) Clear out incomplete months

The data starts in May 2015; however, the first 2 months have incomplete data. I deleted the data for the first 2 months (May and June 2015) so that the dataset covers a complete cycle. The final 4 tables are described below:

- Calls table: 10,091,409 rows
- Emails table: 13,846,639 rows
- Purchase table: 723,586 rows
- Meetings table: 127,110 rows
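
The month filter in step d) amounts to keeping only records dated 1 July 2015 or later. A minimal sketch, assuming each record already carries a converted timestamp under a `time` key (the record layout, function name, and sample values are hypothetical):

```python
from datetime import datetime

# First instant after the two dropped months (May and June 2015).
CUTOFF = datetime(2015, 7, 1)

def drop_incomplete_months(records, cutoff=CUTOFF):
    """Keep only records whose timestamp is on or after the cutoff."""
    return [r for r in records if r["time"] >= cutoff]

# Made-up records for illustration.
records = [
    {"source": "101", "target": "202", "time": datetime(2015, 5, 20, 9, 30)},
    {"source": "303", "target": "404", "time": datetime(2015, 6, 30, 23, 59)},
    {"source": "505", "target": "606", "time": datetime(2015, 7, 1, 0, 0)},
]
kept = drop_incomplete_months(records)
```

Using an inclusive lower bound keeps records stamped exactly at midnight on 1 July.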