Difference between revisions of "Fu Yi - Data Preparation"

From Visual Analytics and Applications
Revision as of 14:15, 8 July 2018

VAST MINI CHALLENGE 3 - Find out the suspiciousness


Data Preparation Question 1

a) Add titles

Open the 4 large tables (calls, emails, purchases, meetings) in Excel and add a title to each column (source, eType, target, time) in each of the 4 tables.
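The author did this step by hand in Excel. As a rough scripted equivalent, the sketch below streams a headerless CSV and writes the four column titles as the first row. The function name `add_header` and the sample rows are hypothetical, and the assumption that the raw tables are comma-separated files without a header row is not confirmed by the text.

```python
import csv
import io

# Column titles named in step a).
COLUMNS = ["source", "eType", "target", "time"]

def add_header(src, dst, columns=COLUMNS):
    """Copy headerless CSV rows from src to dst, writing a header row first."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(columns)
    for row in reader:  # rows are streamed one at a time
        writer.writerow(row)

# Made-up sample rows standing in for one of the large tables (ASSUMPTION).
sample = io.StringIO("101,0,202,15\n303,0,404,42\n")
out = io.StringIO()
add_header(sample, out)
print(out.getvalue())
```

Because rows are copied one at a time, the same function could be pointed at real file handles opened with `open(path, newline="")` without loading a whole table into memory.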


b) Change date

Import the tables into JMP. The time values in the raw tables are relative, and the real time should start from 11/05/2015 (11 May 2015), 14:00. I created 2 new columns holding 11/05/2015 and 14:00 respectively, then combined the old time, the date, and the time of day to get the correct date.
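The date shift was done with JMP column formulas. A minimal Python sketch of the same idea follows. It assumes the old time value is a count of seconds elapsed since the start point, which the text does not state explicitly. The name `to_real_time` is hypothetical.

```python
from datetime import datetime, timedelta

# Start point given in the text: 11/05/2015 (11 May 2015), 14:00.
START = datetime(2015, 5, 11, 14, 0)

def to_real_time(offset_seconds, start=START):
    """Shift a relative time value onto the real calendar.

    ASSUMPTION: the raw value counts seconds elapsed since `start`.
    """
    return start + timedelta(seconds=int(offset_seconds))

print(to_real_time(0))      # the start point itself
print(to_real_time(86400))  # exactly one day later
```

Adding the offset to a single start datetime gives the same result as keeping the date and the time of day in two separate columns and combining them afterwards.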



c) No duplication

Check the summary of each table and eliminate any duplicate rows.
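The text does not say how duplicates were identified in the table summary. One possible reading, sketched below, treats two rows as duplicates when all four fields (source, eType, target, time) are identical, and keeps the first occurrence. The function name `drop_duplicates` and the sample rows are hypothetical.

```python
def drop_duplicates(rows):
    """Yield each distinct row once, keeping first occurrences in order.

    ASSUMPTION: a duplicate is a row whose fields all match an earlier row.
    """
    seen = set()
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            yield row

# Made-up rows: the second one repeats the first.
rows = [
    ["101", "0", "202", "15"],
    ["101", "0", "202", "15"],
    ["303", "0", "404", "42"],
]
unique = list(drop_duplicates(rows))
print(len(rows), "->", len(unique))
```

Comparing the row count before and after is a quick way to see whether a table contained any duplicates at all.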


d) Clear out incomplete months

The data starts in May 2015; however, the first 2 months have incomplete data. I deleted the data for the first 2 months (May and June 2015) so that the dataset covers a complete cycle.

The description of the final 4 tables:

- Calls table: 10,091,409 rows
- Emails table: 13,846,639 rows
- Purchase table: 723,586 rows
- Meetings table: 127,110 rows
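The removal of May and June 2015 in step d) could be sketched as a simple cutoff filter. This assumes the time column has already been converted to real datetimes. The names `CUTOFF` and `keep_complete_months` and the sample records are hypothetical, and the row counts listed above come from the author's own tables, not from this code.

```python
from datetime import datetime

# Everything before 1 July 2015 falls in the two incomplete months.
CUTOFF = datetime(2015, 7, 1)

def keep_complete_months(records, cutoff=CUTOFF):
    """Keep only records whose 'time' is on or after the cutoff."""
    return [r for r in records if r["time"] >= cutoff]

# Made-up records, one per month around the cutoff (ASSUMPTION).
records = [
    {"source": "101", "target": "202", "time": datetime(2015, 5, 20, 9, 0)},
    {"source": "303", "target": "404", "time": datetime(2015, 6, 30, 23, 59)},
    {"source": "505", "target": "606", "time": datetime(2015, 7, 1, 0, 0)},
]
kept = keep_complete_months(records)
print(len(kept))
```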