Difference between revisions of "ISS608 2017-18 T1 Assign KyonghwanKim Data Preparation"

From Visual Analytics and Applications
Line 37: Line 37:
  
 
=Microblog=
 
=Microblog=
 +
==1. Data cleaning==
  
 
{| class="wikitable"
 
{| class="wikitable"
Line 50: Line 51:
 
|'''2. Outliers'''<br/>
 
|'''2. Outliers'''<br/>
 
*There are 21 items with invalid time format. They are removed from analysis.<br/>
 
*There are 21 items with invalid time format. They are removed from analysis.<br/>
 +
 +
 +
 +
 +
 +
 
*Also, there are 6 items with Longitude outside of given map range. They are removed as well so that all data are within parameters.
 
*Also, there are 6 items with Longitude outside of given map range. They are removed as well so that all data are within parameters.
 +
*Total 27 rows are removed and 1,023,050 rows are used for analysis.
 
|[[file:missing_time.png]][[file:outlier_Longitude.png]]
 
|[[file:missing_time.png]][[file:outlier_Longitude.png]]
 
|-
 
|-
 
|}
 
|}

Revision as of 03:21, 15 October 2017

Title.png

Vastropolis Epidemic Report

Background

Data Preparation

Visualization

Answer

Reference

Feedback

 



Microblog

1. Data cleaning

Description Illustration
1. Split of Columns
  • The Created_at column is split into Date and Time columns. The Date column is used in other analytics.
  • The Location column is likewise split into Latitude and Longitude columns. These data are used to plot points on the Vastropolis map.
Microblog split.png
2. Outliers
  • There are 21 items with an invalid time format. They are removed from the analysis.
  • There are also 6 items with a Longitude outside the given map range. They are removed as well, so that all remaining data fall within the map boundaries.
  • In total, 27 rows (21 + 6) are removed and 1,023,050 rows are used for analysis.
Missing time.png Outlier Longitude.png
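
The two cleaning steps above can be sketched in a few lines of Python. This is only an illustration and not necessarily the tool used for this report. It assumes that Created_at holds a date and an HH:MM time separated by a space, and that Location holds latitude and longitude separated by whitespace. The function names, the sample rows, and the longitude bounds are all hypothetical; the real bounds come from the Vastropolis map.

```python
from datetime import datetime


def split_row(created_at, location):
    """Split one record into (date, time, latitude, longitude).

    Assumed formats (not confirmed by the report):
      created_at -> "<date> <HH:MM>"
      location   -> "<latitude> <longitude>"
    The time is returned as None when it does not parse as HH:MM.
    """
    date_part, _, time_part = created_at.strip().partition(" ")
    try:
        time_value = datetime.strptime(time_part.strip(), "%H:%M").time()
    except ValueError:
        time_value = None
    lat_text, lon_text = location.split()
    return date_part, time_value, float(lat_text), float(lon_text)


def clean(rows, lon_min, lon_max):
    """Drop rows with an invalid time or an out-of-range longitude.

    Returns the kept rows plus a count for each removal reason.
    """
    kept = []
    bad_time = 0
    bad_lon = 0
    for created_at, location in rows:
        date_part, time_value, lat, lon = split_row(created_at, location)
        if time_value is None:
            bad_time += 1
        elif not (lon_min <= lon <= lon_max):
            bad_lon += 1
        else:
            kept.append((date_part, time_value, lat, lon))
    return kept, bad_time, bad_lon


# Made-up sample rows and made-up longitude bounds, for illustration only.
sample = [
    ("5/18/2011 13:26", "42.22 93.33"),
    ("5/18/2011 1326x", "42.22 93.33"),  # invalid time format
    ("5/19/2011 08:05", "42.25 99.99"),  # longitude outside the bounds
]
kept, bad_time, bad_lon = clean(sample, lon_min=93.0, lon_max=94.0)
print(len(kept), bad_time, bad_lon)
```

Counting each removal reason separately makes it easy to check the totals against the figures reported above (21 invalid times and 6 out-of-range longitudes, 27 rows in all).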