Difference between revisions of "ISS608 2017-18 T1 Assign KyonghwanKim Data Preparation"

From Visual Analytics and Applications
Line 52:
  *There are 21 items with invalid time format. They are removed from analysis.<br/>
  *Also, there are 6 items with Longitude outside of given map range. They are removed as well so that all data are within parameters.
- *Total 27 rows are removed and 1,023,050 rows are used for analysis with file name "Microblog_Final.csv".
+ *Total 27 rows are removed and 1,023,050 rows are used for analysis with file name '''''"Microblog_Final.csv"'''''.
  |[[file:missing_time.png|150px]]    [[file:outlier_Longitude.png|300px]]
  |-
Line 60:
  Some of key words are given. However, there may be additional key words to enhance accuracy of analysis. <br/>
- [[file:outbreak.png|500px]]
+ [[file:outbreak.png|400px]]
  Above graph shows text distribution with key word ''"flu"'' and ''"cold"''. Text traffic shoots up from May 18th and remain high until 20th. Word Cloud during 3 days of outbreak using JMP text explorer is shown below.
- [[file:could.png|300px]]
+ [[file:cloud.png|300px]]
- {| class="wikitable"
- |-
- |style="text-align:center;" |'''Diagnosis'''
- |style="text-align:center;" |'''Symptoms'''
- |-
- |flu, cold
- |fever, chill, fatigue, cough, breath, nausea, vomit, diarrhea, sweat, pain, ''sore throat'', muscle, letharg (-y or -ic), ''runny nose'', doctor, sick
- |-
- |}
+ Therefore, following key are used for analysis.
+ *Given: '''''flu''''', fever, chill(s), sweat(s), <span style="color: blue"><del>aches</del></span>, pain(s), fatigue, cough(ing), ''breathing difficulty'', nausea, vomit(ing), diarrhea, ''enlarged lymph nodes''
+ *Enhanced: '''''cold''''', <span style="color: blue">headache</span>, sick, ''shortness of breath'', ''declining health'', ''hurts to move'', ''aching muscles'', ''sore throat'', ''runny nose'', ''problems breathing'', pneumonia
+
+ ==3. Contagion Flag==
+ Text containing above 23 words and phrases are chosen from dataset '''''"Microblog_Final.csv"'''''.

Revision as of 05:55, 15 October 2017

Title.png

Vastropolis Epidemic Report


Microblog

1. Data cleaning

Description Illustration
1. Split of Columns
  • The Created_at column is split into Date and Time columns. The Date column is used in other analytics.
  • Also, the Location column is split into Latitude and Longitude columns. These data are used for plotting on the Vastropolis map.
Microblog split.png
2. Outliers
  • There are 21 items with an invalid time format. They are removed from the analysis.
  • Also, there are 6 items with a Longitude outside the given map range. They are removed as well so that all data are within the map bounds.
  • In total, 27 rows are removed and 1,023,050 rows are used for analysis, saved under the file name "Microblog_Final.csv".
Missing time.png Outlier Longitude.png
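
The two cleaning steps above can be sketched in Python as follows. This is only an illustration, not the procedure used for this report. It assumes that Created_at holds a date and a time separated by a space, that Location holds latitude and longitude separated by a space, and that a valid time looks like HH:MM. The helper names, the sample values and the longitude bounds (93.0 to 94.0) are hypothetical placeholders; the real bounds come from the given map.

```python
from datetime import datetime


def split_row(created_at, location):
    """Split 'date time' and 'lat lon' strings on whitespace (assumed formats)."""
    date_part, _, time_part = created_at.strip().partition(" ")
    lat_text, lon_text = location.split()
    return {
        "Date": date_part,
        "Time": time_part.strip(),
        "Latitude": float(lat_text),
        "Longitude": float(lon_text),
    }


def has_valid_time(time_text, fmt="%H:%M"):
    """Return True if the time parses with the assumed HH:MM format."""
    try:
        datetime.strptime(time_text, fmt)
        return True
    except ValueError:
        return False


def clean_rows(raw_rows, lon_min, lon_max):
    """Split each row, then separate usable rows from removed ones."""
    kept, removed = [], []
    for created_at, location in raw_rows:
        row = split_row(created_at, location)
        in_range = lon_min <= row["Longitude"] <= lon_max
        if has_valid_time(row["Time"]) and in_range:
            kept.append(row)
        else:
            removed.append(row)
    return kept, removed


# Made-up sample rows, for illustration only.
sample = [
    ("5/18/2011 13:26", "42.22717 93.33772"),   # usable row
    ("5/18/2011", "42.30000 93.40000"),         # time is missing
    ("5/19/2011 09:05", "42.25000 120.00000"),  # longitude outside bounds
]
kept, removed = clean_rows(sample, lon_min=93.0, lon_max=94.0)
print(len(kept), "kept,", len(removed), "removed")
```

Keeping the removed rows in a separate list makes it easy to check the counts of removed items against the figures reported above.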

2. Key Words

Some key words are given. However, additional key words may improve the accuracy of the analysis.

Outbreak.png

The graph above shows the distribution of texts containing the key words "flu" and "cold". Text traffic shoots up on May 18th and remains high until the 20th. A word cloud for the 3 days of the outbreak, produced with JMP Text Explorer, is shown below.

Cloud.png
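
The report builds the daily distribution in JMP. As a rough equivalent, the count of texts per day that mention "flu" or "cold" could be sketched in Python as below. Whole-word, case-insensitive matching is an assumption, and the function name and sample texts are made up for illustration.

```python
import re
from collections import Counter

# Whole-word, case-insensitive match on the two key words (an assumption).
KEYWORD_PATTERN = re.compile(r"\b(?:flu|cold)\b", re.IGNORECASE)


def daily_keyword_counts(rows):
    """Count, per date, the texts that mention at least one key word."""
    counts = Counter()
    for date_text, text in rows:
        if KEYWORD_PATTERN.search(text):
            counts[date_text] += 1
    return counts


# Made-up sample rows: (Date, text).
sample = [
    ("5/17/2011", "Great day at the park"),
    ("5/18/2011", "I think I caught the flu"),
    ("5/18/2011", "This cold is getting worse"),
    ("5/19/2011", "Flu season came early this year"),
]
counts = daily_keyword_counts(sample)
for date_text in sorted(counts):
    print(date_text, counts[date_text])
```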

Therefore, the following key words are used for analysis.

  • Given: flu, fever, chill(s), sweat(s), aches (struck through in the page markup, so not counted), pain(s), fatigue, cough(ing), breathing difficulty, nausea, vomit(ing), diarrhea, enlarged lymph nodes
  • Enhanced: cold, headache, sick, shortness of breath, declining health, hurts to move, aching muscles, sore throat, runny nose, problems breathing, pneumonia

3. Contagion Flag

Texts containing any of the above 23 words and phrases are selected from the dataset "Microblog_Final.csv".
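
A minimal Python sketch of such a flag is shown below. It is not the method used for this report. It assumes whole-word, case-insensitive matching, treats the optional endings such as chill(s) and cough(ing) as optional suffixes, and leaves out "aches" because it is struck through in the page markup, which gives 23 terms. The names TERMS, FLAG_PATTERN and contagion_flag are hypothetical.

```python
import re

# 12 given terms plus 11 enhanced terms, written as regex fragments.
TERMS = [
    # Given
    "flu", "fever", "chills?", "sweats?", "pains?", "fatigue",
    "cough(?:ing)?", "breathing difficulty", "nausea", "vomit(?:ing)?",
    "diarrhea", "enlarged lymph nodes",
    # Enhanced
    "cold", "headache", "sick", "shortness of breath", "declining health",
    "hurts to move", "aching muscles", "sore throat", "runny nose",
    "problems breathing", "pneumonia",
]

# One alternation wrapped in word boundaries, case-insensitive.
FLAG_PATTERN = re.compile(r"\b(?:" + "|".join(TERMS) + r")\b", re.IGNORECASE)


def contagion_flag(text):
    """Return True if the text contains any of the key words or phrases."""
    return FLAG_PATTERN.search(text) is not None


# Made-up sample texts.
for text in ["I have a sore throat and chills", "Nice weather today"]:
    print(contagion_flag(text), "-", text)
```

Because of the word boundaries, a word such as "influenza" is not flagged by "flu" alone. Whether that is desirable depends on how strict the flag should be.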