ISS608 2017-18 T1 Assign KyonghwanKim Data Preparation
Revision as of 06:48, 15 October 2017 by Kh.kim.2016 (talk | contribs)
Vastropolis Epidemic Report
Microblog
1. Data cleaning
2. Key Words
Some key words are given. However, additional key words may improve the accuracy of the analysis.
The key words "flu" and "cold" are chosen because they are diagnosis words. The graph above shows the distribution, by "Date", of texts containing the key words "flu" and "cold". Text traffic shoots up from May 18th and remains high until the 20th. Word clouds for the 3 days of the outbreak, produced with the JMP Text Explorer, are shown below.
Therefore, the following key words are used for the analysis.
- Given: flu, fever, chill(s), sweat(s), aches, pain(s), fatigue, cough(ing), breathing difficulty, nausea, vomit(ing), diarrhea, enlarged lymph nodes
- Enhanced: cold, headache, sick, shortness of breath, declining health, hurts to move, aching muscles, sore throat, runny nose, problems breathing, pneumonia
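The parenthesised endings in the list above, such as chill(s) and cough(ing), denote optional suffixes. The following is a minimal sketch, assuming Python rather than JMP, of how that notation could be turned into case-insensitive patterns. The function names `keyword_to_pattern` and `mentions` are hypothetical and not part of the original workflow, and the use of word boundaries is an assumption that is stricter than a plain substring test.

```python
import re


def keyword_to_pattern(keyword):
    """Build a regex for a key word; 'chill(s)' matches 'chill' and 'chills'."""
    # Split a key word like 'chill(s)' into the stem and its optional suffix.
    match = re.fullmatch(r"(.+?)\((\w+)\)", keyword)
    if match:
        stem, suffix = match.groups()
        body = re.escape(stem) + "(?:" + re.escape(suffix) + ")?"
    else:
        body = re.escape(keyword)
    # Assumption: word boundaries, so 'cold' does not match inside 'scolded'.
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def mentions(text, keyword):
    """Return True if the text mentions the key word (hypothetical helper)."""
    return keyword_to_pattern(keyword).search(text) is not None
```

For example, `mentions("Coughing all night", "cough(ing)")` is true, while `mentions("she scolded me", "cold")` is false because of the word boundaries.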
3. Contagion Flag
Texts containing any of the above 23 words and phrases are selected from the dataset "Microblog_Final.csv".
The Contagion Flag is the sum of all the above conditions whose names start with "Contains". Any text with a Contagion Flag greater than 0 is written to the subset file Contagion.csv.
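The flag computation described above was done in JMP. As a rough illustration only, the same idea could be sketched in Python as follows. The column name `text`, the function names, and the use of lower-cased substring tests to mimic the "Contains" conditions are all assumptions. The key words are reduced to their stems so that, for instance, "chill" also catches "chills".

```python
import csv

# Key word stems from the Given and Enhanced lists (substring matching).
KEYWORDS = [
    "flu", "fever", "chill", "sweat", "aches", "pain", "fatigue", "cough",
    "breathing difficulty", "nausea", "vomit", "diarrhea",
    "enlarged lymph nodes", "cold", "headache", "sick",
    "shortness of breath", "declining health", "hurts to move",
    "aching muscles", "sore throat", "runny nose", "problems breathing",
    "pneumonia",
]


def contagion_flag(text):
    """Sum of the per-key-word 'contains' indicators (0 or 1 each)."""
    lowered = text.lower()
    return sum(1 for keyword in KEYWORDS if keyword in lowered)


def write_contagion_subset(in_path, out_path, text_column="text"):
    """Copy rows whose flag is greater than 0; return how many were kept."""
    kept = 0
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["Contagion Flag"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            flag = contagion_flag(row[text_column])
            if flag > 0:
                row["Contagion Flag"] = flag
                writer.writerow(row)
                kept += 1
    return kept
```

A call such as `write_contagion_subset("Microblog_Final.csv", "Contagion.csv")` would then produce the subset file, provided the text column is actually named `text`. Plain substring tests are simple but over-match (for example "cold" inside "scolded"), so the resulting subset may include some false positives.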