Difference between revisions of "ISS608 2017-18 T1 Assign KyonghwanKim Data Preparation"
Kh.kim.2016 (talk | contribs) |
Revision as of 17:58, 15 October 2017
Vastropolis Epidemic Report
Microblog
1. Data cleaning
2. Key Words
Some key words are given. However, additional key words may enhance the accuracy of the analysis.
The key words "flu" and "cold" are chosen because they are diagnosis words, whereas the other words are symptoms. The graph above shows the distribution by "Date" of text that contains diagnosis words. Text traffic shoots up on May 18th and remains high until the 20th. Word clouds for the 3 days of the outbreak, generated with JMP Text Explorer, are shown below.
Therefore, the following key words are chosen for analysis.
- Given: flu, fever, chill(s), sweat(s), aches, pain(s), fatigue, cough(ing), breathing difficulty, nausea, vomit(ing), diarrhea, enlarged lymph nodes
- Enhanced: cold, headache, sick, shortness of breath, declining health, hurts to move, aching muscles, sore throat, runny nose, problems breathing, pneumonia
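The daily count of diagnosis-word text described in this section was produced in JMP; the same idea can be sketched in Python. This is only an illustration under assumptions: the function name `diagnosis_counts_by_date`, the `(date, text)` row layout, and the whole-word, case-insensitive matching rule are not taken from the original workflow, and the sample rows are toy values, not data from the microblog file.

```python
import re
from collections import Counter

# Assumption: a diagnosis word counts only as a whole word, ignoring case.
DIAGNOSIS_RE = re.compile(r"\b(?:flu|cold)\b", re.IGNORECASE)


def diagnosis_counts_by_date(rows):
    """Count texts per date that contain a diagnosis word.

    `rows` is an iterable of (date, text) pairs; this layout is hypothetical.
    """
    counts = Counter()
    for date, text in rows:
        if DIAGNOSIS_RE.search(text):
            counts[date] += 1
    return counts


# Toy rows for illustration only (not from the real dataset).
sample_rows = [
    ("May 17", "Great weather for a run"),
    ("May 18", "I think I caught the flu"),
    ("May 18", "Stuck in bed with a bad cold"),
    ("May 19", "Flu season came early this year"),
]
daily_counts = diagnosis_counts_by_date(sample_rows)
```

Plotting `daily_counts` against the date would give a rough analogue of the distribution graph, so a spike like the one between May 18th and 20th would stand out in the same way.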
3. Contagion Flag
Text containing the above 23 words and phrases is selected from the dataset "Microblog_Final.csv".
Diagnosis Flag marks text containing a diagnosis word: "flu" or "cold". Symptom Flag marks text containing at least 2 of the other key words, that is, all key words apart from the diagnosis words. Any text containing at least 1 diagnosis word or at least 2 symptom words is classified as Contagion Flag (20,466 rows, "Contagion_Flag.csv"), which is used for the visualization analysis.
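The flag rules above can be sketched in Python as follows. This is a minimal sketch, not the method actually used to build "Contagion_Flag.csv": the function name `classify`, the whole-word case-insensitive matching, and counting each symptom key word at most once per text are all assumptions. Only the spelling variants written in parentheses in the key word list (for example chill(s), cough(ing)) are covered.

```python
import re

# Diagnosis words and symptom key words from the list above, as regex fragments.
DIAGNOSIS = [r"flu", r"cold"]
SYMPTOMS = [
    r"fever", r"chills?", r"sweats?", r"aches", r"pains?", r"fatigue",
    r"cough(?:ing)?", r"breathing difficulty", r"nausea", r"vomit(?:ing)?",
    r"diarrhea", r"enlarged lymph nodes", r"headache", r"sick",
    r"shortness of breath", r"declining health", r"hurts to move",
    r"aching muscles", r"sore throat", r"runny nose", r"problems breathing",
    r"pneumonia",
]


def _compile(fragments):
    # Assumption: match whole words or phrases only, ignoring case.
    return [re.compile(r"\b" + f + r"\b", re.IGNORECASE) for f in fragments]


DIAGNOSIS_RE = _compile(DIAGNOSIS)
SYMPTOM_RE = _compile(SYMPTOMS)


def classify(text):
    """Return the three flags for one microblog text."""
    diagnosis_hits = sum(1 for rx in DIAGNOSIS_RE if rx.search(text))
    # Assumption: each symptom key word counts once, however often it repeats.
    symptom_hits = sum(1 for rx in SYMPTOM_RE if rx.search(text))
    diagnosis_flag = diagnosis_hits >= 1
    symptom_flag = symptom_hits >= 2
    return {
        "diagnosis": diagnosis_flag,
        "symptom": symptom_flag,
        "contagion": diagnosis_flag or symptom_flag,
    }
```

For example, `classify("fever and chills all night")` sets the symptom and contagion flags without a diagnosis word, while a text mentioning only "cough" sets none. Whole-word matching keeps words such as "coldplay" from triggering the diagnosis flag, but a sentence like "it is cold outside" would still be flagged, so the rule is a coarse filter rather than a precise one.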