Difference between revisions of "ISS608 2017-18 T1 Assign KyonghwanKim Data Preparation"
Kh.kim.2016 (talk | contribs) |
Revision as of 06:48, 15 October 2017
Vastropolis Epidemic Report
Microblog
1. Data cleaning
2. Key Words
Some key words are given. However, additional key words may improve the accuracy of the analysis.
[Figure: outbreak.png]
The key words "flu" and "cold" are chosen because they are diagnosis words. The graph above shows the distribution by "Date" of texts containing "flu" and "cold". Text traffic shoots up on May 18th and remains high until the 20th. Word clouds for the 3 days of the outbreak, produced with the JMP Text Explorer, are shown below.
[Figure: cloud.png]
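The per-date count behind such a graph can also be reproduced outside JMP. The following is a minimal Python sketch, not the method used in this report. The column name `text` is hypothetical, since the source only names the "Date" column. Matching is done by case-insensitive substring, and a text is counted if it contains either word; both choices are assumptions.

```python
from collections import Counter


def daily_counts(rows, words=("flu", "cold"), date_key="Date", text_key="text"):
    """Count, per date, the rows whose text contains any of the given words.

    rows is an iterable of dicts, for example from csv.DictReader.
    text_key is a hypothetical column name; adjust it to the real dataset.
    """
    counts = Counter()
    for row in rows:
        lowered = row[text_key].lower()
        # Substring match: "flu" also matches longer words such as "influence".
        if any(word in lowered for word in words):
            counts[row[date_key]] += 1
    return counts
```

Sorting the returned counts by date and plotting them should show the same kind of spike around the outbreak days, provided the matching assumptions hold.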
Based on the word clouds, the following key words are used for analysis.
- Given: flu, fever, chill(s), sweat(s), pain(s), fatigue, cough(ing), breathing difficulty, nausea, vomit(ing), diarrhea, enlarged lymph nodes (aches is struck out in the source and is not counted)
- Enhanced: cold, headache, sick, shortness of breath, declining health, hurts to move, aching muscles, sore throat, runny nose, problems breathing, pneumonia
3. Contagion Flag
Texts containing any of the above 23 words and phrases are selected from the dataset "Microblog_Final.csv".
[Figure: contagion_flag.png]
Contagion Flag is the sum of all the above conditions starting with "Contains". Any text with a Contagion Flag greater than 0 is subset into the Contagion.csv file.
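For readers without JMP, the flag logic can be approximated in Python. This is a sketch under stated assumptions, not the formula from the report: each "Contains" condition is treated as a 0/1 indicator, matching is case-insensitive substring matching, the stems "chill", "sweat", "pain", "cough" and "vomit" stand in for the bracketed variants, and the column name `text` is hypothetical.

```python
# The 23 key words and phrases listed in section 2 (stems cover the variants).
KEY_WORDS = [
    # Given (12)
    "flu", "fever", "chill", "sweat", "pain", "fatigue", "cough",
    "breathing difficulty", "nausea", "vomit", "diarrhea",
    "enlarged lymph nodes",
    # Enhanced (11)
    "cold", "headache", "sick", "shortness of breath", "declining health",
    "hurts to move", "aching muscles", "sore throat", "runny nose",
    "problems breathing", "pneumonia",
]


def contagion_flag(text):
    """Return how many key words occur in the text (0/1 per key word)."""
    lowered = text.lower()
    return sum(1 for word in KEY_WORDS if word in lowered)


def subset_contagion(rows, text_key="text"):
    """Keep only the rows whose Contagion Flag is greater than 0.

    rows is an iterable of dicts; text_key is a hypothetical column name.
    """
    return [row for row in rows if contagion_flag(row[text_key]) > 0]
```

Substring matching is deliberately loose: "cold" also matches "scold" and "flu" matches "influence", so a stricter word-boundary match may be preferable if false positives matter.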