Difference between revisions of "ISS608 2017-18 T1 Assign KyonghwanKim Data Preparation"
Kh.kim.2016 (talk | contribs) |
Kh.kim.2016 (talk | contribs) |
||
Line 52: | Line 52: | ||
*There are 21 items with invalid time format. They are removed from analysis.<br/> | *There are 21 items with invalid time format. They are removed from analysis.<br/> | ||
*Also, there are 6 items with Longitude outside of given map range. They are removed as well so that all data are within parameters. | *Also, there are 6 items with Longitude outside of given map range. They are removed as well so that all data are within parameters. | ||
− | *Total 27 rows are removed and 1,023,050 rows are used for analysis with file name "Microblog_Final.csv". | + | *Total 27 rows are removed and 1,023,050 rows are used for analysis with file name '''''"Microblog_Final.csv"'''''. |
|[[file:missing_time.png|150px]] [[file:outlier_Longitude.png|300px]] | |[[file:missing_time.png|150px]] [[file:outlier_Longitude.png|300px]] | ||
|- | |- | ||
Line 60: | Line 60: | ||
Some of key words are given. However, there may be additional key words to enhance accuracy of analysis. <br/> | Some of key words are given. However, there may be additional key words to enhance accuracy of analysis. <br/> | ||
− | [[file:outbreak.png| | + | [[file:outbreak.png|400px]] |
Above graph shows text distribution with key word ''"flu"'' and ''"cold"''. Text traffic shoots up from May 18th and remain high until 20th. Word Cloud during 3 days of outbreak using JMP text explorer is shown below. | Above graph shows text distribution with key word ''"flu"'' and ''"cold"''. Text traffic shoots up from May 18th and remain high until 20th. Word Cloud during 3 days of outbreak using JMP text explorer is shown below. | ||
− | [[file: | + | [[file:cloud.png|300px]] |
− | + | Therefore, following key are used for analysis. | |
− | + | *Given: '''''flu''''', fever, chill(s), sweat(s), <span style="color: blue"><del>aches</del></span>, pain(s), fatigue, cough(ing), ''breathing difficulty'', nausea, vomit(ing), diarrhea, ''enlarged lymph nodes'' | |
− | + | *Enhanced: '''''cold''''', <span style="color: blue">headache</span>, sick, ''shortness of breath'', ''declining health'', ''hurts to move'', ''aching muscles'', ''sore throat'', ''runny nose'', ''problems breathing'', pneumonia | |
− | + | ||
− | + | ==3. Contagion Flag== | |
− | + | Text containing above 23 words and phrases are chosen from dataset '''''"Microblog_Final.csv"'''''. | |
− | |||
− | |||
− |
Revision as of 05:55, 15 October 2017
Vastropolis Epidemic Report
Microblog
1. Data cleaning
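The cleaning recorded in the revision above removes 21 items with an invalid time format and 6 items with Longitude outside the map range, 27 rows in total. A minimal Python sketch of that kind of filter follows. It is not the author's actual procedure: the column names `Created_at` and `Longitude`, the time format string, and the longitude bounds passed in are all assumptions for illustration.

```python
from datetime import datetime


def is_valid_time(value, fmt="%m/%d/%Y %H:%M"):
    """Return True if value parses under fmt (the format itself is an assumption)."""
    try:
        datetime.strptime(value.strip(), fmt)
        return True
    except ValueError:
        return False


def clean_rows(rows, lon_min, lon_max, time_key="Created_at", lon_key="Longitude"):
    """Drop rows with an unparseable time or a longitude outside [lon_min, lon_max].

    rows is an iterable of dicts with string values (e.g. from csv.DictReader).
    The column names are hypothetical. Returns (kept, dropped_time, dropped_lon).
    """
    kept, dropped_time, dropped_lon = [], 0, 0
    for row in rows:
        if not is_valid_time(row[time_key]):
            dropped_time += 1
            continue
        try:
            lon = float(row[lon_key])
        except ValueError:
            # A longitude that is not a number is counted with the out-of-range rows.
            dropped_lon += 1
            continue
        if not (lon_min <= lon <= lon_max):
            dropped_lon += 1
            continue
        kept.append(row)
    return kept, dropped_time, dropped_lon
```

Returning the two drop counts separately makes it easy to compare them against the 21 and 6 rows reported for this dataset.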
2. Key Words
Some key words are given. However, additional key words may improve the accuracy of the analysis.
The graph above shows the distribution of texts containing the key words "flu" and "cold". Text traffic shoots up on May 18th and remains high until the 20th. A word cloud for the 3 days of the outbreak, made with JMP Text Explorer, is shown below.
Therefore, the following key words are used for analysis.
- Given: flu, fever, chill(s), sweat(s), aches (struck out), pain(s), fatigue, cough(ing), breathing difficulty, nausea, vomit(ing), diarrhea, enlarged lymph nodes
- Enhanced: cold, headache, sick, shortness of breath, declining health, hurts to move, aching muscles, sore throat, runny nose, problems breathing, pneumonia
3. Contagion Flag
Texts containing any of the above 23 words and phrases are selected from the dataset "Microblog_Final.csv".
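As an illustration of how such a flag could be computed, here is a minimal Python sketch that tests a text against the 23 key words and phrases listed in section 2. It is only one possible approach and not necessarily how the flag was built for this report. Whole-word, case-insensitive matching is an assumption, and the names `KEY_TERMS` and `contagion_flag` are hypothetical.

```python
import re

# The 23 key words and phrases from section 2, written as regex fragments.
# Optional endings such as chill(s) or cough(ing) become optional groups.
KEY_TERMS = [
    # Given
    r"flu", r"fever", r"chills?", r"sweats?", r"pains?", r"fatigue",
    r"cough(?:ing)?", r"breathing difficulty", r"nausea", r"vomit(?:ing)?",
    r"diarrhea", r"enlarged lymph nodes",
    # Enhanced
    r"cold", r"headache", r"sick", r"shortness of breath", r"declining health",
    r"hurts to move", r"aching muscles", r"sore throat", r"runny nose",
    r"problems breathing", r"pneumonia",
]

# Word boundaries keep "flu" from matching inside words such as "influenced".
PATTERN = re.compile(r"\b(?:" + "|".join(KEY_TERMS) + r")\b", re.IGNORECASE)


def contagion_flag(text):
    """Return True if the text contains at least one key word or phrase."""
    return PATTERN.search(text) is not None
```

Applied to the text column of each row, this yields a true/false flag per message. Because matching is on whole words, variants outside the list (for example "headaches") are not flagged unless added as their own pattern.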