Difference between revisions of "ISSS608 2017-18 T1 Assign XING SIYUAN Data Preparation"
(Created page with "<div style=background:#FFC0CB border:#A3BFB1> 165px <b><font size = 5; color="#8B0000"> Epidemic Spread in Smartpolis - Origin and Transmission </font></b...") |
|||
Line 11: | Line 11: | ||
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#FFC0CB; text-align:center;" width="25%" | | | style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#FFC0CB; text-align:center;" width="25%" | | ||
; | ; | ||
− | [[ISSS608_2017-18_T1_Assign_XING_SIYUAN_Data_Preparation|<b><font size="2"><font color="#8B0000">Data Preparation</font></font></b>]] | + | [[ISSS608_2017-18_T1_Assign_XING_SIYUAN_Data_Preparation|<b><font size="2"><font color="#8B0000">Data Preparation & Dashboard Design</font></font></b>]] |
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#FFC0CB; text-align:center;" width="25%" | | | style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#FFC0CB; text-align:center;" width="25%" | | ||
Line 46: | Line 46: | ||
<tr> | <tr> | ||
− | <td><b> 2.Identify infected | + | <td><b> 2.Identify infected patients </b> |
<br>Tools: Tableau | <br>Tools: Tableau | ||
− | <br>From the | + | <br>Loading the cleaned data into Tableau, we can draw a heat map of macroblog density per day at each location. The heat map shows a sharp increase in the number of macroblogs posted on 19 and 20 May, which suggests that one or more major events occurred on those days. |
− | <br> | + | <br>The locations of the macroblogs posted on 19 and 20 May (as shown in the left figure) reveal a high density of posts around the hospital of Smartpolis (highlighted with a black square). These posts were therefore most likely written by people infected by the epidemic. Investigating what these people posted and where they had been over the preceding days can help us find where the outbreak started, how the infection is transmitted, and whether the outbreak is contained. |
− | <br> | + | <br>On the map, select the macroblogs located around the hospitals. Group these posts by user_id and create a set named patients. Then export a CSV file containing the IDs of all the people in the patients set. |
+ | </td> | ||
+ | <td> | ||
+ | Heatmap of Number of Macroblogs by days: | ||
+ | [[File:SY_num_dis.png|200px|center]] | ||
+ | Macroblogs distribution in the last day: | ||
+ | [[File:SY_patients.png|500px|center]] | ||
</td> | </td> | ||
− | |||
</tr> | </tr> | ||
<tr> | <tr> | ||
− | <td><b> 3. | + | <td><b> 3.Identify Symptom of Infected Patients </b> |
− | <br> | + | <br>Tools: JMP |
+ | <br>Load the patients ID file into JMP and join it with the Macroblogs table. JMP's Text Explorer then generates the words and phrases most frequently mentioned by infected people (left top figure). Filtering for words related to symptoms of the epidemic shows that most patients were suffering from fever, cough, headache, diarrhea, vomiting, sore throat, aching muscles, runny nose, difficulty breathing, and so on. | ||
+ | <br>On further investigation, the symptoms appear to fall into two categories: one related to gastrointestinal discomfort, the other to inhalation discomfort. Hence, the epidemic may involve two types of disease, with two origins and multiple transmission methods. We chose 7 words from the inhalation symptoms and 4 from the gastrointestinal symptoms (shown in the left middle table) to identify the origin and transmission method of the epidemic. | ||
+ | <br> Create 11 columns, each named after one of the 11 selected words, and check whether the text in each row contains the corresponding word: if yes, output 1; if no, output 0. Formula: | ||
+ | [[File:SY_formula.png|150px|left]] | ||
+ | <br> | ||
+ | <br> | ||
+ | <br> | ||
+ | <br> | ||
+ | <br>Tools: Tableau | ||
+ | <br> | ||
+ | </td> | ||
+ | <td> | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! Data integration !! Text Explorer | ||
+ | |- | ||
+ | | [[File:SY_p_m.png|250px|center]] || [[File:SY_words.png|250px|center]] | ||
+ | |} | ||
+ | Words table: | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! Symptom Type !! Words | ||
+ | |- | ||
+ | | Inhalation || <b>chill, flu, sore throat, breath, pneumonia, fever, cough</b> | ||
+ | |- | ||
+ | | Gastrointestinal || <b>stomachache, diarrhea, vomit, nausea</b> | ||
+ | |} | ||
− | |||
− | |||
− | |||
− | |||
− | |||
</td> | </td> | ||
− | |||
− | |||
− | |||
− | |||
− | |||
</tr> | </tr> | ||
<tr> | <tr> | ||
− | <td><b> 4. | + | <td><b> 4.Identify Major Events in Smartpolis </b> |
− | <br> | + | <br> |
− | |||
− | |||
− | |||
</td> | </td> | ||
<td>[[File:gyf_m_4.png|500px|center]]</td> | <td>[[File:gyf_m_4.png|500px|center]]</td> |
Revision as of 21:44, 15 October 2017
Data Preparation
Description | Illustration
---|---
1.Data Cleaning |
2.Identify infected patients | Heatmap of Number of Macroblogs by days: Macroblogs distribution in the last day:
3.Identify Symptom of Infected Patients | Words table:
4.Identify Major Events in Smartpolis |
5.Overall visualization design concepts |
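The join and keyword-flag steps described in step 3 were done in JMP, and the actual formula is only available as an image. As a rough illustration of the same idea, the following Python sketch keeps the posts written by users in the patients set and adds one 0/1 column per symptom word. The field name `text`, the sample rows, and the use of case-insensitive substring matching are assumptions made for this sketch; only `user_id` and the 11 words come from the page.

```python
# Symptom keywords taken from the words table in step 3.
INHALATION = ["chill", "flu", "sore throat", "breath", "pneumonia", "fever", "cough"]
GASTROINTESTINAL = ["stomachache", "diarrhea", "vomit", "nausea"]
KEYWORDS = INHALATION + GASTROINTESTINAL


def join_patient_posts(posts, patient_ids):
    """Keep only the posts whose user_id is in the patients set (inner join)."""
    ids = set(patient_ids)
    return [post for post in posts if post["user_id"] in ids]


def add_symptom_flags(posts):
    """Add one 0/1 column per keyword: 1 if the post text contains the word."""
    flagged = []
    for post in posts:
        # Assumption: matching is a case-insensitive substring test.
        text = post["text"].lower()
        row = dict(post)
        for word in KEYWORDS:
            row[word] = 1 if word in text else 0
        flagged.append(row)
    return flagged


# Hypothetical sample rows; the real data comes from the Macroblogs table.
posts = [
    {"user_id": 1, "text": "Terrible fever and cough since this morning"},
    {"user_id": 2, "text": "Great concert downtown tonight"},
    {"user_id": 3, "text": "Nausea and diarrhea, heading to the hospital"},
]
patient_ids = [1, 3]

flagged = add_symptom_flags(join_patient_posts(posts, patient_ids))
```

Substring matching is simple but loose: a short word such as "flu" would also match inside longer words, so a word-boundary check may be preferable if false positives matter.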