Difference between revisions of "ISSS608 2017-18 T1 Assign MA XIAOLIU Data Preparation"

From Visual Analytics and Applications
 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
<div style=background:#2B3820 border:#A3BFB1>
 
<div style=background:#2B3820 border:#A3BFB1>
[[Image:Page.jpg|280px]]  
+
[[file:Timg.jpg|320px]]  
<font size = 5; color="#FFFFFF"> Smartpolis</font>
+
<font size = 5; color="#FFFFFF"> Research of Epidemic Spread in Smartpolis</font>
 
</div>
 
</div>
 
<!--MAIN HEADER -->
 
<!--MAIN HEADER -->
Line 7: Line 7:
 
| style="font-family:Century Gothic; font-size:100%; solid #000000; background:#2B3820; text-align:center;" width="16.67%" |  
 
| style="font-family:Century Gothic; font-size:100%; solid #000000; background:#2B3820; text-align:center;" width="16.67%" |  
 
;
 
;
[[ISSS608_2017-18_T1_Assign_MA XIAOLIU_Overview| <font color="#FFFFFF">Overview</font>]]
+
[[ISSS608_2017-18_T1_Assign_MA XIAOLIU| <font color="#FFFFFF">Overview</font>]]
  
 
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#B0620E; text-align:center;" width="16.67%" |  
 
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#B0620E; text-align:center;" width="16.67%" |  
Line 15: Line 15:
 
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3820; text-align:center;" width="16.67%" |  
 
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3820; text-align:center;" width="16.67%" |  
 
;
 
;
[[ISSS608_2017-18_T1_Assign_MA XIAOLIU_Origin and Epidemic Spread| <font color="#FFFFFF">Origin and Epidemic Spread</font>]]
+
[[ISSS608_2017-18_T1_Assign_MA XIAOLIU_Epidemic exploration| <font color="#FFFFFF">Epidemic exploration</font>]]
  
 
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3820; text-align:center;" width="16.67%" |  
 
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3820; text-align:center;" width="16.67%" |  
 
;
 
;
[[ISSS608_2017-18_T1_Assign_MA XIAOLIU_transmition| <font color="#FFFFFF">transmition</font>]]
+
[[ISSS608_2017-18_T1_Assign_MA XIAOLIU_Transmission and development| <font color="#FFFFFF">Transmission and Tendency</font>]]
 
 
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3820; text-align:center;" width="16.67%" |
 
;
 
[[ISSS608_2017-18_T1_Assign_MA XIAOLIU_Suggestion| <font color="#FFFFFF">Suggestion</font>]]
 
  
 
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3820; text-align:center;" width="16.67%" |  
 
| style="font-family:Century Gothic; font-size:100%; solid #1B338F; background:#2B3820; text-align:center;" width="16.67%" |  
Line 30: Line 26:
  
 
|}
 
|}
<br/>
 
 
=Original data=
 
=Original data=
 
According to the overview, there are 3 kind of datasets, the data contents show below:
 
According to the overview, there are 3 kind of datasets, the data contents show below:
Line 44: Line 39:
 
|}
 
|}
  
=Find the useful microblogs=
+
=Data Preparation=
 +
===Data clean===
 +
Firstly,check the missing data pattern for 'Microblogs' in JMP. There are '''48''' rows missing the 'text' value. Remove the 48 rows. There are total '1023029' rows data.
 +
 
 +
[[file:Missingdata1.png|320px|center]]
 +
 
 +
===Find the useful microblogs===
 +
the microblogs are massive, besides, not all of them are connected to the illness. So the challenge is how to get the useful microblogs. What's more, how to get the target people through the microblogs.
 +
 
 
When we decide if the text is what we want, we need to the find the key words. For example, the words that related to this epidemic illness. In this case, Observed symptoms are largely flu­like and include fever, chills,sweats, aches and pains, fatigue, coughing, breathing difficulty, nausea and vomiting, diarrhea, and enlarged lymph nodes. As the disease continues to expand, there is a reasonable assumption that the these words which related to the symptoms will become more frequent. According to the symptom and description of the flu, I set some key words. If the text has the same words as key words, then it can be looked as the useful text.  
 
When we decide if the text is what we want, we need to the find the key words. For example, the words that related to this epidemic illness. In this case, Observed symptoms are largely flu­like and include fever, chills,sweats, aches and pains, fatigue, coughing, breathing difficulty, nausea and vomiting, diarrhea, and enlarged lymph nodes. As the disease continues to expand, there is a reasonable assumption that the these words which related to the symptoms will become more frequent. According to the symptom and description of the flu, I set some key words. If the text has the same words as key words, then it can be looked as the useful text.  
 +
 
'''Key word: 'flu','fever','chills','sweats','aches','pains','fatigue','coughing','breathing','nausea','vomiting','diarrhea','lymph','death''''
 
'''Key word: 'flu','fever','chills','sweats','aches','pains','fatigue','coughing','breathing','nausea','vomiting','diarrhea','lymph','death''''
''Note:There might be a question here that, most of the people are normal people,not the doctor or nurse, so they might not use the professional term but normal words. Then this method will loss many useful text. However, we still not sure the text which might about disease but not has key words is exactly related to this flulike illness. So this method is still reasonable, which can help to find more precise texts that fit the characteristics of the disease ''
 
I pick out the text, lower the words, remove the stop words and do stemming. Then if there re same words both in key word and text, the text is the target text we want
 
  
=other adjustments=
+
<small>''Note:There might be a question here that, most of the people are normal people,not the doctor or nurse, so they might not use the professional term but normal words. Then this method will loss many useful text. However, we still not sure the text which might about disease but not has key words is exactly related to this flulike illness. So this method is still reasonable, which can help to find more precise texts that fit the characteristics of the disease ''</small>
 +
 
 +
I use python pick out the text, lower the words, remove the stop words and do stemming. Then if there re same words both in key word and text, the text is the target text we want.
 +
 
 +
Python code:[[File:Text exploration.txt|thumbnail]]
 +
 
 +
 
 
===Location===
 
===Location===
 
Separated the location to longitude and latitude. Because the longitude in west, so I change the number to negative.
 
Separated the location to longitude and latitude. Because the longitude in west, so I change the number to negative.
 +
 
===Symptom===
 
===Symptom===
 
I also add another column which named ‘Symptom’ to find the keyword in the text. This can help to know more about the flu, like which is the initial symptom, and how will the symptom change. These all can be revealed from the text.
 
I also add another column which named ‘Symptom’ to find the keyword in the text. This can help to know more about the flu, like which is the initial symptom, and how will the symptom change. These all can be revealed from the text.
  
[[File:text_symptom.jpg|280px]]
+
[[File:text_symptom.jpg|520px|center]]
=Map=
 
1. Water Supply(blue) - Residents and businesses get their drinking water by pumping water from nearby reservoirs or rivers.  These distributed water systems are both public and privately owned.
 
2. Entertainment (yellow)– Vastopolis has two stadiums (Vastopolis Dome and Westside Stadium) for sports, concerts, and other events.  The various lakes and the Vast River, which flows south at a steady rate of three miles per hour, is used for water-based sports and recreation.
 
3. City Administration(green) – Vastopolis has several locations of significance including a state courthouse, a capitol building, convention center, and a large airport.
 
4. various hospital
 
  
[[File:map.jpg|280px]]
+
=Map Description=
 +
on the original map, there are many colors and icons which represent different places. Combining with the additional information, I adjust the color of the image, and use simple color to represent the buildings.
 +
 +
[[File:map.jpg|520px|center]]
 +
 
 +
{| class="wikitable"
 +
|-
 +
! Function !! Represent color !! Discription
 +
|-
 +
| Water Supply || [[File:Green.png|60px|center]] || Residents and businesses get their drinking water by pumping water from nearby reservoirs or rivers.  These distributed water systems are both public and privately owned.
 +
|-
 +
| Entertainment || [[File:Yellow.png|60px|center]] || Vastopolis has two stadiums (Vastopolis Dome and Westside Stadium) for sports, concerts, and other events.  The various lakes and the Vast River, which flows south at a steady rate of three miles per hour, is used for water-based sports and recreation.
 +
|-
 +
| City Administration || [[File:Red.png|60px|center]] || Vastopolis has several locations of significance including a state courthouse, a capitol building, convention center, and a large airport.
 +
|-
 +
| various hospital || [[File:Blue.png|60px|center]] || different hospitals
 +
|}

Latest revision as of 00:29, 23 October 2017

Research of Epidemic Spread in Smartpolis

Original data

According to the overview, there are three datasets. Their contents are shown below:

{| class="wikitable"
|-
! Name !! Description
|-
| Microblogs || The microblogs' contents, the locations, and the people's IDs.
|-
| Population || Total population and daytime population of the 13 zones.
|-
| Weather || The weather, wind direction, and wind power.
|}

Data Preparation

Data cleaning

First, I checked the missing-data pattern for 'Microblogs' in JMP. 48 rows are missing the 'text' value, so I removed those 48 rows. There are 1,023,029 rows of data in total.

Missingdata1.png
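
The cleaning step above was done in JMP. As a rough equivalent, the sketch below drops rows with an empty 'text' value using only the Python standard library. The column names `ID` and `text`, the helper `drop_missing_text`, and the sample rows are assumptions made for illustration; the real file layout is not shown here.

```python
import csv
import io


def drop_missing_text(rows, field="text"):
    """Return (rows with a non-empty `field`, number of rows dropped)."""
    kept = [r for r in rows if (r.get(field) or "").strip()]
    return kept, len(rows) - len(kept)


# Made-up sample standing in for the Microblogs file (assumed columns).
sample = io.StringIO(
    "ID,text\n"
    "1,feeling fine today\n"
    "2,\n"
    "3,bad fever and chills\n"
)
rows = list(csv.DictReader(sample))
kept, removed = drop_missing_text(rows)
print(len(kept), "rows kept,", removed, "rows removed")
```

Reporting the number of removed rows makes it easy to compare against the count of missing values found in JMP.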

Find the useful microblogs

The microblog dataset is massive, and not all of the microblogs are connected to the illness. So the challenge is how to find the useful microblogs and, beyond that, how to identify the target people through them.

To decide whether a text is what we want, we need key words, that is, words related to this epidemic illness. In this case, the observed symptoms are largely flu-like and include fever, chills, sweats, aches and pains, fatigue, coughing, breathing difficulty, nausea and vomiting, diarrhea, and enlarged lymph nodes. As the disease continues to spread, it is reasonable to assume that words related to these symptoms will become more frequent. Based on the symptoms and the description of the flu, I set some key words. If a text contains any of the key words, it is treated as a useful text.

Key words: 'flu', 'fever', 'chills', 'sweats', 'aches', 'pains', 'fatigue', 'coughing', 'breathing', 'nausea', 'vomiting', 'diarrhea', 'lymph', 'death'

Note: One might object that most people are ordinary people, not doctors or nurses, so they may use everyday words rather than professional terms, and this method will therefore lose many useful texts. However, we cannot be sure that a text which might be about disease but contains no key words is actually related to this flu-like illness. So the method is still reasonable: it helps find more precise texts that fit the characteristics of the disease.

I use Python to pick out the text, lowercase the words, remove the stop words, and do stemming. If the text and the key word list then share any word, the text is a target text.

Python code: File:Text exploration.txt
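
The attached script is not reproduced on this page. The sketch below only illustrates the same pipeline (lowercase, remove stop words, stem, then intersect with the key words) and is not the attached code. The stop-word set is a small illustrative sample, and `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as a Porter stemmer; both are assumptions.

```python
import re

KEYWORDS = ['flu', 'fever', 'chills', 'sweats', 'aches', 'pains', 'fatigue',
            'coughing', 'breathing', 'nausea', 'vomiting', 'diarrhea',
            'lymph', 'death']

# Small illustrative stop-word set (assumption; a real list is much longer).
STOP_WORDS = {"a", "an", "the", "i", "and", "of", "to", "is", "in", "my",
              "have", "has", "so", "with", "am", "at"}


def crude_stem(word):
    """Strip one common suffix, keeping at least three letters of stem."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOP_WORDS}


# Stem the key words the same way so both sides are comparable.
KEYWORD_STEMS = {crude_stem(k) for k in KEYWORDS}


def is_useful(text):
    """True if the text shares at least one stemmed word with the key words."""
    return bool(normalize(text) & KEYWORD_STEMS)


print(is_useful("I have a Fever and I keep coughing"))
print(is_useful("Great concert at the stadium tonight"))
```

Stemming both the text and the key words lets variants such as "ache" and "aches" match, at the cost of occasional false matches from the crude suffix rules.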


Location

I separated the location into longitude and latitude. Because the longitude is in the west, I changed the number to negative.
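
A minimal sketch of this adjustment follows. It assumes the location field is a string of the form "latitude longitude" separated by whitespace, with the longitude stored unsigned; that format, the helper name `split_location`, and the sample coordinates are assumptions, not taken from the data.

```python
def split_location(location):
    """Parse 'lat lon' and return (latitude, longitude).

    The longitude is forced negative because the city lies in the
    western hemisphere (assumed input format: unsigned longitude).
    """
    lat_str, lon_str = location.split()
    latitude = float(lat_str)
    longitude = -abs(float(lon_str))
    return latitude, longitude


# Made-up coordinates for illustration only.
print(split_location("42.22717 93.33772"))
```

Using `-abs(...)` rather than a plain sign flip keeps the result negative even if some rows already carry a minus sign.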

Symptom

I also added another column, named 'Symptom', that records which key words appear in the text. This helps us learn more about the flu, such as which symptom appears first and how the symptoms change over time. All of this can be revealed from the text.

Text symptom.jpg
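
One way such a column could be filled is sketched below. It uses exact word matching without stemming and joins the matched key words with commas; the helper `find_symptoms`, the output format, and the sample rows are assumptions, since the page does not show how the column was actually built.

```python
import re

SYMPTOM_KEYWORDS = ['flu', 'fever', 'chills', 'sweats', 'aches', 'pains',
                    'fatigue', 'coughing', 'breathing', 'nausea', 'vomiting',
                    'diarrhea', 'lymph', 'death']


def find_symptoms(text):
    """Return the key words that occur as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [k for k in SYMPTOM_KEYWORDS if k in words]


# Made-up rows for illustration only.
rows = [
    {"text": "Fever and chills all night"},
    {"text": "Off to the lake"},
]
for row in rows:
    row["Symptom"] = ", ".join(find_symptoms(row["text"]))

print(rows[0]["Symptom"])
```

Keeping the matched key words, rather than a yes/no flag, is what allows symptoms to be compared over time.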

Map Description

On the original map, many colors and icons represent different places. Combining the map with the additional information, I adjusted the colors of the image and used simple colors to represent the buildings.

Map.jpg
{| class="wikitable"
|-
! Function !! Color !! Description
|-
| Water Supply || [[File:Green.png|60px|center]] || Residents and businesses get their drinking water by pumping water from nearby reservoirs or rivers. These distributed water systems are both public and privately owned.
|-
| Entertainment || [[File:Yellow.png|60px|center]] || Vastopolis has two stadiums (Vastopolis Dome and Westside Stadium) for sports, concerts, and other events. The various lakes and the Vast River, which flows south at a steady rate of three miles per hour, are used for water-based sports and recreation.
|-
| City Administration || [[File:Red.png|60px|center]] || Vastopolis has several locations of significance, including a state courthouse, a capitol building, a convention center, and a large airport.
|-
| Hospitals || [[File:Blue.png|60px|center]] || Various hospitals.
|}