Difference between revisions of "ISSS608 2017-18 T1 Assign MA XIAOLIU Data Preparation"

From Visual Analytics and Applications
Revision as of 00:17, 15 October 2017

Original data

According to the overview, there are three datasets. Their contents are shown below:

Name         Description
Microblogs   The microblog contents, the location, and the poster's ID.
Population   Total population and daytime population of 13 zones.
Weather      The weather, wind direction, and wind power.

Find the useful microblogs

To decide whether a text is what we want, we need to find key words, that is, words related to this epidemic illness. In this case, the observed symptoms are largely flu-like and include fever, chills, sweats, aches and pains, fatigue, coughing, breathing difficulty, nausea and vomiting, diarrhea, and enlarged lymph nodes. As the disease continues to spread, it is reasonable to assume that words related to these symptoms will become more frequent. Based on the symptoms and the description of the flu, I set some key words. If a text contains any of the key words, it is treated as useful text.

Key words: 'flu','fever','chills','sweats','aches','pains','fatigue','coughing','breathing','nausea','vomiting','diarrhea','lymph','death'

Note: One concern is that most people posting are ordinary people, not doctors or nurses, so they may use everyday words rather than professional terms. This method will therefore miss many useful texts. However, we cannot be sure that a text which may be about disease but contains none of the key words is actually related to this flu-like illness. The method is therefore still reasonable, and it helps to find texts that fit the characteristics of the disease more precisely.

I extract the text, convert the words to lowercase, remove the stop words, and apply stemming. If a word appears in both the key-word list and the text, the text is a target text we want.
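The filtering steps above (lowercase, remove stop words, stem, match against the key words) can be sketched roughly as follows. This is only an illustration under stated assumptions: the stop-word list is a small made-up sample, `crude_stem` is a simple suffix stripper standing in for whatever stemmer was actually used, and the function names are hypothetical. The key words are stemmed the same way as the text so that the two sides can match.

```python
import re

# Key words taken from the list above.
KEYWORDS = ['flu', 'fever', 'chills', 'sweats', 'aches', 'pains', 'fatigue',
            'coughing', 'breathing', 'nausea', 'vomiting', 'diarrhea',
            'lymph', 'death']

# Assumption: a tiny illustrative stop-word list; the original list is not given.
STOP_WORDS = {'a', 'an', 'the', 'i', 'am', 'is', 'are', 'and', 'or', 'of',
              'to', 'in', 'my', 'have', 'has', 'with', 'this', 'that', 'it'}


def crude_stem(word):
    """Strip a trailing 'ing' or 's' (assumption: stand-in for a real stemmer)."""
    for suffix in ('ing', 's'):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


# Stem the key words once so they are comparable with the stemmed text.
KEYWORD_STEMS = {crude_stem(k) for k in KEYWORDS}


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOP_WORDS}


def is_relevant(text):
    """A text is kept if it shares at least one stem with the key words."""
    return bool(preprocess(text) & KEYWORD_STEMS)


print(is_relevant("I have a FEVER and keep coughing"))
print(is_relevant("Nice weather at the park today"))
```

With a full stemmer and stop-word list the matches would differ in detail, but the structure of the check stays the same.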

Other adjustments

Location

I separated the location into latitude and longitude. Because the longitudes are west, I changed the longitude values to negative.
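A minimal sketch of this step is shown below. It assumes each location is stored as a single string of the form "latitude longitude" separated by whitespace, with the longitude given as an unsigned number; the actual format of the raw field is not stated here, and `split_location` is a hypothetical helper name.

```python
def split_location(location):
    """Split a 'lat lon' string and return (latitude, longitude).

    Assumption: the longitude is recorded without a sign even though it is
    a west longitude, so it is forced to be negative here.
    """
    lat_text, lon_text = location.split()
    latitude = float(lat_text)
    # -abs() keeps the result negative even if the value is already signed.
    longitude = -abs(float(lon_text))
    return latitude, longitude


lat, lon = split_location("42.22717 93.33772")
print(lat, lon)
```

Using `-abs(...)` rather than a plain negation means that running the step twice on already-corrected data does not flip the sign back.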

Symptom

I also added another column named ‘Symptom’, which records the key words found in the text. This helps to learn more about the flu, such as which symptom appears first and how the symptoms change over time. All of this can be revealed from the text.
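One way to fill such a column is sketched below. The row structure (a list of dictionaries with a `text` field), the comma-separated output format, and the helper names are assumptions made for illustration, and `crude_stem` is again a simple stand-in for a real stemmer.

```python
import re

# Key words taken from the list in the previous section.
KEYWORDS = ['flu', 'fever', 'chills', 'sweats', 'aches', 'pains', 'fatigue',
            'coughing', 'breathing', 'nausea', 'vomiting', 'diarrhea',
            'lymph', 'death']


def crude_stem(word):
    """Strip a trailing 'ing' or 's' (assumption: stand-in for a real stemmer)."""
    for suffix in ('ing', 's'):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


# Map each stem back to the key word it came from.
STEM_TO_KEYWORD = {crude_stem(k): k for k in KEYWORDS}


def extract_symptoms(text):
    """Return the key words found in the text, in order of first appearance."""
    found = []
    for token in re.findall(r"[a-z]+", text.lower()):
        keyword = STEM_TO_KEYWORD.get(crude_stem(token))
        if keyword and keyword not in found:
            found.append(keyword)
    return found


# Hypothetical rows standing in for the microblog table.
rows = [{"text": "Fever and chills all night"},
        {"text": "Going to the game"}]
for row in rows:
    row["Symptom"] = ", ".join(extract_symptoms(row["text"]))

print(rows)
```

Keeping the matched key words per row, together with the posting time, is what makes it possible to look at which symptoms show up first and how they shift later.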

Text symptom.jpg

Map.jpg