ISSS608 2017-18 T1 Assign MA XIAOLIU Data Preparation

From Visual Analytics and Applications
Revision as of 00:29, 23 October 2017 by Xiaoliu.ma.2016 (talk | contribs)

Research of Epidemic Spread in Smartpolis

Original data

According to the overview, there are three datasets. Their contents are shown below:

Name         Description
Microblogs   The content of each microblog, its location, and the poster's ID.
Population   Total population and daytime population of the 13 zones.
Weather      The weather, wind direction, and wind power.

Data Preparation

Data cleaning

First, I checked the missing data pattern for 'Microblogs' in JMP. There are 48 rows missing the 'text' value. After removing these 48 rows, 1,023,029 rows of data remain in total.

Missingdata1.png
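
The check above was done in JMP. As a rough equivalent, the same cleaning step could be sketched in Python as follows. This is only an illustration: the function name `drop_missing_text` and the sample rows are hypothetical, and only the 'text' column name comes from the description above.

```python
def drop_missing_text(rows, text_field="text"):
    """Return (kept_rows, number_dropped) for a list of dict records.

    A row counts as missing when the text field is absent, None,
    or contains only whitespace.
    """
    kept = [row for row in rows if (row.get(text_field) or "").strip()]
    return kept, len(rows) - len(kept)


# Hypothetical sample records, not taken from the real dataset.
sample = [
    {"ID": "1", "text": "feeling fine today"},
    {"ID": "2", "text": ""},
    {"ID": "3", "text": None},
    {"ID": "4", "text": "fever and chills"},
]

cleaned, dropped = drop_missing_text(sample)
print(len(cleaned), dropped)
```

On the real data, the dropped count would be expected to match the 48 rows reported by JMP.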

Find the useful microblogs

The volume of microblogs is massive, and not all of them are connected to the illness. The challenge is therefore to pick out the useful microblogs and, through them, to identify the affected people.

To decide whether a text is relevant, we need to find keywords, that is, words related to this epidemic illness. In this case, the observed symptoms are largely flu-like and include fever, chills, sweats, aches and pains, fatigue, coughing, breathing difficulty, nausea and vomiting, diarrhea, and enlarged lymph nodes. As the disease continues to spread, it is reasonable to assume that the words related to these symptoms will become more frequent. Based on the symptoms and the description of the flu, I set some keywords. If a text contains any of these keywords, it is treated as a useful text.

Keywords: 'flu', 'fever', 'chills', 'sweats', 'aches', 'pains', 'fatigue', 'coughing', 'breathing', 'nausea', 'vomiting', 'diarrhea', 'lymph', 'death'

Note: One concern is that most people are laypeople, not doctors or nurses, so they may use everyday words rather than professional terms. This method will therefore miss some useful texts. However, we cannot be sure that a text which seems to be about disease but contains no keyword is actually related to this flu-like illness. The method is therefore still reasonable, because it finds texts that more precisely fit the characteristics of the disease.

I used Python to extract the text, convert the words to lowercase, remove the stop words, and apply stemming. If any word appears both in the keyword list and in the text, the text is a target text.

Python code: File:Text exploration.txt
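
The attached file is not reproduced here, so the following is only a minimal sketch of the steps described above (lowercasing, stop-word removal, stemming, keyword matching), not the original script. It makes two loud assumptions: `STOP_WORDS` is a tiny illustrative list, and `simple_stem` is a crude suffix stripper standing in for whatever stemmer the original code used.

```python
import re

# Keyword list as given in the text above.
KEYWORDS = ['flu', 'fever', 'chills', 'sweats', 'aches', 'pains', 'fatigue',
            'coughing', 'breathing', 'nausea', 'vomiting', 'diarrhea',
            'lymph', 'death']

# Assumption: a very small stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "i", "is", "am", "are", "of", "to",
              "in", "at", "have", "has", "been", "all", "my"}


def simple_stem(word):
    """Crude stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def stems(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {simple_stem(token) for token in tokens if token not in STOP_WORDS}


# Keywords are stemmed the same way so both sides are comparable.
KEYWORD_STEMS = {simple_stem(keyword) for keyword in KEYWORDS}


def is_relevant(text):
    """True when the text shares at least one stem with the keyword list."""
    return bool(stems(text) & KEYWORD_STEMS)


print(is_relevant("I have been coughing all night"))
print(is_relevant("Great concert at the stadium"))
```

Stemming both the keywords and the text means that 'coughing' in the keyword list also matches 'coughs' or 'coughed' in a microblog, at the cost of occasional false matches from the crude stemmer.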


Location

I separated the location into longitude and latitude. Because the longitudes lie in the western hemisphere, I changed the values to negative numbers.
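
A minimal sketch of this step is shown below. The format of the location field is not stated above, so the sketch assumes a whitespace-separated "latitude longitude" string with an unsigned longitude; the function name `split_location` and the sample value are hypothetical.

```python
from typing import Tuple


def split_location(location: str) -> Tuple[float, float]:
    """Split an assumed "latitude longitude" string into two floats.

    The longitude is forced to be negative because the area lies
    in the western hemisphere.
    """
    lat_str, lon_str = location.split()
    latitude = float(lat_str)
    longitude = -abs(float(lon_str))
    return latitude, longitude


# Hypothetical sample value, not taken from the real dataset.
print(split_location("42.25 93.5"))
```

Using `-abs(...)` rather than a plain sign flip keeps the result negative even if some rows already carry a negative longitude.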

Symptom

I also added another column, named 'Symptom', which records the keywords found in the text. This helps to learn more about the flu, such as which symptom appears first and how the symptoms change over time. All of this can be revealed from the text.

Text symptom.jpg
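
One way such a 'Symptom' value could be derived is sketched below. This is an assumption about the approach, not the original code: the names `STEM_TO_KEYWORD` and `symptom_label` are hypothetical, the matching uses a regular expression over keyword stems instead of a stemmer, and the ';' separator is an arbitrary choice.

```python
import re

# Hypothetical mapping from a word stem to the keyword it stands for.
STEM_TO_KEYWORD = {
    "flu": "flu", "fever": "fever", "chill": "chills", "sweat": "sweats",
    "ache": "aches", "pain": "pains", "fatigue": "fatigue",
    "cough": "coughing", "breath": "breathing", "nausea": "nausea",
    "vomit": "vomiting", "diarrhea": "diarrhea", "lymph": "lymph",
    "death": "death",
}

# Match a stem as a whole word, optionally followed by a common suffix.
PATTERN = re.compile(
    r"\b(" + "|".join(STEM_TO_KEYWORD) + r")(?:s|ing|ed)?\b",
    re.IGNORECASE,
)


def symptom_label(text):
    """Return the matched keywords in order of first appearance."""
    found = []
    for match in PATTERN.finditer(text):
        keyword = STEM_TO_KEYWORD[match.group(1).lower()]
        if keyword not in found:
            found.append(keyword)
    return ";".join(found)


print(symptom_label("Fever and chills, then more fever"))
```

Keeping the order of first appearance makes it easier to see which symptom a person mentions first, which is the kind of question the 'Symptom' column is meant to answer.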

Map Description

On the original map, many colors and icons represent different places. Using the additional information provided, I adjusted the colors of the image and used simple colors to represent the buildings.

Map.jpg
Function              Color         Description
Water Supply          Green.png     Residents and businesses get their drinking water by pumping water from nearby reservoirs or rivers. These distributed water systems are both publicly and privately owned.
Entertainment         Yellow.png    Vastopolis has two stadiums (Vastopolis Dome and Westside Stadium) for sports, concerts, and other events. The various lakes and the Vast River, which flows south at a steady rate of three miles per hour, are used for water-based sports and recreation.
City Administration   Red.png       Vastopolis has several locations of significance, including a state courthouse, a capitol building, a convention center, and a large airport.
Hospitals             Blue.png      Various hospitals.