Difference between revisions of "ISSS608 2017-18 T1 Assign WANG YUCHEN Data Processing"

From Visual Analytics and Applications

Revision as of 06:39, 16 October 2017

EPIDEMIC.png Smartpolis Epidemic Outbreak | Visual Detective


 


To be a Visual Detective

Geospatial and Text Data - Detecting an Epidemic Spread

Datasets

The given datasets include map, weather, and population information for Smartpolis. The main dataset for our analysis is the microblog dataset, which contains geospatial and text data gathered from tweets.

By mapping the tweets onto the map using their geospatial data, we try to identify the features and trends of the epidemic. The weather information also helps us determine how the epidemic is transmitted: whether it is airborne, waterborne, person-to-person, or something else.

Process of study

  • Explore the tweet text by date to identify the symptoms and details that describe the outbreak

Using JMP's text visualization tool, we first explore the daily word clouds. Epidemic-related messages appear to start on May 18th; tweets posted before that date mostly express everyday emotions.

Word.png


For a more accurate analysis, the tweets posted during the epidemic period (203020 of 1023077) are selected for further text mining. By analysing the three-day (05/18 – 05/20) word clouds and the term and phrase lists generated by JMP, two types of illness are detected.

One is a breathing-related illness, with the symptom keywords listed below.

breath chill cold cough fever flu lymph pneumonia sweat throat

The other is a stomach-related illness, with the symptom keywords listed below.

diarrhea nausea stomach vomit

There are also other keywords that describe the epidemic in a more general way.

ache bad death die fatigue hospital kill medicine pain sick

Some related words, such as “caught”, are omitted because they usually appear together with other, more distinctive symptoms.

Caught.png


To find out what caused the outbreak, we also go back to the days before the 18th to look for clues. One suspicious event is the truck accident that happened at bridge 610. Another is the frightening explosion with smoke clouds in smog town on the 17th, although it is not clear whether smog town actually belongs to Smartpolis. The keywords listed below are therefore also noteworthy and are categorized as “accident”.

accident explosion oil shook smoke smog trunk
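
The keyword groups above can be turned into a simple tagging step. The following is only a minimal Python sketch, not the JMP workflow used in this study: the category names, the `tag_categories` helper, and the choice to match each keyword as a word prefix (a crude stand-in for stemming that will over-match, e.g. "die" also hits "diet") are all assumptions made for illustration.

```python
import re

# Keyword groups copied from the lists above; the category names are illustrative.
CATEGORIES = {
    "breathing": ["breath", "chill", "cold", "cough", "fever", "flu",
                  "lymph", "pneumonia", "sweat", "throat"],
    "stomach": ["diarrhea", "nausea", "stomach", "vomit"],
    "general": ["ache", "bad", "death", "die", "fatigue", "hospital",
                "kill", "medicine", "pain", "sick"],
    "accident": ["accident", "explosion", "oil", "shook", "smoke",
                 "smog", "trunk"],
}


def compile_group(words):
    # Assumption: match each keyword at the start of a word, so that
    # "cough" also matches "coughing". This is not real stemming.
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + alternatives + r")", re.IGNORECASE)


PATTERNS = {name: compile_group(words) for name, words in CATEGORIES.items()}


def tag_categories(text):
    """Return the sorted list of categories whose keywords occur in text."""
    return sorted(name for name, pat in PATTERNS.items() if pat.search(text))


def is_informative(text):
    """A tweet counts as informative if it matches at least one category."""
    return bool(tag_categories(text))
```

A tweet with an empty tag list would be treated as unrelated and dropped; a tweet can carry several tags at once, for example both a breathing and a general tag.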


  • Filter out unrelated tweets and map the informative tweets with geospatial data to trace the spread of the epidemic

Having found several types of symptoms that characterize the epidemic, we then select the informative tweets (139556 tweets) for further study. At the same time, the author’s role when tweeting should be considered: some authors are sufferers, while others are friends, colleagues, or neighbours of sufferers, or other reporters. Because we want to trace the spread of the epidemic, tweets from non-sufferers could be misleading, which poses a challenge.

To partly address this issue, we design an additional label for the author’s role: an epidemic-related tweet that contains any of the keywords listed below is tagged as “reporters”; otherwise it is tagged as “sufferers”.

Colleague friend neighbour
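
The role label described above could be sketched as follows. This is a hedged Python illustration rather than the procedure actually run in JMP: the `label_role` function name and the prefix-matching rule are assumptions, and the sketch presumes the tweet has already been identified as epidemic-related.

```python
import re

# Relationship words from the list above, lower-cased.
REPORTER_WORDS = ["colleague", "friend", "neighbour"]

# Assumption: match at the start of a word so plurals such as "friends"
# are caught too. This also over-matches words like "friendly".
REPORTER_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in REPORTER_WORDS) + r")",
    re.IGNORECASE,
)


def label_role(text):
    """Label an epidemic-related tweet as 'reporters' or 'sufferers'."""
    return "reporters" if REPORTER_PATTERN.search(text) else "sufferers"
```

Because the rule only looks for three relationship words, it is a rough heuristic: a sufferer who mentions a friend in passing would be labelled as a reporter, and the spelling "neighbor" would not be matched by this list.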


  • Drill down into the trends of the texts through visualization tools to characterize the illnesses