ISSS608 2017-18 T1 Assign WANG YUCHEN Data Processing

From Visual Analytics and Applications
Latest revision as of 23:30, 17 October 2017

EPIDEMIC.png Smartpolis Epidemic Outbreak | Visual Detective

To be a Visual Detective

Geospatial and Text Data - Detecting an Epidemic Spread

Datasets

The given datasets include map, weather, and population information for Smartpolis. The main dataset for our analysis is the microblog dataset, which contains geospatial and text data gathered from tweets.

By mapping the tweet information onto the geospatial data in the map, we try to identify the features and trends of the epidemic. The weather information also helps us determine how the epidemic is transmitted: whether it is airborne, waterborne, person-to-person, or something else.

Process of study

  • Explore the tweet text by date to identify symptoms and details that describe the outbreaks

Using JMP's text visualization tool, we first explore the daily word clouds. Epidemic-related messages appear to start on May 18th; tweets posted before that date mostly express feelings about everyday life.

Word.png


For a more accurate analysis, the tweets posted during the epidemic period (203,020 of 1,023,077) are selected for further text mining. By analyzing the three-day (05/18 – 05/20) word clouds and the term and phrase lists generated by JMP, two types of illness are detected.
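The selection and per-day term counting described above were done in JMP. As a rough illustration only, the same idea could be sketched in Python as follows. The record layout (`date` as an `MM-DD` string, `text`) and the sample tweets are assumptions made for this example and are not taken from the microblog dataset.

```python
import re
from collections import Counter, defaultdict


def select_period(tweets, start, end):
    """Keep tweets whose 'MM-DD' date string falls within [start, end]."""
    # Zero-padded 'MM-DD' strings sort chronologically within a single year.
    return [t for t in tweets if start <= t["date"] <= end]


def daily_term_counts(tweets):
    """Count lowercase word tokens per day, a rough stand-in for a daily word cloud."""
    counts = defaultdict(Counter)
    for t in tweets:
        counts[t["date"]].update(re.findall(r"[a-z]+", t["text"].lower()))
    return counts


# Made-up sample records, for illustration only.
sample = [
    {"date": "05-17", "text": "Lovely day at the park"},
    {"date": "05-18", "text": "Fever and chills, bad cough"},
    {"date": "05-19", "text": "Cough cough, cannot stop"},
]

period = select_period(sample, "05-18", "05-20")
terms = daily_term_counts(period)
print(terms["05-19"].most_common(1))  # [('cough', 2)]
```

Sorting each day's counter by frequency gives a term list comparable in spirit to a word cloud, although JMP's own term and phrase extraction is more elaborate than this plain tokenization.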

One is a breathing-related illness, with the symptoms listed below.

breath chill cold cough fever flu lymph pneumonia sweat throat

The other is a stomach-related illness, with the symptoms listed below.

diarrhea nausea stomach vomit

There are also other keywords that describe the epidemic in more general terms.

ache bad death die fatigue hospital kill medicine pain sick

Some related words, like “caught”, are ignored because they usually appear together with other, more distinctive symptoms.

Caught.png


To figure out what caused the outbreaks, we also go back to the days before the 18th to look for clues. One suspicious event is the trunk accident that happened at bridge 610. Another is the scary explosion with smoke clouds in smog town on the 17th. The keywords listed below are therefore also noteworthy and are categorized as accidents.

accident explosion oil shook smoke smog trunk
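The four keyword groups above can be turned into category labels for each tweet. The following is a minimal sketch, assuming exact matching on lowercase word tokens. The function names and the matching rule are assumptions for illustration; the original analysis was carried out in JMP, and inflected forms such as "coughing" are not handled here.

```python
import re

# Keyword groups copied from the lists above; the group names are labels chosen for this sketch.
CATEGORIES = {
    "breathing": {"breath", "chill", "cold", "cough", "fever", "flu",
                  "lymph", "pneumonia", "sweat", "throat"},
    "stomach": {"diarrhea", "nausea", "stomach", "vomit"},
    "general": {"ache", "bad", "death", "die", "fatigue", "hospital",
                "kill", "medicine", "pain", "sick"},
    "accident": {"accident", "explosion", "oil", "shook", "smoke",
                 "smog", "trunk"},
}


def tag_categories(text):
    """Return the sorted names of all keyword groups that the tweet text matches."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(name for name, words in CATEGORIES.items() if tokens & words)


def is_informative(text):
    """A tweet counts as informative if it matches at least one keyword group."""
    return bool(tag_categories(text))


print(tag_categories("Fever and stomach pain"))  # ['breathing', 'general', 'stomach']
print(is_informative("Nice weather today"))      # False
```

A tweet can fall into several groups at once, which is why the sketch returns a list rather than a single label.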


  • Filter out unrelated tweets

Having found several types of symptoms that characterize the epidemic, we then select the informative tweets (139,556 tweets) for further study. At the same time, the role of each author at the time of tweeting should be considered: some are sufferers, while others are friends, colleagues, or neighbors of sufferers, or other reporters. Because we want to trace the spread of the epidemic, tweets from non-sufferers could be misleading, which poses a challenge.

To partly address this issue, we design another label for the author's role: an epidemic-related tweet that contains any of the keywords listed below is tagged as “reporters”; otherwise it is tagged as “sufferers”.

colleague friend neighbor
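The role label described above can be sketched in the same way. This is an illustrative example only, assuming exact matching on lowercase word tokens; the function name is hypothetical, and plural forms such as "friends" would need extra handling.

```python
import re

# Keywords that suggest the author is reporting on someone else, taken from the list above.
REPORTER_WORDS = {"colleague", "friend", "neighbor"}


def tag_role(text):
    """Tag an epidemic-related tweet as 'reporters' or 'sufferers'."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return "reporters" if tokens & REPORTER_WORDS else "sufferers"


print(tag_role("My friend has a fever"))  # reporters
print(tag_role("I have a fever"))         # sufferers
```

This rule is meant to be applied only to tweets already selected as epidemic-related, since any other tweet would default to “sufferers”.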

  • Map the informative tweets with geospatial data to trace the spread of the epidemic

Based on each tweet's spatial information, we can map the tweets to explore how the illnesses changed over time. With the temporal data and the symptom-related attributes, we can drill down to see when and where the epidemic started, how the diseases affected the country, and how they were transmitted, so as to decide what to do next.
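
One simple way to follow the spread over time is to summarize the informative tweets per day by count and mean location. The sketch below assumes each record carries hypothetical `date`, `lat`, and `lon` fields; these names and the sample coordinates are made up for illustration and do not come from the dataset.

```python
from collections import defaultdict


def daily_spread(tweets):
    """Return {day: (tweet_count, mean_lat, mean_lon)} in chronological key order."""
    sums = defaultdict(lambda: [0, 0.0, 0.0])
    for t in tweets:
        s = sums[t["date"]]
        s[0] += 1           # number of tweets that day
        s[1] += t["lat"]    # running sum of latitudes
        s[2] += t["lon"]    # running sum of longitudes
    return {day: (n, lat / n, lon / n)
            for day, (n, lat, lon) in sorted(sums.items())}


# Made-up coordinates, for illustration only.
sample = [
    {"date": "05-18", "lat": 1.0, "lon": 10.0},
    {"date": "05-18", "lat": 3.0, "lon": 30.0},
    {"date": "05-19", "lat": 5.0, "lon": 50.0},
]
print(daily_spread(sample))
# {'05-18': (2, 2.0, 20.0), '05-19': (1, 5.0, 50.0)}
```

Running the same summary separately for each symptom group or author role would show whether the breathing-related and stomach-related illnesses move in different directions, which is the kind of question the map view is meant to answer.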