Difference between revisions of "ISSS608 2017-18 T1 Assign WANG YUCHEN Data Processing"

From Visual Analytics and Applications
Latest revision as of 23:30, 17 October 2017

Smartpolis Epidemic Outbreak | Visual Detective



To be a Visual Detective

Geospatial and Text Data - Detecting an Epidemic Spread

Datasets

The given datasets include map, weather, and population information for Smartpolis. The main dataset for our analysis is the microblog dataset, which contains geospatial and text data gathered from tweets.

By mapping the tweet information onto the geospatial data in the map, we try to identify the features and trends of the epidemic. The weather information also helps us determine how the epidemic is transmitted: whether it is airborne, waterborne, person-to-person, or something else.

Process of study

  • Explore the tweet text by date to identify the symptoms and details that describe the outbreaks

Using JMP's text visualization tool, we first explored the daily word clouds. Epidemic-related messages appear to start on May 18th; tweets posted before that date mostly express everyday feelings.

[Figure: Word.png]


For a more accurate analysis, the tweets posted during the epidemic period (203,020 of 1,023,077) were selected for further text mining. By analyzing the three-day (05/18 – 05/20) word clouds and the term and phrase lists generated by JMP, we detected two types of illness.

One is a breathing-related illness, with the symptoms listed below.
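The selection and term-listing step above was done in JMP. As a rough illustration only, the same idea can be sketched in plain Python: keep the tweets inside a date window, then count terms per day. The sample records, the year, and the function names `in_period` and `daily_term_counts` are assumptions made for this sketch, not part of the original analysis.

```python
from collections import Counter
from datetime import date
import re

# Hypothetical sample records (date, text); the year is a placeholder.
TWEETS = [
    (date(2011, 5, 17), "what a lovely day at the park"),
    (date(2011, 5, 18), "fever and chills all night, bad cough"),
    (date(2011, 5, 19), "stomach hurts, nausea since lunch"),
    (date(2011, 5, 19), "cough cough cough, cannot stop"),
    (date(2011, 5, 21), "hospital is packed today"),
]


def in_period(tweets, start, end):
    """Keep only the tweets dated within [start, end], inclusive."""
    return [(d, text) for d, text in tweets if start <= d <= end]


def daily_term_counts(tweets):
    """Return {date: Counter of lowercase words} for the given tweets."""
    counts = {}
    for d, text in tweets:
        words = re.findall(r"[a-z]+", text.lower())
        counts.setdefault(d, Counter()).update(words)
    return counts


# Three-day window corresponding to 05/18 - 05/20 in the text.
selected = in_period(TWEETS, date(2011, 5, 18), date(2011, 5, 20))
per_day = daily_term_counts(selected)
share = len(selected) / len(TWEETS)  # fraction of tweets kept
```

The per-day counters play the role of the daily term lists; `per_day[some_date].most_common(10)` would give the top terms that a word cloud for that day emphasizes.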

breath chill cold cough fever flu lymph pneumonia sweat throat

The other is a stomach-related illness, with the symptoms listed below.

diarrhea nausea stomach vomit

Some other keywords describe the epidemic in a more general way.

ache bad death die fatigue hospital kill medicine pain sick

Some related words, like “caught”, are ignored since they usually appear together with other, more distinctive symptoms.

[Figure: Caught.png]


To find out what caused the outbreaks, we also went back to the days before May 18th to look for clues. One suspicious event is the truck accident that happened at bridge 610. Another is the frightening explosion with smoke clouds in smog town on May 17th. The keywords listed below are therefore also noteworthy and are categorized as accidents.

accident explosion oil shook smoke smog truck
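The keyword lists above can be turned into a simple tagging rule. The following is a minimal sketch, assuming a crude prefix match on lowercase tokens (so "coughing" matches "cough"); this stands in for whatever stemming JMP applies and will produce some false positives (for example "diet" matches "die"). The category names and the function `categorize` are hypothetical.

```python
import re

# Keyword lists taken from the text. Both "trunk" and "truck" are matched
# in the accident list, since the exact token used is an assumption here.
CATEGORIES = {
    "breathing": ["breath", "chill", "cold", "cough", "fever", "flu",
                  "lymph", "pneumonia", "sweat", "throat"],
    "stomach": ["diarrhea", "nausea", "stomach", "vomit"],
    "general": ["ache", "bad", "death", "die", "fatigue", "hospital",
                "kill", "medicine", "pain", "sick"],
    "accident": ["accident", "explosion", "oil", "shook", "smoke", "smog",
                 "trunk", "truck"],
}


def categorize(text):
    """Return the set of categories whose keywords prefix any token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found = set()
    for category, keywords in CATEGORIES.items():
        if any(tok.startswith(kw) for tok in tokens for kw in keywords):
            found.add(category)
    return found
```

A tweet can fall into several categories at once, which is why the function returns a set rather than a single label. A tweet with an empty result would count as unrelated to the epidemic under this rule.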


  • Filter out unrelated tweets

Having found several types of symptoms that characterize the epidemic, we then keep only the informative tweets (139,556 tweets) for further study. At the same time, the author's role when tweeting should be considered: some authors are sufferers, while others are friends, colleagues, or neighbors of sufferers, or other reporters. Since we want to track the spread of the epidemic, tweets from non-sufferers could be misleading, which poses a challenge.

To partly address this issue, we add another label for the author's role: an epidemic-related tweet that contains any of the keywords listed below is tagged as “reporters”; otherwise it is tagged as “sufferers”.

colleague friend neighbor
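
The role label described above amounts to one keyword check per tweet. A minimal sketch follows, assuming the same crude prefix match on lowercase tokens (so "neighbors" matches "neighbor"); the function name `author_role` is hypothetical, and the rule is meant to be applied only to tweets already identified as epidemic-related.

```python
import re

# Role keywords taken from the text.
REPORTER_WORDS = ("colleague", "friend", "neighbor")


def author_role(text):
    """Tag a tweet as "reporters" if it mentions a third party, else "sufferers"."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if any(tok.startswith(w) for tok in tokens for w in REPORTER_WORDS):
        return "reporters"
    return "sufferers"
```

As the text notes, this only partly solves the problem: a sufferer who mentions a friend is still tagged as “reporters”, and a reporter who uses none of the three words is tagged as “sufferers”.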

  • Map the informative tweets onto the geospatial data to trace the spread of the epidemic

Based on each tweet's spatial information, we can map the tweets to explore how the illnesses changed over time. With the temporal data and the symptom-related attributes, we can drill down to see when and where the epidemic started, how the diseases spread across Smartpolis, and how they were transmitted, so as to decide what to do next.
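
One way to prepare the data for this kind of drill-down is to count tweets per day, per map cell, and per symptom category. The sketch below assumes each tweet carries a timestamp and a latitude/longitude pair; the sample coordinates, the year, the cell size, and the names `grid_cell` and `counts_by_day_cell` are all made up for illustration.

```python
from collections import Counter
from datetime import datetime
import math

# Hypothetical records: (timestamp, latitude, longitude, category).
RECORDS = [
    (datetime(2011, 5, 18, 9, 0), 10.22, 20.38, "breathing"),
    (datetime(2011, 5, 18, 15, 30), 10.23, 20.39, "breathing"),
    (datetime(2011, 5, 19, 8, 10), 10.17, 20.41, "stomach"),
    (datetime(2011, 5, 19, 20, 45), 10.22, 20.38, "breathing"),
]


def grid_cell(lat, lon, size=0.05):
    """Map a coordinate to the integer index of its square grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))


def counts_by_day_cell(records, size=0.05):
    """Count tweets per (day, grid cell, category)."""
    counts = Counter()
    for ts, lat, lon, category in records:
        counts[(ts.date(), grid_cell(lat, lon, size), category)] += 1
    return counts


counts = counts_by_day_cell(RECORDS)
```

Comparing the counts for the same cell on consecutive days gives a rough picture of where each illness appears first and where it grows, which is the question the mapping step sets out to answer.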