Difference between revisions of "ISSS608 2017-18 T1 Assign DENG CHUNLING"


Latest revision as of 23:57, 15 October 2017

Disease Outbreak Investigation

Objective & Methodology

In light of the serious situation that Smartpolis faces (several deaths reported!), I need to:

1. Source: Determine origin of disease outbreak

2. Spread: Find out medium of transmission

3. Control: Suggest measures to contain spread

My approach to this problem is:

{| class="wikitable"
|-
! Action Step !! Result
|-
| Filter blog text for spots of flu || Exclude non-disease blogs from analysis
|-
| Determine type of symptoms || Categorise symptoms into water, air or human
|-
| Correlate spots with map and time || Animate disease outbreak path by timelapse
|-
| Drill down into water-borne || Study the origin, spread and contributing factor
|-
| Drill down into air-borne || Study the origin, spread and contributing factor
|-
| Drill down into human-transmitted || Study the origin, spread and contributing factor
|}

This lets me suggest containment measures and geo-fencing for each transmission type.


Data Preparation

The "Microblog" dataset has to be transformed into a format that is conducive to visualization.

{| class="wikitable"
|-
! Variable !! Treatment !! Description !! Screenshot
|-
| Location || Break into "Lat" and "Lon" respectively || SAS EG workflow to prepare Lat/Lon and save to library || [[File:DCLEG1.PNG]]
|-
| Text || Find stemmed words from text || SAS EM workflow to extract stemmed words from library. Matching on whole stemmed words avoids false substring matches, e.g. a search for "ache" returning "mustache". || [[File:DCLEM1.PNG]]
|-
| Text || Filter microblogs to only pre-defined symptoms || SAS EG workflow to join documents and words list. Because all of these are stemmed words, any form of a symptom word in the microblogs, even some misspelled ones, will be detected. [[File:DCLEG2.PNG]] || [[File:DDCLEG3.PNG]]
|-
| Symptom || Group symptoms into 3 broad categories: related to the digestive system, the respiratory system or muscle || The rationale is that some people may know that they have flu, but others are just describing the symptoms. || [[File:DCLTB1.PNG]]
|}
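The preparation steps in the table can be sketched in Python. This is only an illustration of the idea, not the SAS EG/EM workflows I actually used: the space-separated "lat lon" format, the word list in `SYMPTOM_CATEGORIES` and the `crude_stem` helper are all assumptions made for the example, and a real stemmer would be far more careful.

```python
import re

# Hypothetical symptom vocabulary for the three broad categories.
# The actual word list used in the SAS workflow is not reproduced here.
SYMPTOM_CATEGORIES = {
    "digestive": ["nausea", "diarrhea", "vomit"],
    "respiratory": ["cough", "breath"],
    "muscle": ["ache", "cramp"],
}


def crude_stem(word):
    """Very rough suffix stripper; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Map each stemmed vocabulary word to its category.
STEM_TO_CATEGORY = {
    crude_stem(word): category
    for category, words in SYMPTOM_CATEGORIES.items()
    for word in words
}


def split_location(location):
    """Split a location string into (lat, lon); assumes 'lat lon' text."""
    lat_text, lon_text = location.split()
    return float(lat_text), float(lon_text)


def symptom_categories(text):
    """Return the set of symptom categories mentioned in a microblog."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = {crude_stem(token) for token in tokens}
    # Whole-stem lookup: "mustache" never matches "ache".
    return {STEM_TO_CATEGORY[s] for s in stems if s in STEM_TO_CATEGORY}


print(split_location("42.2205 93.3816"))
print(sorted(symptom_categories("Coughing all night, stomach aches and nausea")))
print(sorted(symptom_categories("My mustache itches")))
```

Microblogs for which `symptom_categories` returns an empty set would be the ones excluded from the analysis.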

Origin and Initial Spread

My Tableau visualization below allows me to look at the disease situation in a multi-dimensional view.

Filtering the Overall Dashboard one date at a time shows that the flu outbreak begins on 18 May 2011. This is confirmed by the Daily Dashboard, which shows the day-by-day view more clearly. 18 May is also the day on which symptom reports surge, from a mere hundred to a few thousand microblogs.
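The same reading of the dashboard can be expressed as a small calculation: find the day with the largest day-over-day jump in symptom microblogs. The sketch below is a minimal illustration, and the counts in it are made-up round numbers that only mimic the "hundred to a few thousand" jump; they are not the actual values from the dataset.

```python
from datetime import date


def largest_jump(counts):
    """Return (day, ratio) for the biggest day-over-day increase.

    `counts` maps each date to the number of symptom microblogs that day.
    """
    days = sorted(counts)
    best_day, best_ratio = None, 0.0
    for previous, current in zip(days, days[1:]):
        if counts[previous] == 0:
            continue  # avoid dividing by zero on empty days
        ratio = counts[current] / counts[previous]
        if ratio > best_ratio:
            best_day, best_ratio = current, ratio
    return best_day, best_ratio


# Illustrative counts only (assumed, not taken from the Microblog data).
example_counts = {
    date(2011, 5, 17): 120,
    date(2011, 5, 18): 2400,
    date(2011, 5, 19): 3600,
    date(2011, 5, 20): 4000,
}

onset_day, ratio = largest_jump(example_counts)
print(onset_day, ratio)
```

With these illustrative numbers the largest jump lands on 18 May, which is how I read the onset date off the dashboard.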

{| class="wikitable"
|-
! 17 May !! 18 May
|-
| [[File:DCLTB 17MAY.PNG]] || [[File:DCLTB 18MAY.PNG]]
|}

Comparing these two days tells us two things. First, the flu originated in the downtown area. Second, it initially spread to the east. This initial spread runs along one of the water pipes, whereas the wind on 18 May blows to the west. Because the wind direction is opposite to the direction of spread, the initial spread points to water-borne rather than air-borne factors.
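The wind-versus-spread argument can also be checked numerically. The sketch below is an illustration under simplifying assumptions: points are plain (x, y) map coordinates with x increasing to the east and y to the north, the spread direction is taken as the shift of the centroid of symptom posts between two days, and the sample points are invented for the example.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y) points."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def bearing_deg(origin, target):
    """Compass bearing from origin to target: 0 = north, 90 = east."""
    dx = target[0] - origin[0]
    dy = target[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360


def angle_between(bearing_a, bearing_b):
    """Smallest angle in degrees between two compass bearings."""
    diff = abs(bearing_a - bearing_b) % 360
    return min(diff, 360 - diff)


# Invented sample points: posts cluster downtown, then shift east.
posts_day1 = [(0.0, 0.0), (0.2, 0.1), (-0.2, -0.1)]
posts_day2 = [(1.8, 0.0), (2.0, 0.1), (2.2, -0.1)]

spread = bearing_deg(centroid(posts_day1), centroid(posts_day2))
wind_toward = 270.0  # wind blowing to the west, as observed on 18 May

print(round(spread, 1), angle_between(spread, wind_toward))
```

An angle close to 180 degrees means the spread runs against the wind, which is the case that argues against air-borne transmission; an angle close to 0 would leave the wind as a plausible carrier.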

Transmission and Containment

Following the initial spread on 18 May, we see a second outbreak on 19 May, this time with a very interesting characteristic.

{| class="wikitable"
|-
! 18 May !! 19 May
|-
| [[File:DCLTB 18MAY.PNG]] || [[File:DCLTB 19MAY.PNG]]
|}

Complaints of nausea and diarrhea increase very rapidly, especially along the river in Westside, Plainsville and Smogtown. The wind direction on 19 May is North West, so we are confident that this second surge is again due to a water-borne factor, this time not the water supply but the river itself. Reading further, we find that the Westside Stadium, a hub of activities and entertainment, is not far from the river.

DCLRIVER19MAY.PNG

A holistic view of the disease spread by date and by symptom type shows that the disease is still spreading, but at a decreasing rate.

DCLDAILY.PNG

Next, let's look at the time trend. Although the disease is still strong, complaints of respiratory and muscle pain are falling. I take this as a sign of recovery: many people are starting to recover, and the number of newly contracted cases is decreasing.

DCLTIMETREND.PNG

On 20 May, complaints of muscle pain are concentrated in several hot spots. Cross-referencing these with the map shows that they are actually hospital locations.

DCL20MAYHAHAHA.PNG
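The cross-reference between hot spots and hospitals was done by eye on the map, but it could be automated with a nearest-neighbour distance check. The following is a minimal sketch using the haversine great-circle distance; the hospital names and all coordinates are hypothetical placeholders, not locations from the challenge map.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_hospital(hotspot, hospitals):
    """Return (name, distance_km) of the hospital closest to a hot spot."""
    lat, lon = hotspot
    distances = (
        (name, haversine_km(lat, lon, h_lat, h_lon))
        for name, (h_lat, h_lon) in hospitals.items()
    )
    return min(distances, key=lambda pair: pair[1])


# Hypothetical hospital coordinates, for illustration only.
hospitals = {
    "Hospital A": (42.250, 93.401),
    "Hospital B": (42.300, 93.500),
}

name, distance_km = nearest_hospital((42.250, 93.400), hospitals)
print(name, round(distance_km, 3))
```

A hot spot whose nearest hospital is only a short distance away would support the reading that these clusters are patients posting from hospital rather than new sources of infection.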

Therefore, we can focus on two remaining areas of concern: first, the west bank areas (nausea and diarrhea patients), and second, the downtown areas (where air-borne respiratory transmission is likely to be high). In addition, patients still in hospital should receive proper care.

Link to Tableau Page

Feel free to access and comment on my visualization project here:

https://public.tableau.com/profile/deng.chunling#!/vizhome/cldeng_2016/DiseaseControl?publish=yes