ISSS608 2017-18 T1 Assign DENG CHUNLING
Cldeng.2016
Latest revision as of 23:57, 15 October 2017
Objective & Methodology
In light of the serious situation that Smartpolis faces (several deaths reported!), I need to:
1. Source: Determine origin of disease outbreak
2. Spread: Find out medium of transmission
3. Control: Suggest measures to contain spread
My approach to this problem is:
Action Step | Result |
---|---|
Filter blog text for mentions of flu symptoms | Exclude non-disease blogs from analysis |
Determine type of symptoms | Categorise symptoms by transmission type: water-borne, air-borne or human-transmitted |
Correlate symptom locations with map and time | Animate the outbreak path as a time-lapse |
Drill down into water-borne | Study the origin, spread and contributing factors |
Drill down into air-borne | Study the origin, spread and contributing factors |
Drill down into human-transmitted | Study the origin, spread and contributing factors |
This allows me to suggest containment measures and geo-fencing for each transmission type.
Data Preparation
The "Microblog" dataset needs to be transformed into a format that is conducive to visualization.
Variable | Treatment | Description | Screenshot |
---|---|---|---|
Location | Break into "Lat" and "Lon" | SAS EG workflow to prepare Lat/Lon and save them to the library | [Screenshot: DCLEG1.PNG] |
Text | Find stemmed words in the text | SAS EM workflow to extract stemmed words from the library. This avoids false matches from fuzzy lookup, e.g. a search for "ache" returning "mustache" | [Screenshot: DCLEM1.PNG] |
Text | Filter microblogs to pre-defined symptoms only | SAS EG workflow to join the documents with the word list. Because these are all stemmed words, a symptom is detected whatever form of the word appears in the microblog, even a misspelled one | [Screenshot: DCLEG2.PNG], [Screenshot: DDCLEG3.PNG] |
Symptom | Group symptoms into 3 broad categories: digestive system, respiratory system or muscle | The rationale is that some people may know they have flu, while others only describe their symptoms | [Screenshot: DCLTB1.PNG] |
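The preparation steps above were carried out with SAS EG and SAS EM workflows. As a rough illustration of the same idea outside SAS, the Python sketch below splits a location string into latitude and longitude, matches whole stemmed tokens (so "mustache" never matches "ache"), and maps matched symptoms to the three broad categories. The space-separated "lat lon" location format, the keyword list and the crude suffix-stripping stemmer are all assumptions made for illustration; they are not the SAS EM stemmer or the author's actual symptom list.

```python
import re

# Hypothetical stemmed keyword -> category map (illustrative, not the author's list).
SYMPTOM_CATEGORIES = {
    "nausea": "digestive",
    "diarrhea": "digestive",
    "vomit": "digestive",
    "cough": "respiratory",
    "breath": "respiratory",
    "ache": "muscle",
    "pain": "muscle",
}


def split_location(location):
    """Split an assumed space-separated 'lat lon' string into two floats."""
    lat_str, lon_str = location.split()
    return float(lat_str), float(lon_str)


def crude_stem(token):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def symptom_categories(text):
    """Return the categories whose keywords appear as whole stemmed tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = {crude_stem(token) for token in tokens}
    return {SYMPTOM_CATEGORIES[stem] for stem in stems if stem in SYMPTOM_CATEGORIES}


print(split_location("42.2 93.4"))                                        # (42.2, 93.4)
print(sorted(symptom_categories("My muscles ache and I keep coughing")))  # ['muscle', 'respiratory']
print(symptom_categories("Loving my new mustache"))                       # set()
```

Matching whole tokens rather than raw substrings is what prevents the "ache"/"mustache" false positive described in the table; a production pipeline would swap `crude_stem` for a proper stemmer.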
Origin and Initial Spread
The Tableau dashboards below let me examine the disease situation from multiple dimensions.
[Figures: Overall Disease Investigation (DCLTB 17MAY.PNG); Daily Disease Dashboard (DCLTB DAILY.PNG)]
By filtering the Overall Dashboard one date at a time, we can see that the flu outbreak starts on 18 May 2011. The Daily Dashboard, which shows the day-by-day view more clearly, confirms this. 18 May is also the day on which symptom reports surge, from a mere hundred to a few thousand microblogs.
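The outbreak day was identified by eye in Tableau. As a minimal sketch of how the same jump could be flagged programmatically, the Python below counts symptom microblogs per day and returns the first day whose count is at least a chosen multiple of the previous listed day's count. The 5x threshold, the function names and the toy counts are assumptions for illustration, not values taken from the analysis.

```python
from collections import Counter
from datetime import date
from typing import Optional


def daily_counts(post_dates):
    """Count symptom-matching microblogs per calendar day, ordered by date."""
    return dict(sorted(Counter(post_dates).items()))


def first_surge_day(counts, ratio=5.0) -> Optional[date]:
    """Return the first day whose count is at least `ratio` times the previous
    listed day's count, or None if no such jump exists."""
    days = sorted(counts)
    for prev_day, day in zip(days, days[1:]):
        if counts[prev_day] > 0 and counts[day] >= ratio * counts[prev_day]:
            return day
    return None


# Toy data mirroring "a mere hundred to a few thousand"; not the real counts.
toy_dates = [date(2011, 5, 17)] * 100 + [date(2011, 5, 18)] * 2000
print(first_surge_day(daily_counts(toy_dates)))  # 2011-05-18
```

Note that consecutive entries are compared as listed, so days with no symptom posts at all would need to be filled in with zero counts first.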
17 May | 18 May |
---|---|
[Figure: DCLTB 17MAY.PNG] | [Figure: DCLTB 18MAY.PNG] |
Comparing these two days tells us two things. First, the flu originated in the downtown area and initially spread to the east, along one of the water pipes. Second, the wind on 18 May blew to the west, opposite to the direction of spread, which indicates that the initial spread was due to water-borne rather than air-borne factors.
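The argument above compares the direction of spread with the direction the wind blows toward. A minimal Python sketch of that comparison, under stated assumptions: the spread direction is taken as the bearing from the centroid of the first day's cases to the centroid of the new cases, the small area is treated as flat, longitude is assumed to increase eastward, and a spread within 90 degrees of downwind counts as "following the wind". All coordinates and the tolerance are hypothetical.

```python
import math


def centroid(points):
    """Mean (lat, lon) of a list of (lat, lon) points."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)


def bearing_deg(start, end):
    """Approximate compass bearing from start to end in degrees (0 = north,
    90 = east), treating a small area as flat and assuming longitude
    increases eastward."""
    (lat1, lon1), (lat2, lon2) = start, end
    east = (lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    north = lat2 - lat1
    return math.degrees(math.atan2(east, north)) % 360


def angle_between(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360
    return 360 - d if d > 180 else d


def spread_follows_wind(spread_bearing, downwind_bearing, tolerance=90.0):
    """True if the spread direction lies within `tolerance` degrees of the
    direction the wind blows toward."""
    return angle_between(spread_bearing, downwind_bearing) <= tolerance


# Hypothetical coordinates: day-one cluster, then new cases further east.
origin = centroid([(42.20, 93.40), (42.21, 93.41)])
new_cases = centroid([(42.20, 93.46), (42.21, 93.47)])
spread = bearing_deg(origin, new_cases)
print(round(spread))                       # 90 (eastward)
print(spread_follows_wind(spread, 270.0))  # False: wind blows toward the west
```

A spread roughly 180 degrees away from downwind, as in this toy case, is the pattern the text reads as evidence against air-borne transmission.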
Transmission and Containment
Following the initial spread on 18 May, a second outbreak appears on 19 May, this time with a very interesting characteristic.
18 May | 19 May |
---|---|
[Figure: DCLTB 18MAY.PNG] | [Figure: DCLTB 19MAY.PNG] |
Complaints of nausea and diarrhea increase very rapidly, especially along the river in Westside, Plainsville and Smogtown. Given that the wind direction on 19 May is north-west, we are confident that this second surge is again due to a water-borne factor, this time spreading not through the water supply but along the river itself. Reading further, we find that Westside Stadium, a hub of activities and entertainment, is not far from the river.
[Figure: DCLRIVER19MAY.PNG]
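One way to quantify "along the river" is a buffer test: measure each post's distance to the river drawn as a polyline and count the share of posts within a chosen buffer. The Python below is a minimal sketch of that idea, assuming coordinates have already been projected onto a flat local grid (here, kilometres); the river vertices, post locations and 1 km buffer are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to line segment ab in planar (x, y) coordinates."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    abx, aby = bx - ax, by - ay
    length_sq = abx * abx + aby * aby
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * abx + (py - ay) * aby) / length_sq))
    return math.hypot(px - (ax + t * abx), py - (ay + t * aby))


def distance_to_polyline(p, polyline):
    """Shortest distance from p to any segment of a polyline (list of vertices)."""
    return min(point_segment_distance(p, a, b) for a, b in zip(polyline, polyline[1:]))


def share_within_buffer(points, polyline, buffer):
    """Fraction of points lying within `buffer` units of the polyline."""
    if not points:
        return 0.0
    near = sum(1 for p in points if distance_to_polyline(p, polyline) <= buffer)
    return near / len(points)


# Hypothetical river and nausea/diarrhea post locations, in km on a local grid.
river = [(0.0, 0.0), (5.0, 1.0), (10.0, 0.0)]
posts = [(2.0, 0.5), (6.0, 1.2), (9.0, 4.0)]
print(share_within_buffer(posts, river, buffer=1.0))  # about 0.667 (2 of 3 posts)
```

Comparing this share across days (for example 18 May versus 19 May) would show whether digestive complaints are concentrating near the river over time.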
Viewing the disease spread holistically, by date and by symptom type, shows that the disease is still spreading, but at a decreasing rate.
[Figure: DCLDAILY.PNG]
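"Spreading, but at a decreasing rate" can be stated as a simple numerical check: a cumulative count that keeps rising while each day's increase is smaller than the day before. The Python below is a minimal sketch assuming a cumulative series of symptom microblogs per day; the function names and the toy numbers are hypothetical, not figures read from the dashboard.

```python
def daily_increments(cumulative):
    """Day-over-day increases of a cumulative count series."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def spreading_at_decreasing_rate(cumulative):
    """True if the cumulative count keeps rising while each day's increase is
    smaller than the previous day's (needs at least three days)."""
    inc = daily_increments(cumulative)
    if len(inc) < 2:
        return False
    still_spreading = all(i > 0 for i in inc)
    slowing = all(later < earlier for earlier, later in zip(inc, inc[1:]))
    return still_spreading and slowing


# Toy cumulative counts of symptom microblogs from 18 May onward (not real data).
toy_cumulative = [2100, 4600, 6100, 6900]
print(daily_increments(toy_cumulative))              # [2500, 1500, 800]
print(spreading_at_decreasing_rate(toy_cumulative))  # True
```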
Next, let's look at the time trend. Although the disease is still strong, complaints of respiratory and muscle pain are decreasing. I take this as a sign of recovery: many people are starting to recover, and the number of newly contracted cases is falling.
[Figure: DCLTIMETREND.PNG]
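To check which symptom categories are declining without relying on reading the chart, one option is to fit an ordinary least-squares trend line to each category's daily counts and look at the sign of the slope. This is a minimal Python sketch of that approach; the category series below are toy numbers, and using a linear trend is an assumption (a short series with a peak may be better judged on the most recent days only).

```python
def least_squares_slope(values):
    """Slope of an ordinary least-squares line fitted to values at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def declining_categories(series_by_category):
    """Names of categories whose fitted daily trend is negative."""
    return sorted(name for name, values in series_by_category.items()
                  if least_squares_slope(values) < 0)


# Toy daily complaint counts per category (illustrative only).
toy_series = {
    "digestive": [10, 40, 80],
    "respiratory": [50, 40, 30],
    "muscle": [30, 25, 15],
}
print(declining_categories(toy_series))  # ['muscle', 'respiratory']
```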
Looking at muscle pain on 20 May, we see several hot spots of concentration. Cross-referencing them with the map shows that these are actually hospital locations.
[Figure: DCL20MAYHAHAHA.PNG]
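The hot spots were spotted visually and then cross-referenced with the map. A minimal Python sketch of the same two steps: bin posts into square grid cells, keep cells above a count threshold, and test whether each hot cell's centre lies near a known site such as a hospital. The grid size, threshold, radius, post locations and hospital coordinates are all hypothetical, and coordinates are assumed to be on a flat local grid.

```python
import math
from collections import Counter


def grid_hotspots(points, cell_size, min_count):
    """Bin (x, y) points into square cells and keep cells with >= min_count points."""
    cells = Counter((math.floor(x / cell_size), math.floor(y / cell_size))
                    for x, y in points)
    return {cell: n for cell, n in cells.items() if n >= min_count}


def cell_centre(cell, cell_size):
    """Centre coordinates of a grid cell given its (column, row) index."""
    col, row = cell
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


def near_any_site(point, sites, radius):
    """True if point lies within `radius` of at least one site."""
    return any(math.dist(point, site) <= radius for site in sites)


# Hypothetical muscle-pain post locations and hospital sites on a local km grid.
posts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (5.0, 5.0)]
hospitals = [(0.6, 0.4)]
hot = grid_hotspots(posts, cell_size=1.0, min_count=3)
print(hot)  # {(0, 0): 3}
for cell in hot:
    print(cell, near_any_site(cell_centre(cell, 1.0), hospitals, radius=0.5))  # (0, 0) True
```

The cell size trades off resolution against noise: smaller cells localise hot spots more precisely but need more posts per cell to pass the threshold.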
Therefore, we can focus on two remaining areas of concern: first, the west bank areas (nausea and diarrhea patients), and second, the downtown areas, where air-borne respiratory transmission is likely to be high. In addition, proper care should be provided to the patients still in hospital.
Link to Tableau Page
Feel free to access and comment on my visualization project here:
https://public.tableau.com/profile/deng.chunling#!/vizhome/cldeng_2016/DiseaseControl?publish=yes