Difference between revisions of "ISSS608 2017-18 T1 Assign GOH JUN JIE ANTHONY"

From Visual Analytics and Applications

Revision as of 22:22, 15 October 2017

Background

Smartpolis is a major metropolitan area with a population of approximately two million residents. During the last few days, health professionals at local hospitals have noticed a dramatic increase in reported illnesses.

Observed symptoms are largely flu-like and include fever, chills, sweats, aches and pains, fatigue, coughing, breathing difficulty, nausea and vomiting, diarrhea, and enlarged lymph nodes. More recently, there have been several deaths believed to be associated with the current outbreak. City officials fear a possible epidemic and are mobilizing emergency management resources to mitigate the impact.

Two datasets have been provided. The first one contains microblog messages collected from various devices with GPS capabilities. These devices include laptop computers, handheld computers, and cellular phones. The second one contains map information for the entire metropolitan area. The map dataset contains a satellite image with labeled highways, hospitals, important landmarks, and water bodies. Supplemental tables for population statistics and observed weather data are also provided.

We are tasked with the following:

  1. Identify approximately where the outbreak started on the map (ground zero location), outline the affected area and explain how we arrived at the conclusion.
  2. Present a hypothesis on how the infection is being transmitted, e.g. whether the method of transmission is person-to-person, airborne, waterborne etc., and identify the trends that support our hypothesis.
  3. Advise whether the outbreak is contained and whether it is necessary for emergency management personnel to deploy treatment resources outside the affected area, and explain our reasoning.


Data Preparation

In the Microblogs.csv file, the attributes given are ID, Created_at, Location and Text. The "Location" attribute combined the latitude and longitude coordinates in one column, so I had to separate them for subsequent use in Tableau. The following functions were used in Excel to split the latitude and longitude:

  • LEFT(C2, SEARCH(" ",C2,1))
  • RIGHT(C2,LEN(C2)-SEARCH(" ",C2,1))

The README file indicated that the longitudes are in degrees West, so I added a negative sign to the longitude coordinates.
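
The same split can also be sketched in Python. This is only an illustration and not the method used in this assignment. It assumes each Location value is a latitude and a longitude separated by whitespace, and the function name split_location and the sample coordinates are hypothetical.

```python
def split_location(location: str) -> tuple[float, float]:
    """Split a 'latitude longitude' string into two numbers."""
    # Assumption: the two coordinates are separated by whitespace.
    lat_text, lon_text = location.split()
    latitude = float(lat_text)
    # The README says the longitudes are West, so they become negative.
    longitude = -abs(float(lon_text))
    return latitude, longitude


# Made-up sample value, not taken from Microblogs.csv.
print(split_location("42.5 93.25"))  # (42.5, -93.25)
```

Unlike the LEFT formula above, split() drops the separating space, so no trailing blank is carried into the latitude.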

There are 1,023,077 records in the Microblogs.csv file. To narrow these down to the messages relevant to identifying the affected area, I used the Text Explorer function in JMP Pro. Text Explorer lists the most commonly used terms and phrases, from which I selected those linked to the illness and its symptoms, e.g. "sick", "headache", "case of the chills", "sick sucks".

Common Terms and Phrases

After selecting the relevant terms and phrases, I made the matching messages into a data table and saved it as a SAS data set. This reduced the data from 1,023,077 messages to 69,729.
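
To make the filtering idea concrete, here is a minimal Python sketch of keeping only the messages that mention a symptom term. It is not what JMP Pro does internally. The term list contains only the four examples named above (the full selection is not reproduced here), and the function names and sample messages are hypothetical.

```python
# Example terms quoted in the text; the full list chosen in JMP Pro was longer.
SYMPTOM_TERMS = ["sick", "headache", "case of the chills", "sick sucks"]


def is_relevant(text: str, terms=SYMPTOM_TERMS) -> bool:
    """Return True if the message contains any symptom term (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def filter_messages(messages):
    """Keep only the messages that mention at least one symptom term."""
    return [message for message in messages if is_relevant(message)]


# Made-up messages for illustration.
sample = ["I feel SICK today", "Great lunch downtown", "Got a case of the chills"]
print(filter_messages(sample))
```

Plain substring matching is a simplification: it would also match unrelated words that merely contain a term, which a tokenising tool such as Text Explorer avoids.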


Data Visualisation

The SAS data set was imported into Tableau.

As Smartpolis is a fictional location, I inserted the Smartpolis_Map.png file as a background image in Tableau. The "Longitude" field was placed under Columns and the "Latitude" field was placed under Rows. The "Created at" field was placed under Pages and "Hour" was selected. We can see from the image below that at the peak of the outbreak, many points are cluttered in the middle of the map, which makes it hard to judge how dense they are.

Many points are cluttered in the middle

To see the intensity of the records more clearly, I did hexagonal binning by creating calculated fields using the HEXBINX and HEXBINY functions in Tableau. The "Number of Records" field was placed under Size so that bins with more records appear as bigger circles.

Higher intensity of records will appear as bigger circles
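
The idea behind hexagonal binning can be sketched in Python as follows. This is an illustrative implementation under my own assumptions, not Tableau's: each point is assigned to the nearest hexagon centre, found by comparing the nearest centres on two offset rectangular lattices. The names hex_bin and count_per_bin and the size parameter are hypothetical.

```python
import math
from collections import Counter


def hex_bin(x: float, y: float, size: float = 1.0) -> tuple[float, float]:
    """Return the centre of the hexagon containing (x, y).

    `size` is the horizontal distance between neighbouring centres.
    """
    row_height = size * math.sqrt(3)  # vertical period of the hexagon grid
    # Nearest centre on the first rectangular lattice.
    ax = round(x / size) * size
    ay = round(y / row_height) * row_height
    # Nearest centre on the second lattice, offset by half a cell both ways.
    bx = (math.floor(x / size) + 0.5) * size
    by = (math.floor(y / row_height) + 0.5) * row_height
    # The nearer of the two candidates is the centre of the enclosing hexagon.
    if (x - ax) ** 2 + (y - ay) ** 2 <= (x - bx) ** 2 + (y - by) ** 2:
        return (ax, ay)
    return (bx, by)


def count_per_bin(points, size: float = 1.0) -> Counter:
    """Count how many (x, y) points fall into each hexagonal bin."""
    return Counter(hex_bin(x, y, size) for x, y in points)
```

The count for each bin plays the role of "Number of Records" on Size: the more points a hexagon collects, the bigger its circle.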


Origin of Outbreak

By plotting the "Number of Records" against time (hour), we can see that the number of messages rose sharply from 1 am on May 18 and peaked at 6 pm. There were 1,810 messages in the 6 pm hour on May 18, whereas every hour on the previous days had fewer than 100.

Number of messages increased sharply from May 18, 1 am
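
The hourly counts behind such a chart can be sketched in Python as below. This is an illustration only; the chart in this assignment was produced in Tableau. The timestamp format, the function name messages_per_hour and the sample timestamps are assumptions, not taken from Microblogs.csv.

```python
from collections import Counter
from datetime import datetime


def messages_per_hour(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count messages per hour, given timestamp strings in the assumed format."""
    counts = Counter()
    for stamp in timestamps:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.strptime(stamp, fmt).replace(minute=0, second=0, microsecond=0)
        counts[hour] += 1
    return counts


# Made-up timestamps for illustration.
sample = ["2011-05-18 01:05", "2011-05-18 01:50", "2011-05-18 02:10"]
for hour, count in sorted(messages_per_hour(sample).items()):
    print(hour, count)
```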

We can see from the images below that few messages were posted at 12 am on May 18. However, at 1 am, there was a large spike in the number of messages from Downtown and Uptown. Using the Lasso Selection function, we can see that the number of messages from Downtown and Uptown increased from 8 at 12 am to 77 at 1 am. The number of messages increased even more at 8 am, when people woke up. At 8 am, 596 of the 810 messages (74%) were from Downtown and Uptown.

Everything was normal at May 18, 12 am

Spike in messages from Downtown and Uptown at 1 am

Greater spike in messages from Downtown and Uptown at 8 am

From the above, we can deduce that ground zero is in Downtown and Uptown, which are also the affected areas.


Epidemic Spread

From the Population.csv file, we see that Downtown had 89,286 residents but its daytime population was much higher at 258,928. Many people travelled to Downtown to work in the day. The same goes for Uptown. It had 29,762 residents but its daytime population was 116,072.
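
The size of the daytime influx can be worked out from the figures above. The short Python sketch below uses only those four numbers; the function name daytime_influx is hypothetical.

```python
def daytime_influx(residents: int, daytime: int) -> tuple[int, float]:
    """Return the net daytime inflow and the daytime-to-resident ratio."""
    return daytime - residents, round(daytime / residents, 2)


# Figures quoted above from Population.csv.
print("Downtown:", daytime_influx(89_286, 258_928))  # (169642, 2.9)
print("Uptown:", daytime_influx(29_762, 116_072))    # (86310, 3.9)
```

Downtown's daytime population is therefore about 2.9 times its resident population, and Uptown's about 3.9 times.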

From the Weather.csv file, we see that the wind was blowing from the west on May 18. If the infection were airborne, we would expect people in Eastside and some parts of Suburbia and Lakeside to be infected. However, from the image below, taken at 11 pm on May 18, we do not see a large spike in the number of messages from Eastside. A point to note from my analysis is that many of the messages mentioned other people who were sick. Therefore, messages from a town do not necessarily mean that people in that town were infected. If people in Eastside were infected, its circle should be almost as big as the ones in Downtown and Uptown.

No spike in messages from Eastside