ISSS608 2017-18 T1 Assign GOH JUN JIE ANTHONY

Smartpolis is a major metropolitan area with a population of approximately two million residents. During the last few days, health professionals at local hospitals have noticed a dramatic increase in reported illnesses.

Observed symptoms are largely flu-like and include fever, chills, sweats, aches and pains, fatigue, coughing, breathing difficulty, nausea and vomiting, diarrhea, and enlarged lymph nodes. More recently, there have been several deaths believed to be associated with the current outbreak. City officials fear a possible epidemic and are mobilizing emergency management resources to mitigate the impact.

Two datasets have been provided. The first one contains microblog messages collected from various devices with GPS capabilities. These devices include laptop computers, handheld computers, and cellular phones. The second one contains map information for the entire metropolitan area. The map dataset contains a satellite image with labeled highways, hospitals, important landmarks, and water bodies. Supplemental tables for population statistics and observed weather data are also provided.

We are tasked with the following:

  1. Identify approximately where the outbreak started on the map (ground zero location), outline the affected area and explain how we arrived at the conclusion.
  2. Present a hypothesis on how the infection is being transmitted, e.g. whether transmission is person-to-person, airborne or waterborne, and identify the trends that support our hypothesis.
  3. Advise whether the outbreak is contained and whether it is necessary for emergency management personnel to deploy treatment resources outside the affected area, and explain our reasoning.

Data Preparation

The Microblogs.csv file has four attributes: ID, Created_at, Location and Text. The "Location" attribute stores the latitude and longitude together in one column, separated by a space, so we have to split it into two columns for subsequent use in Tableau. The following Excel functions were used to split the latitude and longitude:

  • LEFT(C2, SEARCH(" ",C2,1))
  • RIGHT(C2,LEN(C2)-SEARCH(" ",C2,1))
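
For readers who prefer to do the split outside Excel, the same step could be sketched in Python. This is only an illustration, not the method used in this assignment: the function name split_location and the sample coordinates are made up, and it assumes every Location value has the form "latitude longitude" with a space in between, as the Excel formulas above imply.

```python
from typing import Tuple


def split_location(location: str) -> Tuple[float, float]:
    """Split a 'latitude longitude' string into two floats."""
    # partition() cuts at the first space, mirroring SEARCH(" ", C2, 1) in Excel.
    lat_text, _, lon_text = location.strip().partition(" ")
    # float() tolerates leftover surrounding whitespace in either half.
    return float(lat_text), float(lon_text)


# Hypothetical sample value, not taken from Microblogs.csv.
lat, lon = split_location("42.22717 93.33772")
print(lat, lon)
```

Unlike the LEFT formula, which keeps the trailing space in the latitude text, this version returns clean numeric values for both coordinates.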

There are 1,023,077 records in the Microblogs.csv file. We need to identify the relevant messages, i.e. those that help us locate the area affected by the disease. To do that, we use the Text Explorer function in JMP Pro.

Text Explorer.png
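
As a rough cross-check outside JMP Pro, relevant messages could also be flagged with a plain keyword filter. The sketch below is not the Text Explorer and not the method used in this assignment. The term list is an assumption drawn from the symptoms listed in the outbreak description, and the names SYMPTOM_TERMS and is_relevant, as well as the sample messages, are hypothetical.

```python
import re

# Illustrative term list based on the symptoms in the outbreak description.
# Stems such as "cough" and "vomit" also match "coughing" and "vomiting".
SYMPTOM_TERMS = [
    "fever", "chills", "sweats", "aches", "fatigue", "cough",
    "breathing", "nausea", "vomit", "diarrhea", "lymph",
]

# \b anchors each term at the start of a word; matching is case-insensitive.
SYMPTOM_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in SYMPTOM_TERMS) + r")",
    re.IGNORECASE,
)


def is_relevant(text: str) -> bool:
    """Return True if the message mentions at least one symptom term."""
    return SYMPTOM_PATTERN.search(text) is not None


# Hypothetical messages, not taken from Microblogs.csv.
messages = ["I have a fever and chills tonight", "Great concert downtown!"]
relevant = [m for m in messages if is_relevant(m)]
print(relevant)
```

A filter like this only catches literal mentions of the listed terms, so it would miss misspellings and slang that an interactive text exploration tool can surface.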