ISSS608 2017-18 T1 Assign WANG SHANG

From Visual Analytics and Applications

Mini Challenge: What's happened in Smartpolis?

Background

Smartpolis is a major metropolitan area with a population of approximately two million residents. During the last few days, health professionals at local hospitals have noticed a dramatic increase in reported illnesses.

I want to use visual analytics tools to find useful insights into how the illness is spreading, and to help the government understand what it can do to control the spread more effectively.


Data Description

I have three datasets and one Smartpolis map to analyze. The first dataset contains microblog messages collected from various GPS-enabled devices, including laptop computers, handheld computers, and cellular phones. The other two contain population statistics and observed weather data. Some additional information is also provided in a Word file.


Data Preparation

The microblog dataset has a column that records the text each person published to the social platform, and it also gives the creation time and location of each message. I imported this dataset into JMP and used the Word function to split the location data into two columns, latitude and longitude. Then I used Text Explorer to split each text record into words and phrases, with no stemming. My reasoning is that someone who is ill usually posts a message about the illness, so if a message contains a word that represents a symptom or illness, its author has probably caught that illness. Hence, I can extract a single key symptom to represent each person's current status.
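Outside JMP, the location split and the no-stemming tokenization could be sketched in Python roughly as follows. This is only an illustration, not the JMP workflow itself; it assumes the Location field holds latitude and longitude separated by whitespace, and the function names are hypothetical.

```python
from __future__ import annotations

import re


def split_location(location: str) -> tuple[float, float]:
    """Split a "latitude longitude" string into two floats.

    Assumption: the two numbers are separated by whitespace.
    """
    lat_text, lon_text = location.split()
    return float(lat_text), float(lon_text)


def tokenize(text: str) -> list[str]:
    """Lower-case a message and split it into word tokens, with no stemming.

    Hyphenated words such as "flu-like" are kept as a single token.
    """
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


print(split_location("42.22717 93.33772"))  # (42.22717, 93.33772)
print(tokenize("Feeling flu-like, fever and chills"))
# ['feeling', 'flu-like', 'fever', 'and', 'chills']
```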

For example, I used flu-like, fever, chills, sweats, aches and pains, fatigue, coughing, breathing difficulty, nausea, vomiting, diarrhea, and enlarged lymph nodes, which are listed in the overview section of the assignment introduction page, as my illness word list. I searched for each word from this list in JMP Text Explorer, collected the matching text records, and put them into a new table. In this new table, I created a column called Key_Symptom and filled it with the matched word.

pic1. finding key symptom
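The keyword lookup that fills Key_Symptom could be sketched like this. The keyword-stem-to-label mapping is hypothetical. It is built from the symptom labels used later in this analysis, with "ache" folded into headache as described in Task 1. Each message yields one row per matched symptom, which mimics concatenating one table per word.

```python
import re

# Hypothetical keyword-stem -> Key_Symptom mapping (not the exact JMP setup).
# "ache" is folded into "headache", as described in the analysis.
SYMPTOM_KEYWORDS = {
    "headache": "headache",
    "ache": "headache",
    "breath": "breathing problem",
    "chill": "chill",
    "cough": "cough",
    "fatigue": "fatigue",
    "fever": "fever",
    "sweat": "sweat",
    "diarrhea": "diarrhea",
    "flu": "flu",
    "nausea": "nausea",
    "pain": "pain",
    "vomit": "vomit",
}


def key_symptoms(text):
    """Return the distinct Key_Symptom labels whose keyword starts a word in text."""
    lowered = text.lower()
    labels = []
    for keyword, label in SYMPTOM_KEYWORDS.items():
        # \b before the stem: "chills" matches "chill", but "headache" does not match "ache".
        if re.search(r"\b" + re.escape(keyword), lowered) and label not in labels:
            labels.append(label)
    return labels


def to_rows(messages):
    """One (message, Key_Symptom) row per matched symptom, like concatenated per-word tables."""
    return [(message, label) for message in messages for label in key_symptoms(message)]


print(key_symptoms("Fever and chills all night, coughing a lot"))  # ['chill', 'cough', 'fever']
print(to_rows(["Terrible headache and body aches"]))
# [('Terrible headache and body aches', 'headache')]
```

Prefix matching on stems is crude. For example, "flu" also matches "fluid", so any real keyword list would need to be checked against the messages it picks up.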

After repeating this process for every word in my illness word list, I concatenated the resulting tables into one table for visualization. Before importing it into Tableau, I also created a new column named DayNight from the Created_at column. The value "1" means Day (the hour of the creation time is between 6 and 17), and the value "2" means Night (the hour is less than 6 or greater than 17). This completes the data preparation. I use this table, together with the weather and population data, for the visual analysis.
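The DayNight rule above translates directly into code. Here is a minimal sketch, assuming Created_at has already been parsed into a Python datetime (the raw timestamp format is not described here). It also includes the 1 → Day and 2 → Night recoding that is later applied in Tableau.

```python
from datetime import datetime

# Recoding applied later in Tableau: 1 -> Day, 2 -> Night.
DAY_NIGHT_LABELS = {1: "Day", 2: "Night"}


def day_night(created_at: datetime) -> int:
    """Return 1 (Day) if the hour is between 6 and 17 inclusive, otherwise 2 (Night)."""
    return 1 if 6 <= created_at.hour <= 17 else 2


# The year in these examples is arbitrary; only the hour matters.
print(DAY_NIGHT_LABELS[day_night(datetime(2011, 5, 18, 17, 45))])  # Day
print(DAY_NIGHT_LABELS[day_night(datetime(2011, 5, 18, 18, 0))])   # Night
```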


Tasks & Solutions

Task 1: Origin and Epidemic Spread

As the picture below shows, I believe ground zero is around the place in the red circle, and there are two affected regions: the two yellow-circled regions next to the red circle.

pic2. Ground zero and affected places

After data preparation, I imported the new microblog table into Tableau. First, I recoded the DayNight values: "1" to Day and "2" to Night. Then I plotted the points on the map image using the latitude and longitude columns, with a different color for each symptom.

To find ground zero and the affected places, I need to find an area where many more points appear on the map than before. I used the day of Created_at as a filter and put it on the Pages shelf to show the distribution of points for each day. The situation is normal before 18 May. On the 18th, a large number of points suddenly appear in the downtown and uptown region. Just one day later, many points suddenly appear in another region, the downstream banks of the river.

pic3. 18th outbreak
pic4. 19th outbreak
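The "many more points than before" check done by eye with the Pages shelf can also be stated as a simple rule: flag a day when its message count exceeds some multiple of the average of all earlier days. The sketch below is one possible version. The threshold factor of 2 is an arbitrary assumption, and the counts are made-up toy numbers, not the Smartpolis data. In practice it would be run separately for each map region.

```python
from collections import Counter


def outbreak_days(counts: Counter, factor: float = 2.0) -> list:
    """Return the days whose count exceeds `factor` times the mean of all earlier days.

    `counts` maps a sortable day key (here, day of May) to a number of symptom messages.
    """
    ordered = sorted(counts)
    flagged = []
    for i in range(1, len(ordered)):
        baseline = sum(counts[d] for d in ordered[:i]) / i
        if counts[ordered[i]] > factor * baseline:
            flagged.append(ordered[i])
    return flagged


# Toy counts for illustration only; these are NOT the real Smartpolis numbers.
toy = Counter({16: 10, 17: 12, 18: 80, 19: 150})
print(outbreak_days(toy))  # [18, 19]
```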

So the two affected places are the downtown and uptown region and the downstream banks of the river.

I also wanted to know exactly which symptoms people had, so I counted the number of messages for each symptom and got the graph below.

pic5. Number of messages per symptom

I found that headache, breathing problems, chills, cough, fatigue, fever and sweats broke out on the 18th. Diarrhea, flu, nausea and pain broke out on the 19th, and vomiting broke out on the 20th. I also visualized the point distribution for each symptom. Headache, breathing problems, chills, cough, fatigue, fever, sweats and flu are concentrated in the center region (the downtown and uptown region). This suggests that most people who had the symptoms that broke out on the 18th reported flu the next day, so the illness in this region is probably a flu-like illness.

Diarrhea, nausea and vomiting, on the other hand, are concentrated along the downstream banks of the river. As with flu, I think the vomiting outbreak is simply delayed by the body's reaction time. Judging by the symptoms reported in the two regions, these are clearly two different illnesses. A flu-like illness broke out in the center region, and a stomach-related illness broke out along the downstream banks of the river. (The pictures below show the point distribution for each symptom.)

pic6. Point distribution on 18th
pic7. Point distribution on 19th and 20th
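The per-symptom outbreak dates above were read from the graph. The same "first day the count jumps" idea can be computed per symptom from (day, symptom) rows. The following sketch is a hypothetical rule, not the method used above. It assumes a threshold factor of 2 and uses toy rows. Days with no messages for a symptom count as zero, so the full list of study days is passed in explicitly.

```python
from collections import Counter, defaultdict


def onset_days(rows, days, factor=2.0):
    """Return {symptom: first day its count exceeds `factor` x the mean of earlier days}.

    rows: iterable of (day, symptom) pairs, one per message.
    days: every day in the study period, in order (missing days count as 0).
    Symptoms that never cross the threshold are left out of the result.
    """
    per_symptom = defaultdict(Counter)
    for day, symptom in rows:
        per_symptom[symptom][day] += 1
    onsets = {}
    for symptom, counts in per_symptom.items():
        for i in range(1, len(days)):
            baseline = sum(counts[d] for d in days[:i]) / i
            if counts[days[i]] > factor * baseline:
                onsets[symptom] = days[i]
                break
    return onsets


# Toy rows for illustration only.
rows = [(16, "fever"), (17, "fever")] + [(18, "fever")] * 10 + [(16, "vomit")] + [(20, "vomit")] * 5
print(onset_days(rows, days=[16, 17, 18, 19, 20]))  # {'fever': 18, 'vomit': 20}
```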

Note that pain broke out both in the center region and along the downstream river banks. I think this is because people feel pain with either the flu-like illness or the stomach problem, so I removed it from further analysis. Headache has the same problem. However, I believe that is mainly because I merged the symptom "ache" into headache, which is why many headache points appear near the downstream river banks. So I decided to keep headache.

Having found the affected places, I can locate ground zero. First, consider the stomach-related illness. Given the nature of the symptoms and the fact that the outbreak region is near the water, I think people caught it from the river. According to the additional information in the README file, the river flows south, which is why I call this region the downstream banks. So something in the water must have flowed from upstream to downstream and caused the illness.

Second, in the center region the symptoms include breathing problems, cough, flu and so on, so I suspect something was wrong with the air. According to the weather data, the wind blew from west to east on both the 17th and the 18th. This means something probably happened to the west of the center region and polluted the air.

Combining the two likely causes for the two affected places, I think ground zero is around Highway 601, inside the red circle mentioned above. To check this, I ran another text exploration on the messages from the 17th and 18th. The word "fire" had the highest count. So I guess an incident caused a fire, which polluted the air and released contaminants into the river, causing the illnesses in both regions.
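The word-count check on the 17th and 18th can be sketched as a simple frequency count over the filtered messages. This is not JMP Text Explorer. The stop-word list here is a small hypothetical one, and the sample messages are invented for illustration.

```python
import re
from collections import Counter

# A small, hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "near", "from", "is", "i", "to"}


def top_words(messages, n=5):
    """Count non-stop-word tokens across messages and return the n most common."""
    counts = Counter()
    for text in messages:
        counts.update(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS)
    return counts.most_common(n)


# Toy messages for illustration only; the real table would first be filtered to the 17th and 18th.
sample = ["Fire near the highway", "Huge fire downtown", "Smoke from the fire"]
print(top_words(sample, n=1))  # [('fire', 3)]
```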


Task 2: Epidemic Spread

Question 1: How is the infection being transmitted?

In this case, there are three ways the illness could spread:

  1. Person-to-person
  2. Airborne
  3. Waterborne

Because the illnesses in the two affected regions are different, I analyze them separately.

Downstream river banks

In this region the illness is a stomach-related problem. As I said in Task 1, I think it is probably transmitted by water, because people usually get stomach problems from eating or drinking something contaminated. I have no information about food, and this region lies right on the river banks, so water is the most likely carrier. Below is the distribution of stomach-related points on the 19th and 20th.

pic8. Stomach points distribution on 19th
pic9. Stomach points distribution on 20th

From these two pictures, I conclude that the stomach-related problem only occurs downstream. People living on the upstream banks did not get this illness because the river flows from north to south and cannot carry the contaminant upstream. People living in other regions far from the river did not get the stomach-related illness either. This supports the idea that the contaminant causing the illness is only in the river.

According to the weather data, the wind has blown from west to east ever since the 17th. If the stomach-related problem were airborne, many people in the region east of the downstream river should have caught it. In fact, almost nobody in that region did, so the stomach-related problem is not transmitted through the air.
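The wind argument here (and again for the flu-like illness) amounts to asking whether a point lies downwind of a source. A minimal geometric sketch follows, under several stated assumptions. The wind direction is given as the direction it blows from (270 = from the west). Longitude increases eastward in the map coordinates. The city is small enough for a flat-map (equirectangular) approximation. The 45-degree cone half-width is arbitrary. These function names are hypothetical.

```python
import math


def bearing_deg(source, point):
    """Compass bearing (0 = north, 90 = east) from source to point.

    Both arguments are (latitude, longitude). Equirectangular approximation:
    longitude differences are scaled by cos(latitude). Assumes longitude
    increases eastward.
    """
    d_lat = point[0] - source[0]
    d_lon = (point[1] - source[1]) * math.cos(math.radians(source[0]))
    return math.degrees(math.atan2(d_lon, d_lat)) % 360


def is_downwind(source, point, wind_from_deg, half_angle=45.0):
    """True if point lies within `half_angle` degrees of the direction the wind blows toward.

    wind_from_deg uses the meteorological convention: 270 means wind FROM the west.
    """
    blows_toward = (wind_from_deg + 180) % 360
    diff = abs((bearing_deg(source, point) - blows_toward + 180) % 360 - 180)
    return diff <= half_angle


print(is_downwind((0.0, 0.0), (0.0, 1.0), wind_from_deg=270))   # True  (east of source, west wind)
print(is_downwind((0.0, 0.0), (0.0, -1.0), wind_from_deg=270))  # False (west of source)
```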

The downstream river banks cover three regions: Plainville, Westside and Smogtown. The picture below shows how the population of these three regions changes between day and night.

pic10. Population change

This picture shows that all three regions have a large population change over the day, which means many people come into these regions and then leave again. If the illness spread from person to person, these visitors would carry it back to their own regions. However, the illness did not appear in other regions, which suggests that the stomach-related problem is not transmitted person-to-person. In other words, this illness is transmitted by water.
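The day-versus-night comparison behind this argument is simple arithmetic: the change relative to the night-time population. The sketch below assumes the population table gives one day-time and one night-time figure per region. The region names and numbers are placeholders, not values from the Smartpolis data.

```python
def relative_change(day_population, night_population):
    """Day-time population relative to night-time, as a fraction of the night population.

    Positive values mean more people are present during the day (net inflow).
    """
    return (day_population - night_population) / night_population


# Placeholder figures, NOT taken from the Smartpolis population table.
toy = {"RegionA": (120_000, 80_000), "RegionB": (30_000, 60_000)}
for region, (day, night) in toy.items():
    print(region, f"{relative_change(day, night):+.0%}")
# RegionA +50%
# RegionB -50%
```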


Center region (Downtown and uptown)

In the center region, the main problem is the flu-like illness. Based on my analysis, water cannot be the way it is transmitted. The picture below shows the flu-like points on the 18th. If the flu-like illness were waterborne, then, as with the stomach problem, many people on the other side of the river should also have caught it. However, the picture shows a huge difference between the two red circles, so water cannot transmit the flu-like illness. (The 19th and 20th look the same as the 18th.)

pic11. Flulike water analysis

Is this illness transmitted from person to person? I visualized the point distribution by DayNight and by day of Created_at. During the day on the 18th, most flu-like sufferers were in the center region, and at night points began to appear in other regions. The population change of the center region shows a large daily movement of people: people come to the center region to work during the day and return home at night. So the points in other regions on the night of the 18th are probably people who worked in the center region that day. However, when these people went back to work the next day, the points in other regions did not disappear. Instead, they increased.

pic12. Point distribution by DayNight and day
pic13. Population change of center region

This means that, every day, the flu-like illness spreads from people who work in the center region to people who are not in the center region. So I think the flu-like illness is probably transmitted person-to-person.
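The DayNight-by-day reading of the map can be made quantitative. For each (day, DayNight) pair, compute the share of flu-like points that fall outside the center region. A rising night-time share that does not drop back the next day would match the pattern described above. In this sketch, the bounding box for the center region is a hypothetical placeholder, since the real edges would have to be read off the Smartpolis map, and the points are toy data.

```python
from collections import Counter

# Hypothetical bounding box for the center region as (lat_min, lat_max, lon_min, lon_max).
# Placeholder values; the real edges would have to be read off the Smartpolis map.
CENTER_BOX = (0.0, 1.0, 0.0, 1.0)


def in_box(lat, lon, box):
    """True if (lat, lon) lies inside the rectangular box."""
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def outside_share(points, box=CENTER_BOX):
    """Share of flu-like points outside `box` for each (day, DayNight) pair.

    points: iterable of (day, day_night, latitude, longitude).
    """
    totals = Counter()
    outside = Counter()
    for day, day_night, lat, lon in points:
        totals[(day, day_night)] += 1
        if not in_box(lat, lon, box):
            outside[(day, day_night)] += 1
    return {key: outside[key] / totals[key] for key in totals}


# Toy points for illustration only.
toy_points = [(18, "Day", 0.5, 0.5), (18, "Day", 0.4, 0.6),
              (18, "Night", 0.5, 0.5), (18, "Night", 2.0, 2.0)]
print(outside_share(toy_points))  # {(18, 'Day'): 0.0, (18, 'Night'): 0.5}
```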

According to the weather data, after the 17th the wind always blew from the west, and sometimes from the northwest. So if the flu-like illness were also airborne, the regions east and south of the center region would also be affected. The picture above shows some points in those regions, but their density is much lower than in the center region, and some of them may come from person-to-person transmission. So the wind may have helped spread the illness, but I cannot confirm it.

So, for the flu-like illness in the center region, I think person-to-person contact is the main transmission route. Airborne spread may also play a role, but it is not the main one.


Question 2: Is the outbreak contained? Is it necessary to deploy treatment resources?

My answer is that the outbreak is not contained, and emergency management personnel need to deploy treatment resources outside the affected areas. Let's look at the illness status on the last day.

pic14. Last day status of stomach
pic15. Last day status of flulike

The two pictures above show that, along the downstream river banks, people with the stomach-related problem did not go to a hospital or clinic for treatment, and the point density in this region is still very high. Many people still report diarrhea, and on this day many people begin to report vomiting. This shows that the stomach-related problem has not been contained.

In the center region, many people have clearly gone to a hospital for treatment. Even so, a large number of people there still cannot get treatment, because the point density in the center region remains very high. This suggests that the hospitals are already running at full capacity. And because the flu-like illness spreads from person to person, more and more people will be infected if the government does nothing.

Therefore, to contain the outbreak, emergency management personnel should deploy treatment resources outside the downstream river banks and the center region.
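One simple, hypothetical way to make "not contained" testable is to check whether the daily count of symptom messages in a region is still non-decreasing over the last few days. This is an assumed criterion for illustration, not the one used in the analysis above, which relied on point density and newly appearing symptoms. The window of two day-to-day steps is arbitrary.

```python
def still_rising(daily_counts, window=2):
    """True if daily counts did not decrease over the last `window` day-to-day steps.

    daily_counts: counts of symptom messages in one region, ordered by day.
    """
    recent = daily_counts[-(window + 1):]
    return all(later >= earlier for earlier, later in zip(recent, recent[1:]))


# Toy series for illustration only.
print(still_rising([5, 40, 60, 70]))  # True
print(still_rising([5, 40, 60, 30]))  # False
```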


Tableau visualization link

https://public.tableau.com/profile/ws1881#!/vizhome/MiniChallenge/Dashboard1?publish=yes