ISSS608 2017-18 T1 Assign ZHANG LIDAN

From Visual Analytics and Applications
Revision as of 19:45, 12 October 2017 by Lidan.zhang.2016
Assignment 1 - To be a Visual Detective: D

Background

Data Preparation

To work with the data more easily, I first import the microblog dataset into JMP. The dataset contains 1,023,077 rows and a lot of useful information. For example, I can use the location and timestamp fields to identify where and when each message was posted. By tokenizing and stemming the words in each message, I can then filter for high-frequency words and flu-related keywords for further exploration. First, I separate the location into longitude and latitude. Because these locations are in the Western Hemisphere, I then convert the longitude coordinates to negative values. Next, to exclude irrelevant information, I create a subset that keeps only the messages mentioning the main flu-like symptoms, such as chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, and diarrhea. I use the Text Explorer in JMP to generate these new columns.
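The same preparation steps can be sketched outside JMP. The Python below is only an illustration, not the JMP workflow itself. It assumes the location field is a single string of the form "latitude longitude" separated by a space, and it uses simple prefix matching as a rough stand-in for the stemming done by Text Explorer. The function names `split_location` and `symptom_flags` are hypothetical.

```python
import re

# Symptom stems listed in the text above.
SYMPTOM_STEMS = ["chill", "flu", "fever", "sweat", "pain", "fatigue",
                 "ache", "cough", "breath", "nausea", "vomit", "diarrhea"]


def split_location(location):
    """Split an assumed 'lat lon' string and force the longitude negative.

    The negative sign reflects that the points lie in the Western Hemisphere.
    """
    lat_text, lon_text = location.split()
    return float(lat_text), -abs(float(lon_text))


def symptom_flags(message):
    """Return the set of symptom stems that begin any word in the message.

    Prefix matching only approximates stemming: 'flu' would also match
    'fluent', and 'ache' would not match 'headache'.
    """
    words = re.findall(r"[a-z]+", message.lower())
    return {stem for stem in SYMPTOM_STEMS
            if any(word.startswith(stem) for word in words)}


if __name__ == "__main__":
    # Made-up example values, not rows from the real dataset.
    print(split_location("42.22717 93.33772"))
    print(symptom_flags("Coughing all night, fever and chills"))
```

A row would be kept in the symptom subset whenever `symptom_flags` returns a non-empty set for its message.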

1.png

Next, I create a bar chart to display the frequency of microblogs that include the symptom words. The chart shows a sharp increase in frequency from May 18 to May 20, 2011.
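The counts behind such a bar chart amount to grouping the symptom messages by calendar day. As a minimal sketch, assuming timestamps are strings in a "month/day/year hour:minute" format (an assumption about the file, not something confirmed here), the daily frequencies could be computed like this. The function name `daily_counts` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def daily_counts(timestamps, fmt="%m/%d/%Y %H:%M"):
    """Count how many timestamps fall on each calendar day.

    `fmt` is an assumed timestamp format; adjust it to match the real data.
    Returns a dict mapping datetime.date to a count, sorted by date.
    """
    days = (datetime.strptime(ts, fmt).date() for ts in timestamps)
    return dict(sorted(Counter(days).items()))


if __name__ == "__main__":
    # Made-up timestamps for illustration only.
    sample = ["5/17/2011 23:10", "5/18/2011 8:05", "5/18/2011 9:30"]
    for day, count in daily_counts(sample).items():
        print(day, count)
```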

2.png

To explore what happened from May 18 to May 20, I reload the microblog dataset into JMP. Reading the message text, I find that the words relate not only to flu-like symptoms but also to stomach problems. I therefore generate two datasets: one containing flu-like symptoms such as breath, cough, fatigue, fever, flu, and pneumonia, and another containing stomach-related symptoms such as diarrhea, nausea, stomach, and vomit.
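The split into two symptom groups can be illustrated with a small classifier. This is a sketch under the same prefix-matching assumption as a stand-in for stemming. The keyword lists come from the text above, while the function name `symptom_groups` and the group labels are hypothetical.

```python
import re

# Keyword stems for each group, taken from the text above.
FLU_STEMS = ("breath", "cough", "fatigue", "fever", "flu", "pneumonia")
STOMACH_STEMS = ("diarrhea", "nausea", "stomach", "vomit")


def symptom_groups(message):
    """Return which groups ('flu', 'stomach') a message mentions.

    A message can belong to both groups, one group, or neither.
    """
    words = re.findall(r"[a-z]+", message.lower())
    groups = set()
    if any(word.startswith(FLU_STEMS) for word in words):
        groups.add("flu")
    if any(word.startswith(STOMACH_STEMS) for word in words):
        groups.add("stomach")
    return groups


if __name__ == "__main__":
    # Made-up messages for illustration only.
    print(symptom_groups("Fever and coughing since this morning"))
    print(symptom_groups("Vomiting again, my stomach hurts"))
```

Rows whose result contains "flu" would go into the first dataset and rows whose result contains "stomach" into the second.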

Origin and epidemic spread

Flulike problems distribution

There are 29,243 rows that mention some of the flu-like symptoms. After importing the dataset and the map and adjusting the latitude and longitude coordinates, I use the Pages function to observe how the distribution changes hour by hour from April 30 to May 20. At 8 am on May 18, there was an outbreak of flu-like disease in Downtown. I suspect it may have occurred at the Vastopolis Dome and the Convention Center.
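Stepping through the map hour by hour is equivalent to grouping the points into one frame per hour. The sketch below illustrates that grouping only; it does not reproduce the Pages function. It assumes each row is a `(timestamp, latitude, longitude)` tuple with the same assumed timestamp format as before, and the name `frames_by_hour` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def frames_by_hour(rows, fmt="%m/%d/%Y %H:%M"):
    """Group (timestamp, lat, lon) rows into one list of points per hour.

    Returns a dict mapping the start of each hour (a datetime) to the
    list of (lat, lon) points posted during that hour, sorted by time.
    """
    frames = defaultdict(list)
    for ts, lat, lon in rows:
        hour = datetime.strptime(ts, fmt).replace(minute=0)
        frames[hour].append((lat, lon))
    return dict(sorted(frames.items()))


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = [
        ("5/18/2011 8:05", 42.22, -93.33),
        ("5/18/2011 8:40", 42.23, -93.34),
        ("5/18/2011 9:10", 42.25, -93.30),
    ]
    for hour, points in frames_by_hour(sample).items():
        print(hour, len(points))
```

Plotting each frame in sequence, or simply comparing the number of points per frame, makes a sudden jump such as the one at 8 am on May 18 easier to spot.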

3.png

reference

feedback