Difference between revisions of "ISSS608 2017-18 T1 Assign ZHANG Lidan Data Preparation"

Revision as of 15:03, 15 October 2017

To be a Visual Detective

Data Preparation

To make the data easier to work with, I first import the microblog dataset into JMP. The dataset contains 1,023,077 rows and a lot of useful information. For example, I can use the location coordinates and the timestamp to identify where and when each message was posted. Then, by tokenizing and stemming the words in each message, I can filter for high-frequency words and flu-related keywords for further data exploration. First, I need to separate the location into latitude and longitude. Because these locations are in the Western Hemisphere, I then convert the longitude coordinates to negative values. Next, to exclude irrelevant information, I create a subset of messages that mention the main flu-like symptoms: chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, and diarrhea. Here, I use the Text Explorer in JMP to generate these new columns.
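Outside JMP, the same preparation steps can be sketched in Python. This is only a minimal illustration, not the JMP workflow itself. It assumes that each row is a dict with hypothetical `Location` and `text` fields, that `Location` holds latitude and longitude separated by whitespace, and that plain substring matching is an acceptable rough stand-in for the tokenizing and stemming done by the Text Explorer.

```python
# Symptom keywords listed in the text above.
SYMPTOMS = ("chill", "flu", "fever", "sweat", "pain", "fatigue",
            "ache", "cough", "breath", "nausea", "vomit", "diarrhea")


def split_location(location):
    """Split 'lat lon' text into (latitude, longitude).

    Assumption: the two values are separated by whitespace. The longitude
    is forced negative because the locations are in the Western Hemisphere.
    """
    lat_text, lon_text = location.split()
    return float(lat_text), -abs(float(lon_text))


def symptom_flags(message):
    """Return a 0/1 flag per symptom word.

    Substring matching is a crude approximation of stemming: 'coughing'
    matches 'cough', but 'influence' would also match 'flu'.
    """
    lowered = message.lower()
    return {word: int(word in lowered) for word in SYMPTOMS}


def prepare(rows):
    """Keep only rows that mention a symptom, adding coordinates and flags."""
    kept = []
    for row in rows:
        flags = symptom_flags(row["text"])
        if not any(flags.values()):
            continue  # drop messages with no symptom keyword
        latitude, longitude = split_location(row["Location"])
        kept.append({**row, "latitude": latitude,
                     "longitude": longitude, **flags})
    return kept
```

Each kept row carries one indicator column per symptom word, which is roughly what the new columns in the JMP table represent. The field names would have to be changed to match the actual column headers in the dataset.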

@@ Line 29: / Line 29: @@
 Next, to exclude irrelevant information, I create a subset of messages that mention the main flu-like symptoms: chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, and diarrhea. Here, I use the Text Explorer in JMP to generate these new columns.
 [[File:1.png|600px|center]]
-Next, I create a bar chart to display the frequency of microblogs that include the symptom words. The chart shows a sharp increase in frequency from May 18 to May 20, 2011.
-[[File:2.png|1000px|center]]
-To explore what happened from May 18 to May 20, I reload the microblog dataset into JMP. Looking at the words in the text, I find that they relate not only to flu-like symptoms but also to stomach problems. I then generate two datasets: one containing flu-like symptoms such as breath, cough, fatigue, fever, flu, and pneumonia, and another containing stomach-related symptoms such as diarrhea, nausea, stomach, and vomit.
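The daily frequency count and the split into two symptom groups can also be sketched in Python. This is a hedged illustration rather than the JMP procedure: the `Created_at` and `text` field names and the `%m/%d/%Y %H:%M` timestamp format are assumptions, and substring matching again stands in for proper stemming.

```python
from collections import Counter
from datetime import datetime

# The two keyword groups named in the text above.
FLU_WORDS = ("breath", "cough", "fatigue", "fever", "flu", "pneumonia")
STOMACH_WORDS = ("diarrhea", "nausea", "stomach", "vomit")


def mentions_any(message, words):
    """True if the message contains at least one of the keywords."""
    lowered = message.lower()
    return any(word in lowered for word in words)


def daily_counts(rows, words, time_format="%m/%d/%Y %H:%M"):
    """Count messages per calendar day that mention any of the keywords.

    Assumption: timestamps look like '5/18/2011 13:26'; pass a different
    time_format if the dataset uses another layout.
    """
    counts = Counter()
    for row in rows:
        if mentions_any(row["text"], words):
            day = datetime.strptime(row["Created_at"], time_format).date()
            counts[day] += 1
    return counts
```

Calling `daily_counts(rows, FLU_WORDS)` and `daily_counts(rows, STOMACH_WORDS)` gives one series per symptom group, which could then be plotted as bar charts to compare the two groups around May 18 to May 20.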
