Difference between revisions of "ISSS608 2017-18 T1 Assign ZHANG LIDAN"
(Created page with "<div style="background: #3b3b3b; padding: 15px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #a9a9a9 solid 32px; font-size: 20px"><font color="white">A...") |
|||
Line 3: | Line 3: | ||
==Background== | ==Background== | ||
− | == | + | ==Data Preparation== |
+ | To make the data easier to work with, I first import the microblog dataset into JMP. This dataset contains a lot of useful information. For example, I can use the location coordinates and the timestamp to identify where and when each message was posted. Then, by tokenizing and stemming the words in each message, I can filter for high-frequency words and flu-like keywords for further data exploration. | ||
+ | The microblogs dataset contains 1,023,077 rows. | ||
+ | First, I need to separate the location into longitude and latitude. Then, because these locations are in the western hemisphere, I should convert the longitude coordinates to negative values. | ||
+ | Next, to exclude irrelevant information, I create a subset of the dataset that covers the main flu-like symptoms, such as chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, and diarrhea. Here, I use the Text Explorer in JMP to generate the corresponding new columns. | ||
+ | [[File:1.png|600px|center]] | ||
==disease== | ==disease== |
Revision as of 19:32, 12 October 2017
Background
Data Preparation
To make the data easier to work with, I first import the microblog dataset into JMP. This dataset contains a lot of useful information. For example, I can use the location coordinates and the timestamp to identify where and when each message was posted. Then, by tokenizing and stemming the words in each message, I can filter for high-frequency words and flu-like keywords for further data exploration. The microblogs dataset contains 1,023,077 rows. First, I need to separate the location into longitude and latitude. Then, because these locations are in the western hemisphere, I should convert the longitude coordinates to negative values. Next, to exclude irrelevant information, I create a subset of the dataset that covers the main flu-like symptoms, such as chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, and diarrhea. Here, I use the Text Explorer in JMP to generate the corresponding new columns.
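The steps above were carried out in JMP. Purely as an illustration, the same logic can be sketched in Python. This is a minimal sketch under several assumptions that do not come from the dataset description: the row keys `location` and `text` are hypothetical, the location is assumed to be a string of two space-separated numbers with latitude first, and a word-start prefix match is used as a crude stand-in for the stemming that the JMP Text Explorer performs.

```python
import re

# Symptom keywords listed in the text above.
SYMPTOMS = ["chill", "flu", "fever", "sweat", "pain", "fatigue",
            "ache", "cough", "breath", "nausea", "vomit", "diarrhea"]

# Assumption: matching a keyword at the start of a word approximates stemming,
# so "coughing" and "coughs" both count as "cough". It also over-matches
# (for example "fluent" counts as "flu"), which real stemming would avoid.
SYMPTOM_PATTERNS = {s: re.compile(r"\b" + s, re.IGNORECASE) for s in SYMPTOMS}


def split_location(location, first_is_latitude=True):
    """Split a 'number number' string and return (latitude, longitude).

    Assumption: longitude is stored without a sign, so it is made negative
    here because the locations lie in the western hemisphere.
    """
    a, b = (float(part) for part in location.split())
    lat, lon = (a, b) if first_is_latitude else (b, a)
    return lat, -abs(lon)


def symptom_flags(message):
    """Return one True/False flag per symptom keyword for a message."""
    return {s: bool(p.search(message)) for s, p in SYMPTOM_PATTERNS.items()}


def prepare(rows):
    """Keep only rows that mention at least one symptom keyword.

    `rows` is an iterable of dicts with the hypothetical keys
    'location' and 'text'. Each kept row gains 'latitude', 'longitude',
    and one boolean column per keyword.
    """
    subset = []
    for row in rows:
        flags = symptom_flags(row["text"])
        if any(flags.values()):
            lat, lon = split_location(row["location"])
            subset.append({**row, "latitude": lat, "longitude": lon, **flags})
    return subset
```

For example, calling `prepare` on a list containing one message about coughing and one about the weather keeps only the first row, with its longitude negated and its `cough` column set to `True`. The per-keyword boolean columns play the role of the new columns generated by the Text Explorer; the actual column layout in JMP may differ.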