Difference between revisions of "ISSS608 2017-18 T1 Assign ZHANG Lidan Data Preparation"

From Visual Analytics and Applications
Line 24:
 
<br/>
 
==Data Preparation==
 
- To better deal with the data, I import the microblog data set into the JMP at first. This dataset contains a lot of useful information. For example, I can use the location axis and the timestamp to identify where these rows are located. Then, through tokenizing and stemming the words in each message, I can filter the high frequency words and flulike-related keywords for further data exploration.
+ I import the microblog data set into the JMP at first.
- The microblogs dataset contains 1,023,077 rows.
+ Firstly, I exclude 48 rows of missing text.
- Firstly, I need to separate the location into longitude and latitude. Then, because these locations are at the western, hemisphere, I should reverse the longitude coordinates into negative value.
+ Next, I separate the location into longitude and latitude through Word function.
- Next, to exclude the irrelevant information, I create the subset dataset which consists of main flulike symptoms, such as chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, diarrhea. Here, I use the Text Explorer in JMP to generate these new columns.
+ Then, because these locations are at the western hemisphere, I change the longitude coordinates into negative value by Num function.
+ In addition, to exclude the irrelevant information, I create the subset dataset which consists of main flulike symptoms, such as chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, diarrhea. Here, I use the Text Explorer in JMP to generate these new columns.
+

[[File:1.png|600px|center]]

Revision as of 16:01, 15 October 2017

To be a Visual Detective

Data Preparation

I first import the microblog data set into JMP and exclude the 48 rows with missing text. Next, I split the location column into longitude and latitude using the Word function. Because all of these locations lie in the western hemisphere, I then use the Num function to convert the longitude coordinates to negative values. Finally, to exclude irrelevant messages, I create a subset that keeps only messages mentioning the main flu-like symptoms: chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, and diarrhea. I use the Text Explorer in JMP to generate these new keyword columns.
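
For readers who do not use JMP, the preparation steps described above can be approximated in Python. This is only a rough sketch and not the JMP workflow itself. It assumes each row is a dictionary with hypothetical keys "location" and "text", that the location is stored as two space-separated numbers with latitude first, and that simple prefix matching is an acceptable stand-in for the stemming done by Text Explorer. The function names are illustrative.

```python
import re

# Symptom keywords listed in the text; they are matched as word prefixes,
# which is a crude substitute for real stemming (e.g. "cough" ~ "coughing").
SYMPTOM_STEMS = ["chill", "flu", "fever", "sweat", "pain", "fatigue",
                 "ache", "cough", "breath", "nausea", "vomit", "diarrhea"]


def parse_location(location):
    """Split 'lat lon' into two floats (assumed order and delimiter).

    The longitude is forced to be negative because all points are
    assumed to lie in the western hemisphere.
    """
    lat_text, lon_text = location.split()
    return float(lat_text), -abs(float(lon_text))


def symptom_flags(text):
    """Return one True/False flag per symptom keyword for a message."""
    words = re.findall(r"[a-z]+", text.lower())
    return {stem: any(word.startswith(stem) for word in words)
            for stem in SYMPTOM_STEMS}


def prepare(rows):
    """Drop rows with missing text, parse locations, keep symptom rows."""
    prepared = []
    for row in rows:
        text = (row.get("text") or "").strip()
        if not text:
            continue  # exclude rows with missing text
        flags = symptom_flags(text)
        if not any(flags.values()):
            continue  # exclude messages with no flu-like keyword
        latitude, longitude = parse_location(row["location"])
        prepared.append({"latitude": latitude, "longitude": longitude,
                         "text": text, **flags})
    return prepared
```

Prefix matching will over-match some words (for example, "fluent" would count as "flu") and miss others (for example, "headache" would not count as "ache"), so the resulting subset should be spot-checked before further analysis.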

1.png