Difference between revisions of "ISSS608 2017-18 T1 Assign ZHANG LIDAN"

From Visual Analytics and Applications
 

Revision as of 19:32, 12 October 2017

Assignment 1 - To be a Visual Detective: D

Background

Data Preparation

To make the data easier to work with, I first import the microblog dataset into JMP. The dataset contains 1,023,077 rows and carries several useful fields. For example, the location and timestamp of each message show where and when it was posted. By tokenizing and stemming the words in each message, I can then pick out high-frequency words and flu-related keywords for further data exploration.

First, I split the location field into separate longitude and latitude columns. Because these locations lie in the Western Hemisphere, I then negate the longitude values so that they are stored as negative coordinates.

Next, to exclude irrelevant messages, I create a subset of the data covering the main flu-like symptoms: chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, and diarrhea. I use the Text Explorer in JMP to generate the new keyword columns.

1.png
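
The preparation steps described above were done in JMP, but the same logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the location field is assumed to be a single string of the form "latitude longitude" with the longitude stored as a positive number, and simple prefix matching stands in for the stemming that JMP's Text Explorer performs.

```python
import re

# Symptom keywords listed in the Data Preparation text.
SYMPTOM_KEYWORDS = [
    "chill", "flu", "fever", "sweat", "pain", "fatigue",
    "ache", "cough", "breath", "nausea", "vomit", "diarrhea",
]


def split_location(location):
    """Split a location string into (latitude, longitude).

    Assumption: the field looks like "42.3 93.4" (latitude first, then an
    unsigned longitude). The longitude is forced negative because the
    points lie in the Western Hemisphere.
    """
    lat_text, lon_text = location.split()
    latitude = float(lat_text)
    longitude = -abs(float(lon_text))
    return latitude, longitude


def symptom_flags(message):
    """Return one True/False flag per symptom keyword for a message.

    Prefix matching is a crude stand-in for stemming: "coughing" matches
    "cough", but unrelated words such as "fluent" would also match "flu".
    """
    tokens = re.findall(r"[a-z]+", message.lower())
    return {
        keyword: any(token.startswith(keyword) for token in tokens)
        for keyword in SYMPTOM_KEYWORDS
    }


def is_flu_related(message):
    """Keep a message in the subset if any symptom keyword is present."""
    return any(symptom_flags(message).values())
```

For example, `split_location("42.3 93.4")` gives a latitude of 42.3 and a longitude of -93.4, and `is_flu_related` can be used as the filter that builds the symptom subset.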

disease

reference

feedback