Difference between revisions of "ISSS608 2017-18 T1 Assign ZHANG Lidan Data Preparation"

Latest revision as of 15:25, 16 October 2017

Epidemic Spread in Smartpolis

Data Preparation

Import the microblog data set into JMP 13
Exclude the 48 rows with missing text
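
Outside JMP, the missing-text filter can be sketched in a few lines of Python. This is only an illustration: the column name `text` and the helper `drop_missing_text` are assumptions, not names from the original data set or from JMP.

```python
def drop_missing_text(rows, text_key="text"):
    """Keep only rows whose text field is present and not blank.

    `rows` is a list of dicts; `text_key` is a hypothetical column name.
    """
    return [r for r in rows if (r.get(text_key) or "").strip()]


# Tiny made-up sample: one usable message, one empty, one missing.
sample = [
    {"ID": 1, "text": "feeling fine today"},
    {"ID": 2, "text": ""},
    {"ID": 3, "text": None},
]
kept = drop_missing_text(sample)
```

Applied to the full microblog table, a filter like this would play the role of the row exclusion done in JMP.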

Separate the location into longitude and latitude with the Word function

Because these locations are in the western hemisphere, convert the longitude coordinates to negative values with the Num function
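
The two location steps above can be sketched in Python as follows. The sketch assumes the location field is a single string of two space-separated, unsigned numbers with latitude first; that ordering, and the helper name `split_location`, are assumptions rather than details given here.

```python
def split_location(location, west=True):
    """Split a 'lat lon' string into numeric (latitude, longitude).

    Assumes two whitespace-separated unsigned numbers, latitude first.
    When `west` is True, the longitude is forced to be negative.
    """
    parts = location.split()
    if len(parts) != 2:
        raise ValueError("expected two coordinates, got: %r" % location)
    latitude = float(parts[0])   # comparable to Num(Word(1, location)) in JMP
    longitude = float(parts[1])  # comparable to Num(Word(2, location)) in JMP
    if west:
        longitude = -abs(longitude)
    return latitude, longitude


# Made-up coordinate pair for illustration.
lat, lon = split_location("42.22717 93.33772")
```

Using `-abs(...)` rather than a plain negation keeps the result negative even if a value has already been converted once.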

In addition, to exclude irrelevant information, I create a subset of the data covering the main flu-like symptoms, such as chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, diarrhea. Here, I use the Text Explorer in JMP to generate these new columns.
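
A rough Python counterpart to the keyword columns is sketched below. It is not what Text Explorer does internally: prefix matching on lower-cased tokens is used here as a simple stand-in for stemming, and the function name `symptom_flags` is hypothetical.

```python
import re

# Symptom keywords listed in the text above.
SYMPTOMS = [
    "chill", "flu", "fever", "sweat", "pain", "fatigue",
    "ache", "cough", "breath", "nausea", "vomit", "diarrhea",
]


def symptom_flags(text):
    """Return a 0/1 flag per keyword for one message.

    A keyword counts as present when any token starts with it, so
    'coughing' matches 'cough'. This is an approximation of stemming.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return {s: int(any(t.startswith(s) for t in tokens)) for s in SYMPTOMS}


# Made-up message for illustration.
flags = symptom_flags("Coughing all night, fever and chills")
has_symptom = any(flags.values())
```

Prefix matching is deliberately crude: it also accepts unrelated words such as "paint" for "pain" and misses compounds such as "headache" for "ache", so a real stemmer or a curated term list would be more reliable.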

@@ Line 1: / Line 1: @@
 <div style=background:#2B3820 border:#A3BFB1>
 [[Image:title_momo.png|150px]]
-<font size = 5; color="#FFFFFF">To be a Visual Detective</font>
+<font size = 5; color="#FFFFFF">Epidemic Spread in Smartpolis</font>
 </div>
 <!--MAIN HEADER -->
@@ Line 24: / Line 24: @@
 <br/>
 ==Data Preparation==
-To better deal with the data, I import the microblog data set into the JMP at first. This dataset contains a lot of useful information. For example, I can use the location axis and the timestamp to identify where these rows are located. Then, through tokenizing and stemming the words in each message, I can filter the high frequency words and flulike-related keywords for further data exploration.
+* Import the microblog data set into JMP 13<br/>
-The microblogs dataset contains 1,023,077 rows.
+* Exclude the 48 rows with missing text<br/>
-Firstly, I need to separate the location into longitude and latitude. Then, because these locations are at the western, hemisphere, I should reverse the longitude coordinates into negative value.
+[[File:Missingdata.PNG|400px]]
-Next, to exclude the irrelevant information, I create the subset dataset which consists of main flulike symptoms, such as chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, diarrhea. Here, I use the Text Explorer in JMP to generate these new columns.
+* Separate the location into longitude and latitude with the Word function<br/>
-[[File:1.png|600px|center]]
+[[File:Location.PNG|300px]]
-Next, I create the bar chart to display the frequency of microblogs including the symptom words. From this table, it can be noticeable that there is a sharply increase in the frequency from May 18 to May 20, 2011.
+* Because these locations are in the western hemisphere, convert the longitude coordinates to negative values with the Num function<br/>
-[[File:2.png|1000px|center]]
+[[File:locationr.PNG|100px]]
-Aiming to explore what happens from May 18 to May 20, I decide to reload the microblog dataset into JMP. Through observing the words in the text, I find the words are not only related to flulike symptoms, but also related to stomach problems. Then, I generate one dataset contains flulike symptoms like breath, cough, fatigue, fever, flu, and pneumonia, another dataset contains stomach ache symptoms like diarrhea, nausea, stomach and vomit.
+* In addition, to exclude irrelevant information, I create a subset of the data covering the main flu-like symptoms, such as '''''chill, flu, fever, sweat, pain, fatigue, ache, cough, breath, nausea, vomit, diarrhea'''''. Here, I use the Text Explorer in JMP to generate these new columns.
+[[File:wordget.PNG|500px]]<br/><br/>
+[[File:Symptoms.PNG|600px]]
