Data Preparation

From Visual Analytics and Applications
Revision as of 19:10, 14 October 2017 by Hontak.yau.2017

ISS608_2017-18_T1_Assign_Yau Hon Tak

Background

Data Preparation

Discovery

Summary

 


Data Preparation

Data Preparation – Microblogs

There are 1 million (1m) messages in total. The following image summarises the number of messages per day.

MicroblogSummary.PNG
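The per-day summary is a group-by-and-count. As a minimal sketch outside JMP, assuming each message carries a timestamp string in the hypothetical format below, it could be reproduced in Python (the sample rows and the helper name `messages_per_day` are illustrative only):

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows (timestamp, text). The real table holds about
# 1 million messages, and its timestamp format may differ from the one assumed here.
messages = [
    ("2011-05-18 08:15", "Heading to work"),
    ("2011-05-18 12:40", "Feeling feverish today"),
    ("2011-05-19 09:05", "Bad cough all night"),
]

def messages_per_day(records, fmt="%Y-%m-%d %H:%M"):
    """Count messages per calendar day, returned in date order."""
    counts = Counter(datetime.strptime(ts, fmt).date() for ts, _ in records)
    return dict(sorted(counts.items()))

for day, n in messages_per_day(messages).items():
    print(day, n)
```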

To help identify messages relevant to our research, we used JMP Pro Text Explorer. The initial text analysis parses each message into individual words. We changed the default Text Explorer settings by increasing “Minimum Characters per Word” to 2, setting “Maximum Words per Phrase” to 8, and selecting “Stem all terms”. The settings are shown below:

File:JMPTextFilterSetup.png
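The effect of the first two settings can be approximated in Python. This is a rough sketch, not JMP's actual algorithm: the tokenizer regex and the helper names `tokenize` and `candidate_phrases` are assumptions, JMP only reports phrases that recur across the collection, and stemming is not reproduced here.

```python
import re

# Stand-ins for the Text Explorer options described above.
MIN_CHARS_PER_WORD = 2     # "Minimum Characters per Word"
MAX_WORDS_PER_PHRASE = 8   # "Maximum Words per Phrase"

def tokenize(text, min_chars=MIN_CHARS_PER_WORD):
    """Lower-case the text, split it into words, and drop words shorter than min_chars."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if len(w) >= min_chars]

def candidate_phrases(words, max_words=MAX_WORDS_PER_PHRASE):
    """Every contiguous run of 2..max_words words (n-grams), shortest first."""
    phrases = []
    for n in range(2, max_words + 1):
        for i in range(len(words) - n + 1):
            phrases.append(" ".join(words[i:i + n]))
    return phrases

words = tokenize("I have a fever & chills, so I'm staying home")
print(words)                         # single letters "i" and "a" are dropped
print(candidate_phrases(words)[:3])  # first few two-word phrases
```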

Keywords describing the symptoms of the illness were provided as clues: “Observed symptoms are largely flu-like and include fever, chills, sweats, aches and pains, fatigue, coughing, breathing difficulty, nausea and vomiting, diarrhea, and enlarged lymph nodes.” We searched for these keywords and tagged the matching messages with a binary (0/1) column in the main data table. The words and phrases used in the search are:

File:SymtomList.png
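The search-and-tag step can be sketched as one case-insensitive, whole-word regular expression that yields a 0/1 flag per message. The term list below is taken from the symptom description and may differ from the list actually used in JMP; `tag_symptoms` is a hypothetical helper.

```python
import re

# Terms taken from the provided symptom description (illustrative; the
# search list actually used may differ).
SYMPTOM_TERMS = [
    "flu", "fever", "chills", "sweats", "aches", "pains", "fatigue",
    "coughing", "breathing difficulty", "nausea", "vomiting", "diarrhea",
    "enlarged lymph nodes",
]

# \b on both sides means a term only matches as a whole word or phrase.
SYMPTOM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in SYMPTOM_TERMS) + r")\b",
    re.IGNORECASE,
)

def tag_symptoms(texts):
    """Return a 0/1 flag per message: 1 if any symptom term appears."""
    return [1 if SYMPTOM_RE.search(t) else 0 for t in texts]

sample = ["Got a fever and chills", "Fluent in French", "So much coughing today", "Nice weather"]
print(tag_symptoms(sample))
```

Whole-word matching keeps a short term such as “flu” from matching inside unrelated words like “fluent”.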

A separate data table was created to review the results. Text Explorer was run again, this time without stemming, and the phrases were reviewed and the keyword search repeated. The search was repeated because stemming reduces words to a common root, so the first pass may have included unwanted words. Unusual phrases, such as “chicken flu”, were reviewed and excluded. The final result came down to 52k messages.
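The second, unstemmed pass with phrase exclusion might look like the sketch below. The search terms and the exclusion list are illustrative, and removing excluded phrases before searching (so that a message mentioning both “chicken flu” and “fever” is still kept) is an assumed design choice, not necessarily how the review was done.

```python
import re

# Illustrative subset of search terms, matched as whole words without stemming.
SEARCH_TERMS = ["flu", "fever", "chills", "coughing"]
# Phrases reviewed and judged unrelated to the illness (example from the text).
EXCLUDED_PHRASES = ["chicken flu"]

def whole_word_pattern(phrases):
    """Case-insensitive regex matching any of the phrases as whole words."""
    return re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b", re.IGNORECASE
    )

SEARCH_RE = whole_word_pattern(SEARCH_TERMS)
EXCLUDE_RE = whole_word_pattern(EXCLUDED_PHRASES)

def is_relevant(text):
    """Blank out excluded phrases, then keep the message if a search term remains."""
    cleaned = EXCLUDE_RE.sub(" ", text)
    return bool(SEARCH_RE.search(cleaned))
```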

Data Preparation – Smartpolis map

The map spans 13.9 km in latitude (height) and 27.4 km in longitude (width). It can be split into a grid of cells measuring 0.99 km (height) × 1.01 km (width), so each cell covers about 1 km². Since 13.9 / 0.99 ≈ 14 and 27.4 / 1.01 ≈ 27, the grid has 14 rows × 27 columns, i.e. 378 cells in total. Each cell is then mapped to one of the 13 areas of Smartpolis. The grid was prepared by building the polygons manually. The results are as follows:

Original map

File:SmartpolisMapOriginal.png

Map after the grid cells have been mapped to individual areas. The grid cells are colour-coded for easier visualisation.

File:SmartpolisMapGrid.png
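The grid arithmetic and the assignment of a point to a cell can be checked with a short sketch. It assumes points are already expressed in km north and east of the map's south-west corner; the row-major, 1-based cell numbering in the hypothetical `grid_cell` helper is an assumption, since the actual Polygon IDs were assigned when the polygons were built by hand.

```python
# Map extent and cell size as stated in the text (km).
MAP_HEIGHT_KM = 13.9   # latitude / north-south extent
MAP_WIDTH_KM = 27.4    # longitude / east-west extent
CELL_HEIGHT_KM = 0.99
CELL_WIDTH_KM = 1.01

# Number of rows and columns implied by the stated cell size.
N_ROWS = round(MAP_HEIGHT_KM / CELL_HEIGHT_KM)  # 14.04 -> 14
N_COLS = round(MAP_WIDTH_KM / CELL_WIDTH_KM)    # 27.13 -> 27

def grid_cell(y_km, x_km):
    """Hypothetical 1-based, row-major cell ID for a point given in km
    north (y_km) and east (x_km) of the map's south-west corner."""
    if not (0 <= y_km <= MAP_HEIGHT_KM and 0 <= x_km <= MAP_WIDTH_KM):
        raise ValueError("point lies outside the map")
    # Divide the full extent evenly; clamp so points on the far edge stay in the last cell.
    row = min(int(y_km / (MAP_HEIGHT_KM / N_ROWS)), N_ROWS - 1)
    col = min(int(x_km / (MAP_WIDTH_KM / N_COLS)), N_COLS - 1)
    return row * N_COLS + col + 1

print(N_ROWS, N_COLS, N_ROWS * N_COLS)
print(round(CELL_HEIGHT_KM * CELL_WIDTH_KM, 4))  # area of one cell in km²
```

Because 14 × 0.99 km and 27 × 1.01 km fall slightly short of the map extent, the sketch divides the full extent evenly by the row and column counts rather than indexing with the rounded cell sizes.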

With the map now divided into grid cells, each record in the data prepared above is enriched with the Polygon ID, Area, and latitude and longitude points of its grid cell.
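Attaching the Polygon ID and Area to each record amounts to a point-in-polygon lookup. A minimal ray-casting sketch follows; the polygon table, its coordinates and the area names are placeholders, not the real Smartpolis polygons, and `point_in_polygon` and `locate` are hypothetical helpers.

```python
# Hypothetical polygon table: Polygon ID -> (Area name, vertices as (x, y)).
POLYGONS = {
    1: ("Area A", [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]),
    2: ("Area B", [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)]),
}

def point_in_polygon(x, y, vertices):
    """Ray casting: toggle 'inside' each time a ray cast to the right of (x, y)
    crosses a polygon edge; an odd number of crossings means the point is inside."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def locate(x, y):
    """Return (Polygon ID, Area) of the first polygon containing the point, else None."""
    for pid, (area, verts) in POLYGONS.items():
        if point_in_polygon(x, y, verts):
            return pid, area
    return None
```

Checking polygons in order and returning the first hit means a point lying exactly on a shared edge is assigned to a single cell rather than counted twice.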