ISSS608 2017-18 T1 Assign Xiao Zhenyu Preparation

From Visual Analytics and Applications


   Mini Challenge 1: Smartpolis Epidemic Spread



Data Source

The following given datasets / data tables are used for the assignment work:
MICROBLOG MESSAGES – It contains microblog messages posted by people over a period of 21 days starting from 30 Apr 2011.
METROPOLITAN SATELLITE MAP – It contains map information for the entire metropolitan area with labelled highways, hospitals, important landmarks, and water bodies.
POPULATION STATISTICS – It contains population statistics for 13 city zones, including the daytime population.
OBSERVED WEATHER – It contains the observation date together with the weather conditions, wind speed, and wind direction.


Data Extraction and Conversion

1. New Epidemic Data Table (from MICROBLOG MESSAGES)

Max 1.png
• Location data is converted and split into Latitude and Longitude columns. Longitude is converted to the correct format.
• A new Date column is created by converting the created_at column.
• A new Symptom column is created by using JMP text exploration to extract keyword indicators from the text column. The extracted symptoms are Flu, Chills, Cough, Fatigue, Fever, Headache, Sweat, Diarrhea, Nausea, Vomit, Pneumonia, Breath Difficulty, Runny Nose, Sore Throat, and Feeling Better.
• A Period column is derived from the created_at timestamp. (7am - 7pm is assumed to be the daytime period.)
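
The conversions above were done in JMP. As a rough illustration only, the same steps could be sketched in Python. The input formats are assumptions and are not stated on this page: the location is taken to be a "latitude longitude" string separated by a space, and created_at a "month/day/year hour:minute" string. The function names are hypothetical. The longitude correction is not specified here, so it is left out, and the daytime period is assumed to include 7am and exclude 7pm.

```python
from datetime import datetime


def split_location(location):
    """Split an assumed 'lat lon' string into two floats."""
    lat_text, lon_text = location.split()
    return float(lat_text), float(lon_text)


def day_period(timestamp):
    """Label a timestamp as Day (assumed 07:00 to 18:59) or Night."""
    return "Day" if 7 <= timestamp.hour < 19 else "Night"


def prepare_row(location, created_at, fmt="%m/%d/%Y %H:%M"):
    """Build the derived columns for one message record."""
    lat, lon = split_location(location)
    ts = datetime.strptime(created_at, fmt)  # fmt is an assumed layout
    return {
        "Latitude": lat,
        "Longitude": lon,
        "Date": ts.date(),
        "Period": day_period(ts),
    }
```

Calling prepare_row once per message record yields the Latitude, Longitude, Date, and Period values described in the bullets.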


2. New Region Infection Table (from MICROBLOG MESSAGES and Epidemic table)

Max 2.png
• A table is created that holds the number of affected people by date, zone, and symptom.
• With the symptoms extracted from the messages, the number of affected people can be summarized and categorized by symptom.
• Based on the geocode of each message record, the affected people can be further classified by zone through geocode comparison.
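
A minimal sketch of the zone classification and counting described above, written in Python rather than JMP. It assumes each zone can be approximated by a rectangular bounding box, which is a simplification; the zone names, coordinates, and record field names below are hypothetical placeholders and not the real city zones.

```python
from collections import Counter

# Hypothetical zone bounding boxes: (min_lat, max_lat, min_lon, max_lon).
ZONES = {
    "Zone A": (0.0, 1.0, 0.0, 1.0),
    "Zone B": (0.0, 1.0, 1.0, 2.0),
}


def find_zone(lat, lon, zones=ZONES):
    """Return the first zone whose box contains the point, else None."""
    for name, (min_lat, max_lat, min_lon, max_lon) in zones.items():
        if min_lat <= lat < max_lat and min_lon <= lon < max_lon:
            return name
    return None


def count_affected(records, zones=ZONES):
    """Count records per (date, zone, symptom); skip points outside all zones."""
    counts = Counter()
    for rec in records:
        zone = find_zone(rec["lat"], rec["lon"], zones)
        if zone is not None:
            counts[(rec["date"], zone, rec["symptom"])] += 1
    return counts
```

Each key of the resulting counter corresponds to one row of the region infection table, and its value to the number of affected people.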


3. Text Exploration (from MICROBLOG MESSAGES table)

Max 26.PNG

• Text exploration is performed on the microblog messages to extract the most frequently used words, which shows what people mostly talked or thought about during the epidemic outbreak.
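
The word-frequency idea behind this step can be sketched in a few lines of Python. This is not the JMP procedure used for the assignment; the stop-word list is a small hypothetical sample, and the tokenizer simply keeps lowercase letters and apostrophes.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "i", "is", "in", "my", "this"}


def top_words(messages, n=10):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    counts = Counter()
    for message in messages:
        words = re.findall(r"[a-z']+", message.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)
```

Passing the text column of the microblog table to top_words gives a ranked list comparable to a term-frequency summary.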



Data Cleaning

1. Remove unnecessary and meaningless information that causes data inaccuracy. During text exploration of the posted messages, some keyword matches pick up ambiguous words that need to be removed. For example, a message containing the word fluffy, fluid, or flush does not indicate flu. Similarly, sweat suit and sweat shop do not indicate sweating.

2. Some messages include terms like “Chicken Flu”, which refers to an episode of a TV series rather than the actual disease. These also need to be removed.

3. Some messages contain phrases such as want to, hope, or wish someone feels better, which do not mean the person is actually getting well at that moment. These also need to be cleaned.
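
The three cleaning rules above can be illustrated with word-boundary regular expressions. The following Python sketch is only one possible way to express them, not the JMP steps actually used; the patterns and the function name detect_symptoms are hypothetical and cover just the Flu, Sweat, and Feeling Better examples mentioned here.

```python
import re

# \b word boundaries keep "fluffy", "fluid" and "flush" from matching "flu".
FLU = re.compile(r"\bflu\b", re.IGNORECASE)
# The TV episode title is blanked out before searching for "flu".
CHICKEN_FLU = re.compile(r"\bchicken\s+flu\b", re.IGNORECASE)
# "sweat suit" and "sweat shop" are excluded by the negative lookahead.
SWEAT = re.compile(r"\bsweat(s|ing|y)?\b(?!\s*(suit|shop))", re.IGNORECASE)
BETTER = re.compile(r"\bfeel(s|ing)?\s+better\b", re.IGNORECASE)
# Wishes for someone else are not treated as recovery.
WISH = re.compile(r"\b(want to|hope|wish)\b", re.IGNORECASE)


def detect_symptoms(text):
    """Return the set of symptom labels found in one message."""
    found = set()
    if FLU.search(CHICKEN_FLU.sub(" ", text)):
        found.add("Flu")
    if SWEAT.search(text):
        found.add("Sweat")
    if BETTER.search(text) and not WISH.search(text):
        found.add("Feeling Better")
    return found
```

A rule set like this is easy to extend, but each new pattern should be checked against sample messages, since simple keyword rules can still misclassify unusual phrasing.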