Data Preparation
Transforming data
After the original dataset was imported into JMP, two columns were found to contain dirty data that needed cleaning. The column "Created_at" was split into two columns, "Date" and "Time". Using the same method, the column "location" was split into "latitude" and "longitude".
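The split was done in JMP, but the same idea can be sketched in Python for readers who want to reproduce it elsewhere. This is only a minimal sketch: it assumes that "Created_at" holds a date and a time separated by a space, and that "location" holds latitude and longitude separated by whitespace. Neither format is confirmed by the report, and the sample row is made up for illustration.

```python
# Hypothetical sketch: the layouts of "Created_at" and "location" are
# assumptions, and the sample row below is invented for illustration.

def split_created_at(created_at):
    """Split an assumed 'date time' string on the first space."""
    date, time = created_at.strip().split(" ", 1)
    return date, time.strip()


def split_location(location):
    """Split an assumed 'latitude longitude' string into two floats."""
    latitude, longitude = location.split()
    return float(latitude), float(longitude)


row = {"Created_at": "4/30/2011 0:00", "location": "42.5 93.25"}
date, time = split_created_at(row["Created_at"])
latitude, longitude = split_location(row["location"])
print(date, time, latitude, longitude)
```

If the real columns use a different separator (for example a comma), only the two split calls would need to change.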
Key Words Selection
The assignment requirements mention many key words. The Text Explorer function in JMP also produced a list of key words, and the top 5 key words in that list are all related to illness.
Combining these sources, 13 key words were finally chosen for the report.
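A term list like the one Text Explorer produces can be roughly approximated by counting word frequencies. The sketch below is not JMP's algorithm (Text Explorer also handles stop words, stemming and phrases); the function name top_terms and the sample messages are hypothetical.

```python
from collections import Counter
import re


def top_terms(messages, n=5):
    """Return the n most frequent lowercase words across all messages."""
    counts = Counter()
    for message in messages:
        # Keep runs of letters and apostrophes as words; drop everything else.
        counts.update(re.findall(r"[a-z']+", message.lower()))
    return counts.most_common(n)


# Made-up messages, only to show the call.
sample = ["Flu again, bad cough", "This flu is awful", "cough and flu"]
print(top_terms(sample, n=2))
```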
Excluding & Hiding Data
After the above steps, all rows related to the 13 chosen key words were selected and labelled. The selection was then inverted to exclude and hide the rows that do not include any of the key words.
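The select, invert and exclude steps above amount to a simple filter. The following sketch illustrates it under stated assumptions: the three key words are placeholders (the 13 words used in the report are not listed here), the messages are made up, and matching is a plain case-insensitive substring test.

```python
# Placeholder key words; the report's actual 13 key words are not listed.
KEY_WORDS = ["fever", "flu", "cough"]


def mentions_key_word(message, key_words=KEY_WORDS):
    """True if the message contains at least one key word.

    Uses a case-insensitive substring test, so "flu" would also match
    "influence"; a word-boundary match would be stricter.
    """
    text = message.lower()
    return any(word in text for word in key_words)


# Made-up messages, only to show the filter.
messages = ["I have a FEVER", "nice weather today", "bad cough all night"]
kept = [m for m in messages if mentions_key_word(m)]
excluded = [m for m in messages if not mentions_key_word(m)]
print(kept)
print(excluded)
```

Here kept corresponds to the labelled rows and excluded to the rows hidden after inverting the selection.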
Processing Data
Tag the Key Words
Processing the Time
The time is in HH:MM format, which is not convenient for analysing the final results. The time was therefore transformed into two categorical formats. The first has two categories: WorkingHour and night. The second has four categories: WorkingHour, Evening, EarlyMorning and Midnight.
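The two recodings can be sketched as functions that map an HH:MM string to a category. The report does not state the hour boundaries, so the cut-offs below (09:00 to 17:00 for WorkingHour, and so on) are assumptions chosen only for illustration, as are the function names.

```python
def parse_hour(time_text):
    """Return the hour part of an 'HH:MM' string as an integer."""
    return int(time_text.split(":")[0])


def two_level(time_text):
    """Two categories; the 09:00-17:00 working window is an assumption."""
    return "WorkingHour" if 9 <= parse_hour(time_text) < 17 else "night"


def four_level(time_text):
    """Four categories; all boundaries here are assumed, not from the report."""
    hour = parse_hour(time_text)
    if hour < 5:
        return "Midnight"      # assumed 00:00-04:59
    if hour < 9:
        return "EarlyMorning"  # assumed 05:00-08:59
    if hour < 17:
        return "WorkingHour"   # assumed 09:00-16:59
    return "Evening"           # assumed 17:00-23:59


for t in ["03:00", "07:45", "10:30", "18:00"]:
    print(t, two_level(t), four_level(t))
```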