Difference between revisions of "DataPreparation"
Ziwenhe.2016 (talk | contribs) |
Ziwenhe.2016 (talk | contribs) |
||
(18 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
<div style=background:#2B3856 border:#A3BFB1> | <div style=background:#2B3856 border:#A3BFB1> | ||
− | [[Image: | + | [[Image:ZW AffectedAreas.JPG|250px]] |
<font size = 5; color="#FFFFFF">ISSS608 Visual Analytics and Applications Assignment</font> | <font size = 5; color="#FFFFFF">ISSS608 Visual Analytics and Applications Assignment</font> | ||
</div> | </div> | ||
Line 31: | Line 31: | ||
==Transforming data== | ==Transforming data== | ||
− | + | After imported the original dataset into JMP, found that two columns are dirty data which need to be cleaned. Splitting the column "Created_at" into two columns, one is "Date" and the other one is "Time". With the same method split the column "location" into "latitude" and "longitude". The step showed as Figure 12. | |
− | After imported the original dataset into JMP, found that two columns are dirty data which need to be cleaned. Splitting the column "Created_at" into two columns, one is "Date" and the other one is "Time". With the same method split the column "location" into "latitude" and "longitude". | ||
<gallery heights="160" widths="700"> | <gallery heights="160" widths="700"> | ||
− | File:ZW JMP1.JPG|Figure : Dirty Data Columns | + | File:ZW JMP1.JPG|Figure 12: Dirty Data Columns |
</gallery> | </gallery> | ||
Line 40: | Line 39: | ||
By reading requirement of the assignment, there are lots of the key words. With the function of Text Explorer in JMP, also found that the key words list. The top 5 key words in the list are all related to illness. | By reading requirement of the assignment, there are lots of the key words. With the function of Text Explorer in JMP, also found that the key words list. The top 5 key words in the list are all related to illness. | ||
<gallery heights="700" widths="800"> | <gallery heights="700" widths="800"> | ||
− | File:KeyWords.JPG|Figure : Key Words | + | File:KeyWords.JPG|Figure 13: Key Words |
</gallery> | </gallery> | ||
− | Combined all the resources, finally choose 13 key words in the report. | + | Combined all the resources, finally choose 13 key words in the report. Please referred to Figure 13. |
==Excluding & Hiding Data== | ==Excluding & Hiding Data== | ||
− | After the above steps, choose the key words and label all the rows related to these 13 key words. And then invert selection to exclude and hide the rows which do not include all the key words. | + | After the above steps, choose the key words and label all the rows related to these 13 key words. And then invert selection to exclude and hide the rows which do not include all the key words. Please referred to Figure 14. |
<gallery heights="170" widths="1000"> | <gallery heights="170" widths="1000"> | ||
− | File:ZW Exclude&Hide.JPG|Figure : Key Words | + | File:ZW Exclude&Hide.JPG|Figure 14: Key Words |
</gallery> | </gallery> | ||
==Processing Data== | ==Processing Data== | ||
===Tag the Key Words=== | ===Tag the Key Words=== | ||
− | In order to tag all the 13 key words, used the formula to tag all the key words and made one new column for the tags. | + | In order to tag all the 13 key words, used the formula to tag all the key words and made one new column for the tags. Referred to Figure 15. |
<gallery heights="180" widths="550"> | <gallery heights="180" widths="550"> | ||
− | File:ZW tag.JPG|Figure : Tag Key Words | + | File:ZW tag.JPG|Figure 15: Tag Key Words |
</gallery> | </gallery> | ||
− | After tagged all the key words, there should be a column tag with all the key words. | + | After tagged all the key words, there should be a column tag with all the key words. Referred to Figure 16. |
<gallery heights="350" widths="950"> | <gallery heights="350" widths="950"> | ||
− | File:ZW tag2.JPG|Figure : Tag Column | + | File:ZW tag2.JPG|Figure 16: Tag Column |
</gallery> | </gallery> | ||
===Processing the Time=== | ===Processing the Time=== | ||
− | The time is in the HH:MM format. The format is not good to analyse the final results. Transformed the time into two different formats. One is WorkingHour and night. The other is WorkingHour, Evening, EarlyMorning and Midnight. | + | The time is in the HH:MM format. The format is not good to analyse the final results. Transformed the time into two different formats. One is WorkingHour and night. The other is WorkingHour, Evening, EarlyMorning and Midnight. Referred to Figure 17 & Figure 18. |
<gallery heights="180" widths="550"> | <gallery heights="180" widths="550"> | ||
− | File:ZW Time3.JPG|Figure : Day & Night | + | File:ZW Time3.JPG|Figure 17: Day & Night |
− | File:ZW Time4.JPG|Figure : Four Types of Time | + | File:ZW Time4.JPG|Figure 18: Four Types of Time |
</gallery> | </gallery> | ||
==Visualing Data in Tableau== | ==Visualing Data in Tableau== | ||
+ | After all the steps done, the data cleaning was finished. Then exported the all the data which tagged with key words into excel and imported into Tableau. Then in the tableau plot all the key words in the map via Map(Background Images) function. The scatterplots for the key words can display the trend of the data. And the bar chart for the population shows the population distribution during the day and night time. Referred to Figure 19 & 20. | ||
+ | <gallery heights="400" widths="600"> | ||
+ | File:ZW Population.JPG|Figure 19: Population Distribution By Time | ||
+ | File:ZW Map.JPG|Figure 20: Map | ||
+ | </gallery> | ||
+ | |||
+ | =Acknowledge= | ||
+ | |||
+ | Great gratitude to: | ||
+ | |||
+ | 1. Prof. Kam Tin Seong - Providing the most painful assignment in this term. | ||
+ | |||
+ | 2. Visual Analytics and Applications Classmates(discussions during day & night): | ||
+ | * '''Deng Yuetong''' | ||
+ | * '''Fam GuoTeng''' | ||
+ | |||
+ | |||
+ | Please visit their respective webpages: | ||
+ | *https://wiki.smu.edu.sg/1718t1isss608g1/ISSS608_2017-18_T1_Assign_Fam_Guo_Teng | ||
+ | *https://wiki.smu.edu.sg/1718t1isss608g1/ISSS608_2017-18_T1_Assign_DENG_YUETONG | ||
+ | |||
+ | |||
+ | =References= | ||
+ | https://www.jmp.com/en_us/home.html | ||
+ | https://www.tableau.com/ | ||
+ | https://wiki.smu.edu.sg/1718t1isss608g1/Assignments | ||
+ | https://www.hindawi.com/journals/apm/2011/124064/ | ||
+ | |||
+ | |||
+ | =Suggestions&Feedback= | ||
+ | For any feedback or comments, please contact me at: | ||
− | + | ''ziwenhe.2016@mitb.smu.edu.sg'' |
Latest revision as of 20:41, 15 October 2017
Data Preparation
Transforming data
After the original dataset was imported into JMP, two columns were found to contain dirty data that needed cleaning. The "Created_at" column was split into two columns, "Date" and "Time". Using the same method, the "location" column was split into "latitude" and "longitude". This step is shown in Figure 12.
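The same split can be sketched outside JMP. The following is a minimal Python illustration, not the JMP procedure used in this report. It assumes that "Created_at" holds a date and a time separated by a space and that "location" holds two numbers separated by whitespace; the sample values and the function names are hypothetical.

```python
def split_created_at(created_at):
    """Split a 'date time' string into (date, time) at the first space."""
    date_part, time_part = created_at.strip().split(" ", 1)
    return date_part, time_part.strip()


def split_location(location):
    """Split a 'latitude longitude' string into two floats."""
    lat_text, lon_text = location.split()
    return float(lat_text), float(lon_text)


# Hypothetical sample row; the real dataset's exact layout is an assumption.
row = {"Created_at": "5/18/2011 13:26", "location": "42.22717 93.33772"}

date, time = split_created_at(row["Created_at"])
latitude, longitude = split_location(row["location"])
```

If the real columns use a different separator, such as a comma, only the two `split` calls need to change.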
Key Words Selection
The assignment requirements mention many key words. JMP's Text Explorer function also produced a key word list, and the top 5 key words in that list are all related to illness.
Combining these sources, 13 key words were finally chosen for the report. Please refer to Figure 13.
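A term-frequency list similar in spirit to the one from Text Explorer can be approximated with a simple word count. This is only a rough sketch in Python: Text Explorer's own tokenising and stemming are not reproduced here, and the sample messages are invented for illustration.

```python
import re
from collections import Counter


def term_counts(messages):
    """Count how often each lower-cased word appears across all messages."""
    counts = Counter()
    for message in messages:
        counts.update(re.findall(r"[a-z']+", message.lower()))
    return counts


# Hypothetical messages, not taken from the assignment dataset.
sample_messages = [
    "Fever and chills all night",
    "this fever will not break",
    "Stuck in traffic again",
]

top_terms = term_counts(sample_messages).most_common(1)
```

Sorting the counts this way gives a candidate list, which still has to be compared by hand against the key words named in the assignment.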
Excluding & Hiding Data
After the above steps, all rows related to the 13 chosen key words were selected and labelled. The selection was then inverted to exclude and hide the rows that do not contain any of the key words. Please refer to Figure 14.
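The select, invert and hide sequence amounts to keeping only the rows that mention at least one key word. A minimal Python sketch of that logic follows. The three key words are placeholders, since the 13 key words themselves are listed only in Figure 13, and the column name "text" is an assumption.

```python
# Placeholder key words; the report's actual 13 key words are in Figure 13.
KEYWORDS = ["fever", "flu", "cough"]


def mentions_keyword(text, keywords=KEYWORDS):
    """Return True if any key word occurs in the text (case-insensitive)."""
    lowered = text.lower()
    # Plain substring match: "flu" would also match "influence".
    return any(keyword in lowered for keyword in keywords)


# Hypothetical rows with an assumed "text" column.
sample_rows = [
    {"text": "Down with a fever since morning"},
    {"text": "Great concert downtown"},
    {"text": "Can't stop this COUGH"},
]

kept_rows = [row for row in sample_rows if mentions_keyword(row["text"])]
hidden_rows = [row for row in sample_rows if not mentions_keyword(row["text"])]
```

Here `hidden_rows` plays the role of the inverted selection that JMP excludes and hides.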
Processing Data
Tag the Key Words
To tag the 13 key words, a formula was used to create one new column holding the tags. Refer to Figure 15.
After tagging, the table should contain a tag column that holds the key words. Refer to Figure 16.
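The tagging formula itself appears only in Figure 15, so the following Python sketch shows one plausible equivalent rather than the formula actually used. It assumes each row receives the first key word it matches, and the key word list is again a placeholder.

```python
import re

# Placeholder key words; the report's actual 13 key words are in Figure 13.
KEYWORDS = ["fever", "flu", "cough"]


def tag_text(text, keywords=KEYWORDS):
    """Return the first key word found as a whole word, or "" if none match."""
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            return keyword
    return ""


# Hypothetical messages used to build the new tag column.
sample_texts = ["Got the FLU today", "I have a cough and fever", "a big influence"]
tags = [tag_text(text) for text in sample_texts]
```

Matching on whole words avoids tagging "influence" as "flu". When a message contains several key words, the order of the list decides which tag wins, which is a design choice that may differ from the JMP formula.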
Processing the Time
The time is in HH:MM format, which is not well suited to analysing the final results. The time was therefore transformed into two different groupings. One has two categories, WorkingHour and Night. The other has four categories: WorkingHour, Evening, EarlyMorning and Midnight. Refer to Figure 17 and Figure 18.
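Both groupings can be expressed as simple rules on the hour. The sketch below is in Python, and the hour boundaries in it are assumptions made for illustration only; the boundaries actually used are those shown in Figure 17 and Figure 18.

```python
def parse_hour(time_text):
    """Extract the hour from an 'HH:MM' string."""
    return int(time_text.strip().split(":")[0])


def two_way_period(time_text):
    """Two categories; 09:00 to 17:59 as WorkingHour is an assumed boundary."""
    return "WorkingHour" if 9 <= parse_hour(time_text) < 18 else "Night"


def four_way_period(time_text):
    """Four categories; all boundaries here are assumed, not from the report."""
    hour = parse_hour(time_text)
    if 6 <= hour < 9:
        return "EarlyMorning"
    if 9 <= hour < 18:
        return "WorkingHour"
    if 18 <= hour < 23:
        return "Evening"
    return "Midnight"  # 23:00 to 05:59 under these assumed boundaries


periods = [(two_way_period(t), four_way_period(t)) for t in ["13:26", "07:05", "23:40"]]
```

Keeping the boundaries in one place makes it easy to adjust them to match the figures.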
Visualising Data in Tableau
Once all the steps above were done, the data cleaning was finished. All the data tagged with key words was then exported to Excel and imported into Tableau. In Tableau, all the key words were plotted on the map via the Map (Background Images) function. The scatterplots for the key words display the trend of the data, and the bar chart shows the population distribution during the day and night. Refer to Figure 19 and Figure 20.
Acknowledgements
With great gratitude to:
1. Prof. Kam Tin Seong, for providing the most painful assignment of this term.
2. Visual Analytics and Applications classmates (discussions during day and night):
- Deng Yuetong
- Fam GuoTeng
Please visit their respective webpages:
- https://wiki.smu.edu.sg/1718t1isss608g1/ISSS608_2017-18_T1_Assign_Fam_Guo_Teng
- https://wiki.smu.edu.sg/1718t1isss608g1/ISSS608_2017-18_T1_Assign_DENG_YUETONG
References
https://www.jmp.com/en_us/home.html
https://www.tableau.com/
https://wiki.smu.edu.sg/1718t1isss608g1/Assignments
https://www.hindawi.com/journals/apm/2011/124064/
Suggestions & Feedback
For any feedback or comments, please contact me at:
ziwenhe.2016@mitb.smu.edu.sg