Difference between revisions of "ISSS608 2017-18 T3 Assign Li Hongxin Methodology"
Revision as of 20:28, 6 July 2018
Tools
a. R: used for data cleaning.
Packages: tidyverse
b. Tableau: used for Map & Pattern visualization.
c. Python: used for density visualization, audio visualization and audio classification.
Packages: os, glob, pandas, numpy, matplotlib, seaborn, librosa, sklearn
Process for Data Preparation
The key steps for data cleaning and data manipulation, which prepare the data for visualization and analysis, are as follows.
Step 1: Deal with Missing Values. Replace all placeholders that stand for missing values, such as "?", "??:??" in Time, and "No score" in Quality, with NA.
Step 2: Fix Data Quality Issues. Convert all letters to uppercase for convenience, and remove extra spaces and stray "?" characters.
Step 3: Unify the Date & Time Format. Convert all Date values to the "%Y-%m-%d" format and all Time values to the "HH:mm" format.
Step 4: Modify Data Types. Convert the X and Y coordinates from character to int.
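
The four steps above can be sketched in code. The cleaning for this assignment was done in R with tidyverse; the block below is only a minimal illustration of the same logic in plain Python (standard library only), not the code actually used. The field names Date, Time, Quality, X and Y follow the text, but the helper names, the list of candidate input formats, and the reading of "HH:mm" as 24-hour "%H:%M" are assumptions made for the example.

```python
from datetime import datetime

# Markers treated as missing (Step 1). The text names "?", "??:??" and
# "No score"; they are compared after uppercasing, hence "NO SCORE".
MISSING_MARKERS = {"?", "??:??", "NO SCORE"}

# Hypothetical candidate input formats: the source does not list the raw formats.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")
TIME_FORMATS = ("%H:%M", "%H:%M:%S")


def clean_text(value):
    """Steps 1-2: uppercase, collapse spaces, drop stray "?", map missing to None."""
    if value is None:
        return None
    text = " ".join(str(value).upper().split())
    if text in MISSING_MARKERS:
        return None
    text = text.replace("?", "").strip()
    return text or None


def reformat(value, formats, out_format):
    """Step 3: try each candidate format, then re-emit in one target format."""
    if value is None:
        return None
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).strftime(out_format)
        except ValueError:
            continue
    return None  # unparseable values are treated as missing


def to_int(value):
    """Step 4: convert a cleaned coordinate string to int, else None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def clean_record(record):
    """Apply Steps 1-4 to one row given as a dict of raw strings."""
    text = {key: clean_text(val) for key, val in record.items()}
    return {
        "Date": reformat(text.get("Date"), DATE_FORMATS, "%Y-%m-%d"),
        "Time": reformat(text.get("Time"), TIME_FORMATS, "%H:%M"),
        "Quality": text.get("Quality"),
        "X": to_int(text.get("X")),
        "Y": to_int(text.get("Y")),
    }


# Usage with a made-up row (illustrative values, not taken from the dataset):
raw = {"Date": "3/7/2018", "Time": "??:??", "Quality": "No score",
       "X": " 12 ", "Y": "45?"}
print(clean_record(raw))
```

In this sketch, Python's None plays the role of NA in R. Values that cannot be parsed as a date, a time, or an integer are also mapped to None rather than raising an error, which is a design choice of the example rather than something stated in the text.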
Pattern Visualization and Analysis
b
Audio Visualization and Classification
c