ISSS608 2017-18 T3 Assign Li Hongxin Methodology
Revision as of 20:42, 6 July 2018 by Hongxin.li.2017
Tools
a. R: used for data cleaning.
Packages: tidyverse
b. Tableau: used for map and pattern visualization.
c. Python: used for density visualization, audio visualization and audio classification.
Packages: os, glob, pandas, numpy, matplotlib, seaborn, librosa, sklearn
Process for Data Preparation
The following are the key steps for data cleaning and data manipulation that prepare the data for visualization and analysis.
Step 1: Deal with Missing Values. Replace every placeholder that stands for a missing value, such as "?" and "??:??" in Time
and "No score" in Quality, with NA.
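This step was carried out in R with tidyverse. The code below is only a minimal Python sketch of the same rule. It uses the standard library and assumes each record is a dict keyed by column name. The function name `replace_missing` is hypothetical, `None` plays the role of R's NA, and a value counts as missing only when the whole (trimmed) field equals a placeholder.

```python
# Placeholder strings that stand for missing values, per column (from Step 1).
# This mapping is an assumption of the sketch; extend it if other placeholders appear.
MISSING_TOKENS = {
    "Time": {"?", "??:??"},
    "Quality": {"No score"},
}

def replace_missing(records):
    """Return copies of the records with placeholder strings replaced by None (NA)."""
    cleaned = []
    for row in records:
        new_row = dict(row)  # copy so the input records are left untouched
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            # compare the trimmed field against this column's placeholders only
            if value.strip() in MISSING_TOKENS.get(column, set()):
                new_row[column] = None
        cleaned.append(new_row)
    return cleaned
```

For example, `replace_missing([{"Time": "??:??", "Quality": "No score"}])` yields a record whose Time and Quality are both `None`.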
Step 2: Fix Data Quality Issues. Convert all text to uppercase for consistency, and remove extra spaces and stray "?" characters.
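Step 2 can be sketched in Python for a single text field as shown below. This is an illustration only, not the R code actually used. The helper name `normalize_text` is hypothetical, and the sketch assumes it runs after Step 1. Otherwise stripping "?" would turn a placeholder such as "??:??" into ":" before it could be recognised as missing.

```python
import re

def normalize_text(value):
    """Uppercase a string, drop stray '?' characters and collapse extra whitespace."""
    if value is None:  # values already marked missing in Step 1 stay missing
        return None
    value = value.upper().replace("?", "")
    # collapse runs of whitespace into a single space, then trim both ends
    return re.sub(r"\s+", " ", value).strip()
```

Applying it to every text column of a record is then a simple loop over the columns.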
Step 3: Unify the Date & Time Format. Convert all dates to the "%Y-%m-%d" format. If a raw date lacks the month or the day,
impute the month as "-01-" (January) and the day as "-01" (the first day). Convert all times to the "HH:mm" format on the
24-hour clock. If a raw time lacks minutes, set the minutes to "00". If a raw time is given as a word such as "morning" or
"dawning", impute it as "08:00" or "18:00" respectively.
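The date and time rules of Step 3 can be sketched in Python as follows. This is a rough sketch, not the team's R implementation. The raw input shapes are assumptions, since the source does not list them:

- Dates are taken to be year-first, such as "2017", "2017-3" or "2017/3/5".
- Times are taken to look like "7", "7:30", "7:30 PM" or "14:05".

The AM/PM handling in particular is a guess at what "standardize to 24-hour formatting" had to cover. The names `standardize_date`, `standardize_time` and `WORD_TIMES` are hypothetical.

```python
import re
from datetime import date

def standardize_date(raw):
    """Return a '%Y-%m-%d' string, imputing month 01 and day 01 when they are missing.

    Assumes year-first input such as '2017', '2017-3' or '2017/3/5'; returns None
    for anything else, including impossible dates like '2017-2-30'.
    """
    if raw is None:
        return None
    parts = [p for p in re.split(r"[-/.]", raw.strip()) if p]
    if not parts or len(parts) > 3 or not all(p.isdigit() for p in parts):
        return None
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else 1  # missing month -> January
    day = int(parts[2]) if len(parts) > 2 else 1    # missing day -> first day
    try:
        return date(year, month, day).strftime("%Y-%m-%d")
    except ValueError:
        return None

# Word-valued times and the clock times Step 3 imputes for them.
WORD_TIMES = {"MORNING": "08:00", "DAWNING": "18:00"}

def standardize_time(raw):
    """Return an 'HH:MM' string on the 24-hour clock, or None if unparseable.

    Assumes inputs like '7', '7:30', '7:30 PM' or '14:05' (AM/PM is an assumption).
    """
    if raw is None:
        return None
    text = raw.strip().upper()
    if text in WORD_TIMES:
        return WORD_TIMES[text]
    match = re.fullmatch(r"(\d{1,2})(?::(\d{2}))?\s*(AM|PM)?", text)
    if match is None:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2)) if match.group(2) else 0  # missing minutes -> "00"
    if match.group(3) == "PM" and hour < 12:
        hour += 12  # 1 PM..11 PM -> 13..23
    elif match.group(3) == "AM" and hour == 12:
        hour = 0    # 12 AM is midnight
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}:{minute:02d}"
```

Because `standardize_time` uppercases its own input, it gives the same result whether it runs before or after the uppercase conversion in Step 2.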
Step 4: Modify Data Types. Convert the X and Y coordinates from character to integer.
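A minimal Python sketch of Step 4 follows. It assumes records are dicts with string-valued "X" and "Y" fields, and the helper name `coordinates_to_int` is hypothetical. In this sketch, a value that is missing or not a whole number becomes `None` instead of raising an error. This is a design choice of the sketch and may differ from how R's type conversion behaved in the original cleaning.

```python
def coordinates_to_int(records, columns=("X", "Y")):
    """Return copies of the records with the coordinate columns converted to int."""
    converted = []
    for row in records:
        new_row = dict(row)  # leave the input records untouched
        for column in columns:
            if column not in row:
                continue
            value = row[column]
            if value is None:  # already missing: keep as None
                continue
            try:
                new_row[column] = int(str(value).strip())
            except ValueError:
                new_row[column] = None  # not a whole number, treat as missing
        converted.append(new_row)
    return converted
```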
Pattern Visualization and Analysis
Audio Visualization and Classification