Difference between revisions of "ISSS608 2017-18 T3 Assign Li Hongxin Methodology"
i. Feature Extraction
Five types of features were selected; combined, they give a total of 193 features for each bird song:
* Mel-frequency cepstral coefficients (MFCC)
* Chromagram of a short-time Fourier transform (STFT)
* Mel-scaled power spectrogram
* Octave-based spectral contrast
* Tonnetz
ii. Data Partition and Feature Labels
* Of the 2081 bird songs, 70% were used as training data and 30% as test data.
* Each song's features were labeled with the name of its bird species.
iii. Classification Methods
* Logistic Regression
* SVM
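The total of 193 matches one common librosa recipe. This recipe is an assumption and is not stated in the source: 40 MFCCs (`n_mfcc=40`), 12 chroma bins, 128 mel bands, 7 spectral-contrast rows (6 bands plus one) and 6 tonnetz dimensions, each averaged over time. The sketch below covers only the pooling and concatenation step. Random arrays stand in for the librosa outputs, so it runs without audio files.

```python
import numpy as np

# Assumed row counts per feature (librosa defaults, with n_mfcc=40):
# mfcc -> 40, chroma_stft -> 12, melspectrogram -> 128,
# spectral_contrast -> 7 (6 bands + 1), tonnetz -> 6
FEATURE_ROWS = {"mfcc": 40, "chroma": 12, "mel": 128, "contrast": 7, "tonnetz": 6}


def pool_features(feature_mats):
    """Average each (rows x frames) matrix over time and concatenate the row means."""
    return np.concatenate([m.mean(axis=1) for m in feature_mats])


# Random stand-ins for the feature matrices of one bird song (200 frames each).
rng = np.random.default_rng(0)
mats = [rng.standard_normal((rows, 200)) for rows in FEATURE_ROWS.values()]
vector = pool_features(mats)
print(vector.shape)  # (193,)
```

Each matrix is averaged separately, so the features do not need matching frame counts.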
Revision as of 11:05, 7 July 2018
Tools
a. R: used for data cleaning.
Packages: tidyverse
b. Tableau: used for map and pattern visualization.
c. Python: used for density visualization, audio visualization, and audio classification.
Packages: os, glob, pandas, numpy, matplotlib, seaborn, librosa, sklearn
Process for Data Preparation
The following five key steps clean the data and prepare it for further visualization and analysis.
Step 1: Deal with Missing Values. Replace every placeholder that stands for a missing value, such as "?" and "??:??" in Time and "No score" in Quality, with NA.
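The original cleaning was done in R with tidyverse. As a language-neutral illustration, here is a minimal plain-Python sketch of Step 1 applied to one record stored as a dict. It assumes the "?" and "??:??" placeholders occur in Time, and it uses `None` where R would use NA.

```python
# Placeholders treated as "missing", per column (assumed placement of "?" in Time).
MISSING_TOKENS = {
    "Time": {"?", "??:??"},
    "Quality": {"No score"},
}


def replace_missing(record, missing_tokens=MISSING_TOKENS):
    """Return a copy of one record with missing-value placeholders set to None."""
    cleaned = dict(record)
    for column, tokens in missing_tokens.items():
        value = cleaned.get(column)
        if isinstance(value, str) and value.strip() in tokens:
            cleaned[column] = None  # None plays the role of R's NA
    return cleaned


print(replace_missing({"Time": "??:??", "Quality": "No score", "X": "12"}))
```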
Step 2: Fix Data Quality Issues. Convert all letters to uppercase for convenience, and remove extra spaces and stray "?" characters.
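A minimal sketch of the Step 2 cleanup for a single string value. This is plain Python, not the author's R code.

```python
import re


def clean_text(value):
    """Uppercase a string, drop '?' characters and collapse repeated whitespace."""
    if value is None:
        return None
    value = value.upper().replace("?", "")
    return re.sub(r"\s+", " ", value).strip()


print(clean_text("  blue   pipit? "))  # BLUE PIPIT
```

This step should run after Step 1. Otherwise a lone "?" in Time would be reduced to an empty string and never recorded as missing.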
Step 3: Unify the Date & Time Format. Convert all dates to the "%Y-%m-%d" format; if the raw data lacks the month or day, impute "-01-" (January) or "-01" (the first day of the month). Convert all times to the "HH:mm" format on a 24-hour clock. If the raw time has no minutes, set them to "00". If the raw time contains words such as "morning" or "dawning", impute "08:00" or "18:00".
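The sketch below expresses the Step 3 rules in plain Python. The raw formats are not listed here, so the accepted input shapes are hypothetical. Dates are assumed to be year-first, such as "2012-5" or "2012/05/17". Times are assumed to look like "6", "7:30 pm" or "14:05". The word mapping holds only "morning" to "08:00", following the order in the text; any other words would need to be added.

```python
import re
from datetime import date

# Hypothetical word-to-time mapping; extend it with the other words found in the data.
WORD_TIMES = {"MORNING": "08:00"}


def normalize_date(raw):
    """Return 'YYYY-MM-DD', imputing month 01 and/or day 01 when missing or zero."""
    parts = [p for p in re.split(r"[-/]", raw.strip()) if p]
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 and int(parts[1]) > 0 else 1
    day = int(parts[2]) if len(parts) > 2 and int(parts[2]) > 0 else 1
    return date(year, month, day).isoformat()


def normalize_time(raw, word_times=WORD_TIMES):
    """Return a 24-hour 'HH:MM' string, or None when the value cannot be parsed."""
    text = raw.strip().upper()
    for word, hhmm in word_times.items():
        if word in text:
            return hhmm
    match = re.fullmatch(r"(\d{1,2})(?::(\d{2}))?\s*(AM|PM)?", text)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2) or 0)  # no minutes -> "00"
    if match.group(3) == "PM" and hour < 12:
        hour += 12
    elif match.group(3) == "AM" and hour == 12:
        hour = 0
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}:{minute:02d}"


print(normalize_date("2012-5"))   # 2012-05-01
print(normalize_time("7:30 pm"))  # 19:30
```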
Step 4: Modify Data Types. Change the X and Y coordinates from character to integer.
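Step 4 can be written as a small plain-Python helper. The column names X and Y come from the text. Treating blank strings as missing is an assumption.

```python
def coerce_coordinates(record, columns=("X", "Y")):
    """Return a copy of a record with coordinate columns converted from str to int."""
    out = dict(record)
    for col in columns:
        text = (out.get(col) or "").strip()
        out[col] = int(text) if text else None  # blank or missing text -> None
    return out


print(coerce_coordinates({"X": " 105", "Y": "86 "}))  # {'X': 105, 'Y': 86}
```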
Step 5: Create Season and Timeslot variables from Date and Time. For example, March to May is set as "Spring", and 06:00 to 12:00 as "Morning".
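Below is a sketch of Step 5. Only "March to May is Spring" and "06:00 to 12:00 is Morning" come from the text. Everything else is an assumption for illustration:
* the remaining seasons, which follow the Northern Hemisphere meteorological calendar;
* the other timeslots;
* treating 12:00 as the start of the next slot.

```python
# Only Spring (Mar-May) and Morning (06:00-12:00) come from the text; the rest is assumed.
SEASONS = {
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
    12: "Winter", 1: "Winter", 2: "Winter",
}

# (start hour inclusive, end hour exclusive, label)
TIMESLOTS = [(0, 6, "Night"), (6, 12, "Morning"), (12, 18, "Afternoon"), (18, 24, "Evening")]


def season_of(iso_date):
    """Map a 'YYYY-MM-DD' string to a season name; None if the date is missing."""
    if iso_date is None:
        return None
    return SEASONS[int(iso_date[5:7])]


def timeslot_of(hhmm):
    """Map an 'HH:MM' string to a time-of-day slot; None if the time is missing."""
    if hhmm is None:
        return None
    hour = int(hhmm[:2])
    for start, end, label in TIMESLOTS:
        if start <= hour < end:
            return label
    return None


print(season_of("2012-04-15"), timeslot_of("06:00"))  # Spring Morning
```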
Pattern Visualization and Analysis
Approach | Description
Geo-spatial Visualization | i. Scatter Plot on Map; ii. Kernel Density Plot
Trend Visualization | Area/Line Graph
Interactive Dashboard | Combine the results of the geo-spatial and trend visualizations
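The Python packages listed above include seaborn, whose `kdeplot` fits a Gaussian kernel density estimate. To show what such a plot estimates, here is a from-scratch 2-D Gaussian KDE that uses one hand-picked bandwidth for both axes. seaborn and scipy instead choose the bandwidth automatically (Scott's rule by default) and allow correlated axes. This is therefore a simplified sketch, not the author's code, and the points are hypothetical.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return f(x, y), a 2-D Gaussian kernel density estimate over (x, y) points.

    f(x, y) = 1 / (n * 2*pi*h^2) * sum_i exp(-((x - xi)^2 + (y - yi)^2) / (2*h^2))
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for xi, yi in points:
            d2 = (x - xi) ** 2 + (y - yi) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical recording locations on the X/Y grid.
sightings = [(20, 30), (22, 31), (21, 29), (80, 75)]
f = gaussian_kde_2d(sightings, bandwidth=5.0)
print(f(21, 30) > f(80, 75) > f(50, 50))  # True: densest near the cluster of three
```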
Audio Data Analysis
Approach | Description
Audio Visualization | i. Waveplot; ii. Specgram
Audio Classification | i. Feature Extraction; ii. Data Partition and Feature Labels; iii. Classification Methods
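The data partition step splits 2081 bird songs into 70% training and 30% test data. Here is a plain-Python sketch of a shuffled split that keeps each feature vector aligned with its species label. In practice, `sklearn.model_selection.train_test_split(X, y, test_size=0.3)` does this. The sketch mirrors its rounding, where the test set gets ceil(0.3 * n) samples, so 2081 songs become 1456 training and 625 test samples. The data, labels and seed are hypothetical.

```python
import math
import random


def _take(seq, idx):
    """Return the items of seq at the given indices, in that order."""
    return [seq[i] for i in idx]


def train_test_split_70_30(samples, labels, test_size=0.3, seed=42):
    """Shuffle paired samples/labels and return X_train, X_test, y_train, y_test."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * len(samples))  # same rounding as scikit-learn
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (_take(samples, train_idx), _take(samples, test_idx),
            _take(labels, train_idx), _take(labels, test_idx))


# Hypothetical stand-ins for 2081 feature vectors and their species labels.
X = [[float(i)] for i in range(2081)]
y = ["SPECIES_" + "ABC"[i % 3] for i in range(2081)]
X_train, X_test, y_train, y_test = train_test_split_70_30(X, y)
print(len(X_train), len(X_test))  # 1456 625
```

scikit-learn's version also accepts `stratify=y`. This keeps species proportions the same in both sets, which matters when some species have few recordings.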