ISSS608 2017-18 T3 Assign Akanksha Shrirang Yadav Insights

From Visual Analytics and Applications

VAST Mini Challenge 1: Are The Beloved Pipits Disappearing?

Overview

Data Preparation & Approach

Discovering Trends - Exploratory Data Analysis

Is Kasios Singing A True Song?

The State Of Pipits

References

 


Analysis and Classification of Audio Files Through Visualizations

To answer Question 2, two approaches were used to evaluate the legitimacy of Kasios’ claim about the pipits:

  • Comparison of Oscillograms of the Audio Files
  • Comparison of the Distribution Plots of Selected Audio Features with the Same Features of the Test Files

The results of the two methods were then compared to see whether they agreed.

Because there are too few recordings per species, training a machine learning classifier on this data would likely give low accuracy. This analysis therefore relies solely on visual comparison of the plots produced by the two approaches.


Other approaches tried:

Spectrograms of all the files were plotted, but they were not distinct enough to compare the recordings. Time-measurement plots were also tried, but they proved just as hard to interpret. A simple oscillogram was therefore chosen instead.
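An oscillogram is the raw waveform: amplitude plotted against time. Because a recording holds far more samples than a plot has pixels, plotting code commonly draws the minimum and maximum amplitude inside each time bin. The Python sketch below illustrates that reduction only (the plots in this analysis were produced in R); the function name `oscillogram_envelope` and its parameters are hypothetical.

```python
import math

def oscillogram_envelope(samples, sample_rate, n_bins):
    """Reduce a waveform to at most n_bins (time, min, max) triples for plotting.

    samples: sequence of amplitude values, e.g. floats in [-1, 1]
    sample_rate: samples per second
    n_bins: target number of time bins (roughly the plot width in pixels)
    """
    if n_bins <= 0 or not samples:
        return []
    # Ceiling division so that the number of bins never exceeds n_bins
    bin_size = math.ceil(len(samples) / n_bins)
    envelope = []
    for start in range(0, len(samples), bin_size):
        chunk = samples[start:start + bin_size]
        t = start / sample_rate  # time in seconds at the start of the bin
        envelope.append((t, min(chunk), max(chunk)))
    return envelope

# Usage: a synthetic 0.5 s, 440 Hz tone sampled at 8 kHz
sr = 8000
tone = [math.sin(2 * math.pi * 440 * i / sr) for i in range(sr // 2)]
env = oscillogram_envelope(tone, sr, n_bins=100)
```

Each triple can be drawn as a vertical line from `min` to `max` at time `t`, which is what gives an oscillogram its filled look.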


Conversion Of MP3 Audio Files to WAV

All the visualizations were built in R. Because most R audio packages can only read ‘.WAV’ files, all the recordings (both the All Birds set and the Test set) were converted from MP3 to ‘.WAV’ using the tuneR package.
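Decoding MP3 is not something the Python standard library can do; in this analysis that step was handled by tuneR. Assuming the audio has already been decoded into floating-point samples, the sketch below shows the output half of the conversion: writing those samples as a 16-bit PCM mono WAV file with Python’s built-in `wave` module. The helper name `write_wav_16bit_mono` is hypothetical.

```python
import io
import struct
import wave

def write_wav_16bit_mono(samples, sample_rate, target):
    """Write float samples in [-1, 1] as a 16-bit PCM mono WAV.

    target: a filename or a writable, seekable binary file-like object.
    """
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)         # mono
        wf.setsampwidth(2)         # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        # Clip to [-1, 1], then scale to the signed 16-bit integer range
        ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
        wf.writeframes(struct.pack("<%dh" % len(ints), *ints))

# Usage: write to an in-memory buffer and read it back
buf = io.BytesIO()
write_wav_16bit_mono([0.0, 0.5, -0.5, 1.0], 22050, buf)
buf.seek(0)
with wave.open(buf, "rb") as rf:
    params = (rf.getnchannels(), rf.getsampwidth(), rf.getframerate(), rf.getnframes())
    decoded = struct.unpack("<%dh" % rf.getnframes(), rf.readframes(rf.getnframes()))
```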


Approach 1 - Oscillogram Comparison

  • Each recording in the All Birds set carries a quality score of A, B, C, D, E or no score. No description of what the scores mean is provided. Recordings without a score were excluded from the analysis.
  • Before the main analysis, 5 recordings of the “Rose-crested Blue Pipit” were chosen, one for each quality score from A to E, to see what the scores mean in practice. The score-A file was the clearest, with little noise; the other files contained varying degrees of noise.
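The selection rule in these two bullets (drop unscored files, then keep one file per grade for a given species) can be sketched in Python as follows. The metadata field names (`file`, `species`, `quality`), the helper name, and the example records are all hypothetical, for illustration only.

```python
def pick_one_per_grade(records, species, grades=("A", "B", "C", "D", "E")):
    """Return {grade: file} with the first scored file of `species` per grade.

    Records of other species, and records with no quality score, are skipped.
    """
    chosen = {}
    for rec in records:
        grade = rec.get("quality")
        if rec["species"] != species or grade not in grades:
            continue
        chosen.setdefault(grade, rec["file"])  # keep only the first file per grade
    return chosen

# Usage with hypothetical metadata records
records = [
    {"file": "rcbp_01.mp3", "species": "Rose-crested Blue Pipit", "quality": "B"},
    {"file": "rcbp_02.mp3", "species": "Rose-crested Blue Pipit", "quality": None},
    {"file": "rcbp_03.mp3", "species": "Rose-crested Blue Pipit", "quality": "A"},
    {"file": "rcbp_04.mp3", "species": "Rose-crested Blue Pipit", "quality": "B"},
    {"file": "pf_01.mp3", "species": "Pinkfinch", "quality": "A"},
]
picked = pick_one_per_grade(records, "Rose-crested Blue Pipit")
```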

The oscillogram plots of the chosen files are as shown below:

Score A file
T3 Assign QualityA.png
Score B file
T3 Assign QualityB.png
Score C file
T3 Assign QualityC.png
Score D file
T3 Assign QualityD.png
Score E file
T3 Assign QualityE.png


Oscillogram Plots of 19 Training Birds

One training recording was chosen for each of the 19 bird species, and its oscillogram was plotted. Oscillograms were likewise plotted for the test files provided by Kasios. Each test oscillogram was then compared individually with all 19 training oscillograms to find the closest match.

The oscillograms of the 19 species are shown below:
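The comparison in this analysis was done by eye. Purely as a hypothetical numeric analogue of “compare each test recording with all 19 training recordings and keep the best match”, the sketch below reduces every waveform to a fixed-length, peak-normalised RMS envelope and picks the training species at the smallest Euclidean distance. This is a swapped-in technique, not the author’s method: it ignores time alignment and call length, and the function names and toy signals are assumptions.

```python
import math

def rms_envelope(samples, n_bins):
    """Fixed-length RMS envelope, scaled so its peak is 1 (cancels loudness)."""
    size = max(1, math.ceil(len(samples) / n_bins))
    env = []
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        env.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    env += [0.0] * (n_bins - len(env))  # pad short recordings to n_bins
    peak = max(env) or 1.0
    return [v / peak for v in env]

def closest_species(test_samples, training, n_bins=50):
    """Return the training species whose envelope is nearest to the test envelope."""
    test_env = rms_envelope(test_samples, n_bins)
    best, best_dist = None, float("inf")
    for species, samples in training.items():
        dist = math.dist(test_env, rms_envelope(samples, n_bins))
        if dist < best_dist:
            best, best_dist = species, dist
    return best

# Usage with toy signals: a steady call versus a call that starts halfway through
training = {"steady": [0.5] * 1000, "burst": [0.0] * 500 + [0.5] * 500}
test_clip = [0.0] * 480 + [0.9] * 520
match = closest_species(test_clip, training)
```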

Training Bird 1 - Bent-beak Riffraff
T3 Assign Audio1.png
Training Bird 2 - Blue-collared Zipper
T3 Assign Audio2.png
Training Bird 3 - Bombadil
T3 Assign Audio3.png
Training Bird 4 - Broad-winged Jojo
T3 Assign Audio4.png
Training Bird 5 - Canadian Cootamum
T3 Assign Audio5.png
Training Bird 6 - Carries Champagne Pipit
T3 Assign Audio6.png
Training Bird 7 - Darkwing Sparrow
T3 Assign Audio7.png
Training Bird 8 - Eastern Corn Skeet
T3 Assign Audio8.png
Training Bird 9 - Green-tipped Scarlet Pipit
T3 Assign Audio9.png
Training Bird 10 - Lesser Birchbeere
T3 Assign Audio10.png
Training Bird 11 - Orange Pine Plover
T3 Assign Audio11.png
Training Bird 12 - Ordinary Snape
T3 Assign Audio12.png
Training Bird 13 - Pinkfinch
T3 Assign Audio13.png
Training Bird 14 - Purple Tooting Tout
T3 Assign Audio14.png
Training Bird 15 - Qax
T3 Assign Audio15.png
Training Bird 16 - Queenscoat
T3 Assign Audio16.png
Training Bird 17 - Rose-crested Blue Pipit
T3 Assign Audio17.png
Training Bird 18 - Scrawny Jay
T3 Assign Audio18.png
Training Bird 19 - Vermillion Trillian
T3 Assign Audio19.png



Comparison Of Test Samples With The Oscillograms Of All Bird Recordings


Test Sample 1:

T3 Assign Test1 Comparison.png


Test Sample 2:

T3 Assign Test2 Comparison.png


Test Sample 3:

T3 Assign Test3 Comparison.png


Test Sample 4:

T3 Assign Test4 Comparison.png


Test Sample 5:

T3 Assign Test5 Comparison.png


Test Sample 6:

T3 Assign Test6 Comparison.png


Test Sample 7:

T3 Assign Test7 Comparison.png


Test Sample 8:

T3 Assign Test8 Comparison.png


Test Sample 9:

T3 Assign Test9 Comparison.png


Test Sample 10:

T3 Assign Test10 Comparison.png


Test Sample 11:

T3 Assign Test11 Comparison.png


Test Sample 12:

T3 Assign Test12 Comparison.png


Test Sample 13:

T3 Assign Test13 Comparison.png


Test Sample 14:

T3 Assign Test14 Comparison.png


Test Sample 15:

T3 Assign Test15 Comparison.png


After visually comparing each test oscillogram with the training oscillograms, the estimated outcome is:

Estimated Outcome:

Test Sample       Estimated Species
Test Sample 1     Scrawny Jay
Test Sample 2     Ordinary Snape
Test Sample 3     Rose-crested Blue Pipit
Test Sample 4     Bent-beak Riffraff
Test Sample 5     Green-tipped Scarlet Pipit
Test Sample 6     Ordinary Snape
Test Sample 7     Canadian Cootamum
Test Sample 8     Lesser Birchbeere
Test Sample 9     Queenscoat
Test Sample 10    Orange Pine Plover
Test Sample 11    Qax
Test Sample 12    Vermillion Trillian
Test Sample 13    Rose-crested Blue Pipit
Test Sample 14    Queenscoat
Test Sample 15    Carries Champagne Pipit


Approach 2 – Comparison of the Distribution Plots

For this approach, acoustic features of the audio files were extracted with the ‘analyzeFolder’ function from the ‘soundgen’ package in R. This function processes a whole folder of files in one batch and returns a summary data frame containing all the extracted features for every file.
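Feature names such as `HNR_mean` and `pitchAutocor_median` appear to reflect frame-level measurements summarised per file. The Python sketch below shows that kind of summarisation under stated assumptions: the frame dictionaries, the use of `None` for frames where a feature is undefined (for example unvoiced frames), and the helper name are hypothetical, and soundgen’s own handling of missing values may differ.

```python
import statistics

def summarise_frames(frames, stats=("mean", "median")):
    """Collapse frame-level feature values into one row of per-file summaries.

    frames: list of dicts, one per analysis frame, e.g. {"HNR": 12.0}.
    None values are skipped; a feature with no valid frames summarises to None.
    """
    funcs = {"mean": statistics.mean, "median": statistics.median}
    row = {}
    features = sorted({name for frame in frames for name in frame})
    for feat in features:
        values = [f[feat] for f in frames if f.get(feat) is not None]
        for s in stats:
            row[f"{feat}_{s}"] = funcs[s](values) if values else None
    return row

# Usage with hypothetical frame-level values for one file
frames = [
    {"HNR": 10.0, "peakFreq": 3000.0},
    {"HNR": 14.0, "peakFreq": 3200.0},
    {"HNR": 12.0, "peakFreq": None},
]
row = summarise_frames(frames)
```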

This yielded 70 features. To reduce their number, a correlation plot was used to discard highly correlated variables, and the remaining variables were grouped into hierarchical clusters with the ‘Variable Clustering’ method in SAS JMP. Finally, the following 6 features were selected for further analysis:

  • ampleVoiced_mean
  • HNR_mean
  • peakFreq_mean
  • pitchAutocor_median
  • pitchSpec_mean
  • quartile50_me

Next, the distributions of these variables were plotted for each of the 19 species.
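Density plots like these are usually drawn from a Gaussian kernel density estimate. The Python sketch below shows the idea; the bandwidth rule follows the style of R’s default `bw.nrd0` (0.9 times the smaller of the standard deviation and IQR/1.34, times n to the power -0.2) but with a simplified fallback, so it is an approximation rather than an exact port, and the helper names and sample values are hypothetical.

```python
import math
import statistics

def rule_of_thumb_bandwidth(xs):
    """Silverman-style bandwidth, similar in spirit to R's bw.nrd0."""
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34) or sd or 1.0  # fall back if IQR or sd is zero
    return 0.9 * spread * len(xs) ** -0.2

def gaussian_kde(xs, grid, bw=None):
    """Evaluate a Gaussian kernel density estimate of xs at each grid point."""
    bw = bw or rule_of_thumb_bandwidth(xs)
    norm = 1.0 / (len(xs) * bw * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / bw) ** 2) for x in xs) for g in grid]

# Usage: hypothetical values of one feature for one species
values = [1.0, 2.0, 2.5, 3.0, 5.0]
grid = [-10 + 0.01 * i for i in range(2601)]
density = gaussian_kde(values, grid)
```

A valid density integrates to about 1, which is a quick sanity check on the bandwidth and normalisation.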

The same features were then extracted for the test audio files.

The 6 features of each test file were individually compared with the density plots of the same 6 features for all 19 species, in R. Each test sample was assigned to the species for which the largest number of its features lay closest to that species’ feature means.
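One reading of this rule is a per-feature vote: for each feature, the species whose mean lies closest to the test value gets one vote, and the species with the most votes wins. The Python sketch below implements that interpretation; it is an assumption about the exact rule, the species means are hypothetical numbers, and ties simply go to the species listed first. Because each feature is compared only across species, features in different units never need to be put on a common scale.

```python
def classify_by_feature_votes(test_features, species_means):
    """Assign a test sample to the species that is nearest on the most features.

    test_features: {feature: value}
    species_means: {species: {feature: mean value}}
    Returns (winning species, vote counts per species).
    """
    votes = {sp: 0 for sp in species_means}
    for feat, value in test_features.items():
        nearest = min(species_means, key=lambda sp: abs(species_means[sp][feat] - value))
        votes[nearest] += 1
    winner = max(votes, key=votes.get)  # ties: first species in dict order
    return winner, votes

# Usage with hypothetical species means and test values
species_means = {
    "Species X": {"HNR_mean": 10, "peakFreq_mean": 3000, "pitchSpec_mean": 800},
    "Species Y": {"HNR_mean": 15, "peakFreq_mean": 4000, "pitchSpec_mean": 1200},
}
test = {"HNR_mean": 14, "peakFreq_mean": 3900, "pitchSpec_mean": 850}
winner, votes = classify_by_feature_votes(test, species_means)
```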

As an illustration, the resulting plots for one test sample, Test Sample 3, are shown below.

Test Sample 3 was estimated to be an "Orange Pine Plover".

Bent-beak Riffraff
T3 Assign Species1 attributes.png
Blue-collared Zipper
T3 Assign Species2 attributes.png
Bombadil
T3 Assign Species3 attributes.png
Broad-winged Jojo
T3 Assign Species4 attributes.png
Canadian Cootamum
T3 Assign Species5 attributes.png
Carries Champagne Pipit
T3 Assign Species6 attributes.png
Darkwing Sparrow
T3 Assign Species7 attributes.png
Eastern Corn Skeet
T3 Assign Species8 attributes.png
Green-tipped Scarlet Pipit
T3 Assign Species9 attributes.png
Lesser Birchbeere
T3 Assign Species10 attributes.png
Orange Pine Plover
T3 Assign Species11 attributes.png
Ordinary Snape
T3 Assign Species12 attributes.png
Pinkfinch
T3 Assign Species13 attributes.png
Purple Tooting Tout
T3 Assign Species14 attributes.png
Qax
T3 Assign Species15 attributes.png
Queenscoat
T3 Assign Species16 attributes.png
Rose-crested Blue Pipit
T3 Assign Species17 attributes.png
Scrawny Jay
T3 Assign Species18 attributes.png
Vermillion Trillian
T3 Assign Species19 attributes.png


Upon visual examination and comparison of the plots, the estimated outcome is:

Estimated Outcome:

Test Sample       Estimated Species
Test Sample 1     Eastern Corn Skeet
Test Sample 2     Rose-crested Blue Pipit
Test Sample 3     Orange Pine Plover
Test Sample 4     Bombadil
Test Sample 5     Green-tipped Scarlet Pipit
Test Sample 6     Scrawny Jay
Test Sample 7     Canadian Cootamum
Test Sample 8     Ordinary Snape
Test Sample 9     Vermillion Trillian
Test Sample 10    Qax
Test Sample 11    Qax
Test Sample 12    Eastern Corn Skeet
Test Sample 13    Rose-crested Blue Pipit
Test Sample 14    Queenscoat
Test Sample 15    Pinkfinch

Note that the two sets of outcomes do not match: the approaches agree on only 5 of the 15 test samples (5, 7, 11, 13 and 14), since both ultimately rely on visual inspection. Although the second approach attempts to classify the birds using the statistical distributions of each species’ features, its results are still somewhat uncertain.
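The agreement between the two outcome tables can be checked mechanically. The lists below are the two tables’ estimates in order for samples 1 to 15, with species names written consistently; the sketch counts matching samples and lists which samples each approach labelled as Rose-crested Blue Pipit.

```python
approach1 = [
    "Scrawny Jay", "Ordinary Snape", "Rose-crested Blue Pipit", "Bent-beak Riffraff",
    "Green-tipped Scarlet Pipit", "Ordinary Snape", "Canadian Cootamum",
    "Lesser Birchbeere", "Queenscoat", "Orange Pine Plover", "Qax",
    "Vermillion Trillian", "Rose-crested Blue Pipit", "Queenscoat",
    "Carries Champagne Pipit",
]
approach2 = [
    "Eastern Corn Skeet", "Rose-crested Blue Pipit", "Orange Pine Plover", "Bombadil",
    "Green-tipped Scarlet Pipit", "Scrawny Jay", "Canadian Cootamum",
    "Ordinary Snape", "Vermillion Trillian", "Qax", "Qax",
    "Eastern Corn Skeet", "Rose-crested Blue Pipit", "Queenscoat", "Pinkfinch",
]

# Sample numbers (1-based) where both approaches give the same species
agree = [i for i, (a, b) in enumerate(zip(approach1, approach2), start=1) if a == b]

# Sample numbers each approach labelled as the Rose-crested Blue Pipit
target = "Rose-crested Blue Pipit"
rcbp1 = [i for i, s in enumerate(approach1, start=1) if s == target]
rcbp2 = [i for i, s in enumerate(approach2, start=1) if s == target]
```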


Plotting Test Samples By Kasios

Both approaches classified two samples as "Rose-crested Blue Pipit". They agree on Test Sample 13, but the second sample differs: Test Sample 3 in Approach 1 and Test Sample 2 in Approach 2.

Plot of the test samples, using the species estimated by the distribution analysis (Approach 2):

T3 Assign Test Samples Plot.png


This analysis indicates that Kasios’ claim that pipits are found "across the preserve" does not hold. Of the mere 15 samples provided by Kasios, only 2 are estimated to be Rose-crested Blue Pipits, and neither of them was found anywhere near the dumping site.