ISSS608 2017-18 T3 Chan En Ying Grace Which song belongs to thee

From Visual Analytics and Applications

Latest revision as of 01:11, 1 July 2018

Rose Pipits.png “Mine dear rose pipits, whence did do thou vanish?”



Audio Visualisation: "Which song belongs to thee, Rose Pipit?"

1. Location Matters

First, note that the 15 test birds are nowhere near the alleged dumping site, except for 4 birds (enclosed in the red rectangle). It is unclear whether these 4 birds are Rose Pipits, as they could also belong to the Ordinary Snape or Lesser Birchbeere clusters, whose home ranges were shown earlier to extend near the dumping site as well.

Pic25.png


2. Data Preparation

Next, we convert all the MP3 files into .wav format, which many R packages for audio analysis prefer. We select and work with the left channel of each file, then convert the sound array to floating-point values ranging from -1 to 1.
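The original work does this in R. As a language-neutral sketch only, the Python below shows the last two steps (keep the left channel, scale to [-1, 1]) using the standard library's `wave` and `struct` modules. It assumes 16-bit PCM WAV input. MP3 decoding is not shown because it needs an external decoder. The function name `left_channel_floats` is hypothetical.

```python
import io
import struct
import wave


def left_channel_floats(wav_file) -> tuple[list[float], int]:
    """Return (left-channel samples scaled to [-1, 1), sample rate) from a WAV file.

    Assumes 16-bit PCM; `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = w.getnchannels()
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    # Frames are interleaved little-endian signed 16-bit ints: L, R, L, R, ...
    ints = struct.unpack(f"<{len(raw) // 2}h", raw)
    left = ints[::n_channels]  # first channel of every frame (mono: every sample)
    # 16-bit range is -32768..32767, so dividing by 32768 gives -1.0..~0.99997
    return [s / 32768.0 for s in left], rate


# Usage: build a two-frame stereo WAV in memory, then read it back.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(struct.pack("<hhhh", 16384, 0, -32768, 100))  # (L, R), (L, R)
buf.seek(0)
print(left_channel_floats(buf))  # ([0.5, -1.0], 44100)
```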

From the training dataset (2081 birds), we pick 5 birds per species and choose as the species representative the bird whose amplitude wave pattern is most common among the five. We then plot the oscillogram, which shows amplitude against time (seconds), for each bird species, outlining the envelope in a different colour for easier visualisation.

To complement this approach, we then analyse the distribution of the acoustic parameters across all 2081 training birds, followed by classification. We do this in the following 4 steps.

4-Step Approach: Envelope Plot, Oscillogram Plot, Trellis Plot & Audio Classification

1. First, we plot the amplitude envelope of each bird's sound wave, for a quick visual comparison across the 19 bird species.
2. Second, we plot the oscillogram to view the amplitude pattern of each of the 19 species in greater detail.
3. Third, we produce a trellis plot of the distribution of each acoustic parameter (e.g. amplitude, HNR, entropy, spectrogram slope, pitch frequency) by species, and mark the mean of each parameter.
4. Lastly, we attempt audio classification with Decision Tree and Random Forest algorithms to predict the bird species.

Visualisation

The first 3 techniques focus on visualisation. We produce each of these complementary plots for the 15 test birds and compare them against the same plots for the 19 training species. This helps us identify the species of each of the 15 test birds supplied by Kasios.

Classification

We also attempted a fourth technique, classification, to check whether our visual predictions were accurate. However, as shown later, even the best classification model had a misclassification error rate of about 0.56, so the models need further refinement.


3. Envelope Plot

The amplitude envelope describes how the amplitude of a sound changes over time. It is an important property of sound because it helps listeners recognise a sound and tell it apart from other sounds. We therefore use envelope plots to distinguish bird calls and songs.
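An envelope traces the outline of a waveform's peaks. The authors' R tooling is not shown here. As a hedged Python sketch, the function below uses one simple definition: the peak absolute amplitude in fixed, non-overlapping windows. The window length `win` and the function name are assumptions. R packages may compute the envelope differently, for example from the Hilbert transform with smoothing.

```python
def amplitude_envelope(samples: list[float], win: int) -> list[float]:
    """Peak absolute amplitude in consecutive, non-overlapping windows of `win` samples.

    A deliberately simple envelope; the final window may be shorter than `win`.
    """
    if win <= 0:
        raise ValueError("win must be a positive number of samples")
    return [max(abs(s) for s in samples[i:i + win]) for i in range(0, len(samples), win)]


# A quiet stretch followed by a louder "chirp", 4 samples per window
signal = [0.1, -0.2, 0.1, -0.1, 0.8, -0.9, 0.7, -0.6]
print(amplitude_envelope(signal, 4))  # [0.2, 0.9]
```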

Training Birds (19 Species)

First, we will plot the amplitude envelope of each sound wave of the bird species, for a quick visualization across the 19 bird species.

TrainBirds2.png

Testing Birds (15 Birds Identified By Kasios)


Next, we will plot the amplitude envelope of each sound wave for the 15 test birds.

TestBird.png

Predicted Results, Based on Visualisation


Based on a visual comparison of the amplitude envelope plots of the training and test data, the last column of the table below gives the predicted species for each of the 15 test birds.

Two of the 15 birds, Test Bird 2 and Test Bird 9, are predicted to be Rose Pipits.

Table3.png


4. Oscillogram Comparison

For confirmation, we also examine the oscillograms, which show not only the envelope but also the underlying waveform. An oscillogram plots relative amplitude against time in seconds (whereas a spectrogram plots frequency in kHz against time in seconds).
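The two axes of an oscillogram can be computed directly from the samples. The time of sample i is i divided by the sample rate. Relative amplitude is each sample divided by the largest absolute sample. The Python sketch below (function name hypothetical) returns the points one would plot.

```python
def oscillogram_points(samples: list[float], sample_rate: int) -> list[tuple[float, float]]:
    """Return (time in seconds, relative amplitude) pairs for plotting an oscillogram."""
    peak = max((abs(s) for s in samples), default=0.0)
    scale = peak if peak > 0 else 1.0  # avoid dividing by zero for a silent clip
    return [(i / sample_rate, s / scale) for i, s in enumerate(samples)]


print(oscillogram_points([0.0, 0.25, -0.5, 0.5], 4))
# [(0.0, 0.0), (0.25, 0.5), (0.5, -1.0), (0.75, 1.0)]
```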

Training Birds


The oscillograms of each of the 19 species of the training birds are as follows:

| No. | Species Name | Oscillogram |
|---|---|---|
| 1 | Bent Beat Riffraff | O1.png |
| 2 | Blue Collared Zipper | O2.png |
| 3 | Bombadil | O3.png |
| 4 | Broad Winged Jojo | O4.png |
| 5 | Canadian Cootamum | O5.png |
| 6 | Carries Champagne Pipit | O6.png |
| 7 | Darkwing Sparrow | O7.png |
| 8 | Eastern Corn Skeet | O8.png |
| 9 | Green Tipped Scarlet Pipit | O9.png |
| 10 | Lesser Birchbeere | O10.png |
| 11 | Orange Pine Plover | O11.png |
| 12 | Ordinary Snape | O12.png |
| 13 | Pinkfinch | O13.png |
| 14 | Purple Tooting Tout | O14.png |
| 15 | Qax | O15.png |
| 16 | Queenscoat | O16.png |
| 17 | Rose-Crested Blue Pipit | O17.png |
| 18 | Scrawny Jay | O18.png |
| 19 | Vermillion Trillian | O19.png |


Testing Birds


The oscillograms of the 15 test birds are shown below.

The third column gives the species predicted by visually comparing each test bird's oscillogram with those of the training species. The last column records whether this matches the envelope-based prediction. For all 15 test birds, the oscillogram-based prediction matches the envelope-based prediction. This is not surprising, because the envelope is derived from the same waveform that the oscillogram shows.

We plot both because the envelope allows a quick comparison, while the oscillogram shows the waveform in more detail.

| ID | Oscillogram | Predicted Species | Same as envelope-based prediction? |
|---|---|---|---|
| 1 | T1.png | Eastern Corn Skeet | Yes. It is quite close to the Rose-Crested Blue Pipit, but the Pipit produces more ‘chirps’ per 100 sec than the Skeet. |
| 2 | T2.png | Rose-Crested Blue Pipit | Yes. |
| 3 | T3.png | Queenscoat | Yes. |
| 4 | T4.png | Bombadil | Yes. |
| 5 | T5.png | Canadian Cootamum | Yes. |
| 6 | T6.png | Qax | Yes. |
| 7 | T7.png | Canadian Cootamum | Yes. |
| 8 | T8.png | Green Tipped Scarlet Pipit | Yes. |
| 9 | T9.png | Rose-Crested Blue Pipit | Yes. |
| 10 | T10.png | Qax | Yes. |
| 11 | T11.png | Scrawny Jay | Yes. |
| 12 | T12.png | Qax | Yes. |
| 13 | T13.png | Qax | Yes. |
| 14 | T14.png | Bombadil | Yes. |
| 15 | T15.png | Pinkfinch | Yes. |
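Row 1 of the test-bird table separates the Eastern Corn Skeet from the Rose-Crested Blue Pipit by how many ‘chirps’ each produces per 100 sec. That judgement was made by eye. The Python sketch below shows one way it could be quantified. It assumes a chirp begins whenever the amplitude envelope rises to or above a chosen threshold. Both the threshold and the function name are hypothetical.

```python
def chirps_per_100s(envelope: list[float], env_rate: float, threshold: float) -> float:
    """Estimate chirps per 100 s by counting upward crossings of `threshold`.

    envelope  : envelope values, one per analysis window
    env_rate  : envelope values per second (sample rate / window length)
    threshold : level at or above which the bird is assumed to be calling
    """
    if not envelope:
        return 0.0
    onsets, above = 0, False
    for v in envelope:
        if v >= threshold and not above:
            onsets += 1  # envelope has just risen to the threshold: a new chirp
        above = v >= threshold
    duration_s = len(envelope) / env_rate
    return onsets * 100.0 / duration_s


# 10 s of envelope at 1 value per second, containing three separate bursts
env = [0.0, 0.8, 0.9, 0.1, 0.7, 0.0, 0.0, 0.9, 0.2, 0.0]
print(chirps_per_100s(env, env_rate=1.0, threshold=0.5))  # 30.0
```

Comparing this rate between a test bird and each candidate species would turn the visual ‘more chirps’ judgement into a number, though the result depends on the chosen threshold.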



5. Trellis Plot (Distribution Analysis)

A caveat to the previous analysis is that it did not use all the training birds. Instead, we randomly selected 5 birds per species to visualise, and then chose 1 to represent the entire species. To address this, we now use all the training birds by plotting the distribution of each acoustic parameter.

Narrowing Down to 7 Acoustic Parameters

To make the analysis more rigorous, we use analyzeFolder() to obtain the value of each acoustic parameter for every audio file. analyzeFolder() performs an acoustic analysis of all .wav files in a folder.

Of the 15 parameters in total, we keep 7 because they differ more clearly between species. The 7 parameters are: dom_median, HNR_median, meanFreq_median, peakFreq_median, pitch_median, pitchAutocor_median, pitchSpec_median.
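The section does not say how "differ more clearly between species" was measured. One standard way to rank parameters on that criterion is the one-way ANOVA F statistic, which divides between-species variance by within-species variance. This is offered as an assumption, not as the authors' procedure. The Python sketch below uses made-up values.

```python
from statistics import mean


def separation_ratio(values_by_species: dict[str, list[float]]) -> float:
    """One-way ANOVA F statistic: between-species variance / within-species variance.

    Larger values mean the parameter separates the species more clearly.
    """
    groups = list(values_by_species.values())
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 species and more birds than species")
    grand = mean(all_values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (k - 1)
    within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n - k)
    return between / within if within > 0 else float("inf")


# Hypothetical values of two parameters for three species (not the Kasios data)
params = {
    "pitch_median": {"A": [1.0, 1.2, 0.8], "B": [5.0, 5.1, 4.9], "C": [9.0, 9.2, 8.8]},
    "HNR_median": {"A": [3.0, 6.0, 9.0], "B": [4.0, 6.0, 8.0], "C": [5.0, 6.0, 7.0]},
}
ranked = sorted(params, key=lambda p: separation_ratio(params[p]), reverse=True)
print(ranked)  # ['pitch_median', 'HNR_median']
```

In this toy data, the three species share the same mean HNR, so HNR_median scores 0 and would be dropped. Keeping the top-scoring parameters is one way to arrive at a shortlist such as the 7 above.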

Distributions

The trellis plot of the 7 parameters of the training birds is shown below, where the mean is indicated by the black solid line.

Next, we plot each of the 15 test birds from Kasios onto this plot as a blue dotted line. For each parameter, we then select the training species closest to the test bird. The species selected for the most parameters is assigned as the predicted species.

Given that Test Bird 2 and Test Bird 9 were predicted to be Rose-Crested Blue Pipits, we will focus on these two birds for visualisation.

Test Bird 2 Against Training Birds

The following shows the trellis plot of Test Bird 2 (blue dotted line) against the 2081 training birds (species means as black solid lines).

For each parameter, the green ticks mark the 3 training species (more if there is a tie) whose means are closest to the test bird's value. The species with the most ticks (i.e. closest to the test bird across parameters) is selected as the predicted species.

Based on this, Test Bird 2 is predicted to be a Qax.

TB9 c.jpg

Test Bird 9 Against Training Birds

The following shows the trellis plot of Test Bird 9 (blue dotted line) against the 2081 training birds (species means as black solid lines).

For each parameter, the green ticks mark the 3 training species (more if there is a tie) whose means are closest to the test bird's value. The species with the most ticks (i.e. closest to the test bird across parameters) is selected as the predicted species.

Based on this, Test Bird 9 is predicted to be a Vermillion Trillian.

TB2 c.jpg
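The tick-voting rule used in both trellis comparisons above can be sketched in Python as follows. For each parameter, it ticks the `top_k` species whose mean is closest to the test bird, plus any species tied with the k-th closest. It then counts ticks per species. The species means and function name are hypothetical, and `top_k=2` is used only to keep the example small (the text uses 3).

```python
from collections import Counter


def tick_vote(species_means: dict[str, dict[str, float]],
              test_bird: dict[str, float], top_k: int = 3) -> Counter:
    """Tick, for every parameter, the top_k species whose mean is closest to the test bird.

    Species tied with the k-th closest distance are ticked too, so a parameter
    can award more than top_k ticks. Returns the tick count per species.
    """
    ticks: Counter = Counter()
    for param, value in test_bird.items():
        dist = {sp: abs(means[param] - value) for sp, means in species_means.items()}
        ordered = sorted(dist.values())
        cutoff = ordered[min(top_k, len(ordered)) - 1]  # distance of the k-th closest species
        for sp, d in dist.items():
            if d <= cutoff:
                ticks[sp] += 1
    return ticks


# Hypothetical species means for two parameters
means = {
    "A": {"pitch": 10.0, "hnr": 5.0},
    "B": {"pitch": 12.0, "hnr": 9.0},
    "C": {"pitch": 30.0, "hnr": 5.5},
    "D": {"pitch": 50.0, "hnr": 20.0},
}
votes = tick_vote(means, {"pitch": 11.0, "hnr": 5.0}, top_k=2)
print(votes.most_common(1))  # [('A', 2)]
```

If two species finish with the same number of ticks, this sketch does not break the tie. A real comparison would need a rule for that case.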

Results

From the above plots, counting the parameters for which each species' mean is closest to the test bird, Test Bird 2 is predicted to be a Qax and Test Bird 9 a Vermillion Trillian. Unfortunately, this does not match our earlier predictions from the amplitude plots. We conclude that this method may not be ideal, as it compares summary statistics, whereas the amplitude plots preserve the shape of each call. The amplitude plots are, however, less representative of the whole training population, since only one bird per species is plotted.

We therefore rely on Method 1 (Envelope Plot) and Method 2 (Oscillogram Plot), and leave Method 3 (Trellis Plot) out of our concluding hypothesis. The results are as follows, with Test Birds 2 and 9 predicted to be Rose Pipits. Before concluding, we also try audio classification to see whether it supports these results.

Graph2.png


6. Audio Classification

We also attempted classification to predict the bird species, first with a Decision Tree and then with a Random Forest.

After extracting a data frame of acoustic parameters for the training birds with `analyzeFolder`, we set aside 70% of the 2081 birds as training data and 30% as validation data. The 15 birds from Kasios form the test data.
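The text does not say whether the split was purely random or stratified by species. The Python sketch below shows a simple, reproducible random 70:30 split over bird IDs. The seed and function name are assumptions.

```python
import random


def split_70_30(ids: list[int], seed: int = 42) -> tuple[list[int], list[int]]:
    """Shuffle IDs reproducibly, then return (training, validation) in a 70:30 ratio."""
    shuffled = list(ids)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))  # 70% of 2081 is 1456.7, truncated to 1456
    return shuffled[:cut], shuffled[cut:]


train_ids, valid_ids = split_70_30(list(range(2081)))
print(len(train_ids), len(valid_ids))  # 1456 625
```

A stratified variant, splitting 70:30 within each species, would keep rarer species represented in both the training and validation sets.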

Method 1: Decision Tree

The decision tree produced a high misclassification error rate of 0.574.

Based on the Decision Tree Model, Test Bird 2 was predicted as a Lesser Birchbeere (with a low probability of 0.51) while Test Bird 9 was predicted as a Green Tipped Scarlet Pipit (with a low probability of 0.41).

This contradicts our earlier predictions. Of the 15 predictions, only 1 matches the visual prediction: Test Bird 7 (in green below). Given the high misclassification rate (57%), we should not rely on the Decision Tree results. While decision trees are easy to interpret, a single tree is a simple model and is often less accurate than ensemble methods.

TableDT.png
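For reference, the misclassification error rate is the fraction of validation birds whose predicted species differs from the true species. A rate of 0.574 therefore means about 57 wrong predictions per 100 birds. The "only 1 of 15 matches" comparison is a positionwise agreement count. Both are sketched in Python below with toy labels. The function names are hypothetical.

```python
def misclassification_rate(actual: list[str], predicted: list[str]) -> float:
    """Fraction of predictions that differ from the true labels."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


def matching_ids(preds_a: list[str], preds_b: list[str]) -> list[int]:
    """1-based IDs of the birds on which two sets of predictions agree."""
    return [i for i, (a, b) in enumerate(zip(preds_a, preds_b), start=1) if a == b]


# Toy labels, not the validation data
print(misclassification_rate(["Qax", "Bombadil", "Qax", "Pinkfinch"],
                             ["Qax", "Qax", "Qax", "Bombadil"]))  # 0.5
print(matching_ids(["Qax", "Pinkfinch", "Bombadil"],
                   ["Bombadil", "Pinkfinch", "Qax"]))  # [2]
```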

Method 2: Random Forest

Instead, we use a Random Forest to improve on the single decision tree. A random forest grows many decision trees, each on a bootstrap sample of the training data and each considering a random subset of predictors at every split. It then predicts by majority vote across the trees. We try 3 Random Forest models, tuning the parameters to reduce the misclassification rate: ntree = 500 and mtry = 2 (Model 1), then mtry = 6 (Model 2) and mtry = 10 (Model 3).
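In R's randomForest package, `ntree` sets how many trees are grown. `mtry` is the number of predictors sampled at random as split candidates at each node. The forest then classifies by majority vote across trees. The Python sketch below illustrates only those two ideas and is not a full random forest. It uses the parameter names from Section 5, and the function names are hypothetical. Note that mtry cannot exceed the number of predictors supplied to the model.

```python
import random
from collections import Counter


def split_candidates(features: list[str], mtry: int, rng: random.Random) -> list[str]:
    """Predictors one tree node may choose from: a random subset of size mtry."""
    if not 1 <= mtry <= len(features):
        # This sketch rejects out-of-range values rather than adjusting them.
        raise ValueError(f"mtry must be between 1 and {len(features)}")
    return rng.sample(features, mtry)


def forest_vote(tree_predictions: list[str]) -> str:
    """Random forest classification: the species predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


features = ["dom_median", "HNR_median", "meanFreq_median", "peakFreq_median",
            "pitch_median", "pitchAutocor_median", "pitchSpec_median"]
rng = random.Random(0)
print(split_candidates(features, mtry=2, rng=rng))
print(forest_vote(["Qax", "Bombadil", "Qax"]))  # Qax
```

A small mtry makes the trees more different from one another. A large mtry lets each split pick the strongest predictor more often.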

Unfortunately, the lowest misclassification error rate is 0.5565 (Models 2 and 3), which is still high and only slightly better than the Decision Tree model (0.574). Moreover, the predicted results match neither our visualisation plots nor, as the table below shows, the Decision Tree predictions.

We will thus not rely on the predicted results from classification. The models need further tuning.

TableRF.png


7. Visualisation or Classification?

Visualisation.

In my opinion, classification on these features is not a good method for predicting bird species. The classifiers use the same data as the Trellis Plots, namely the 15 parameters summarising each sound (e.g. amplitude mean). Bird calls from different species may share a similar amplitude mean, pitch frequency and so on, yet differ in pattern. We should therefore look at the shape of the wave pattern rather than at the summary statistics.

Thus, the best method is still visualising the oscillogram and amplitude envelopes.
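The Python example below illustrates this argument with invented numbers, not bird data. The two envelopes are built from the same values in a different order, so they share every order-free summary statistic, such as mean, median and standard deviation. Yet one contains two separate rises and the other only one. The helper `rises` is a hypothetical stand-in for "wave pattern".

```python
from statistics import mean, median, pstdev


def rises(envelope: list[float]) -> int:
    """Number of times the envelope steps up from one value to a larger one."""
    return sum(1 for a, b in zip(envelope, envelope[1:]) if b > a)


# Toy envelopes: same values, different order in time
two_chirps = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
one_long_call = [0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

for stat in (mean, median, pstdev):
    print(stat.__name__, stat(two_chirps), stat(one_long_call))  # identical pairs
print("rises", rises(two_chirps), rises(one_long_call))  # rises 2 1
```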


8. Where Did the 2 Suspected Rose Pipits Come From?

Coordinate Plot

Going back to our earlier finding from the visualisation plots that Test Birds 2 and 9 are likely to be Rose Pipits, let's analyse their location coordinates to see whether these 2 birds are found near the Rose Pipit clusters.

In fact, these two suspected Pipits (in green below) are not found in the two clusters near the dumping site. However, they were recorded close to each other. This is consistent with birds of the same species often flocking together, and it lends some credibility to our visual prediction.

Graph1.png
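The proximity check can be sketched in Python as follows. The sketch assumes each bird's location is an (x, y) pair on the preserve map and that each cluster is summarised by its centroid. All coordinates and names are hypothetical. Comparing the distance between the two suspected Pipits with their distance to each cluster centroid makes the "together, but away from the clusters" reading explicit.

```python
from math import dist

Point = tuple[float, float]


def centroid(points: list[Point]) -> Point:
    """Mean x and mean y of a cluster's points."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def nearest_cluster(bird: Point, clusters: dict[str, list[Point]]) -> tuple[str, float]:
    """Name of the cluster whose centroid is closest to `bird`, with that distance."""
    return min(((name, dist(bird, centroid(pts))) for name, pts in clusters.items()),
               key=lambda item: item[1])


# Hypothetical map coordinates, not the Kasios recordings
clusters = {"cluster_1": [(140.0, 150.0), (150.0, 160.0)],
            "cluster_2": [(40.0, 40.0), (50.0, 50.0)]}
bird_2, bird_9 = (60.0, 120.0), (63.0, 124.0)
print(nearest_cluster(bird_2, clusters)[0])  # cluster_2
print(dist(bird_2, bird_9))                  # 5.0
```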

Concluding Hypothesis: Pipits NOT Found Across the Preserve, Based on Kasios' Birds

Key Observations

1. Only 2 of the 15 birds resemble the Rose Pipits.
2. These 2 birds were found neither near the dumping site nor in the 2 clusters identified earlier.

Hypothesis: Pipits not found across preserve

Given that only 2 of the 15 birds supplied by Kasios are likely to be Pipits, we doubt Kasios' claim that the Pipits are thriving across the Preserve. The set of bird calls supplied by Kasios does not support the claim that Pipits are found across the Preserve.