Difference between revisions of "ISSS608 2016-17 T3 Assign Chan En Ying Grace Methodology"

From Visual Analytics and Applications

Latest revision as of 22:42, 22 June 2018

“Mine dear rose pipits, whence did do thou vanish?”



Tools

R is the primary tool used in this analysis. The packages listed below cover the project’s scope: data cleaning, data visualisation, geospatial analysis and audio processing.

  • R libraries
    • sp
    • rgdal
    • sf
    • raster
    • spatstat
    • maptools
    • gplots
    • ggplot2
    • ggmap
    • rasterVis
    • lattice
    • latticeExtra
    • tidyverse
    • zoo
    • tmap
    • reshape2
    • quantmod
    • ggTimeSeries
    • viridis
    • rlang
    • soundgen
    • tuneR
    • phonTools
    • seewave

Approach Taken

The following outlines the 6 broad steps used for the analysis: Data Understanding, Data Cleaning, Geospatial Visualisation, Statistical Confirmation, Audio Processing and Audio Visualisation.

Step

Approach

Description

1.

Data Understanding

i. Read in Raster Layer (Lekagul Roadways Map)

  • It is a single-layer raster file of 200 x 200 cells.

class : RasterLayer
dimensions : 200, 200, 40000 (nrow, ncol, ncell)
resolution : 1, 1 (x, y)
extent : 0, 200, 0, 200 (xmin, xmax, ymin, ymax)
coord. ref. : NA
names : Lekagul_Roadways_2018
values : 0, 255 (min, max)


ii. Find out structure of Raster Layer
Extent : 40000
CRS arguments : NA
File Size : 41078
Object Size : 14376 bytes
Layer : 1

2.

Data Cleaning

i. Import two CSV Files (Birds)

  • 2081 Training Birds (Metadata)
  • 15 Test Birds (Provided by Kasios)


ii. Fix Data Quality Issues

  • Change File ID from numeric to character
  • Change coordinates to numeric
  • Change Date from Character to Date
  • Omit the two records with NA values for the Y coordinate
  • Clean the Dates (standardise all to m/d/y; replace a missing month or year with NA; impute a missing day as the 1st day of the month)
  • Clean the Timing (standardise all to 24-hour format, using “.” instead of “:” as the separator)
  • Clean the Vocalisation Type (standardise all to lower case; recode values containing both ‘song’ and ‘call’ as ‘call’, on the assumption that a call is a sign of distress while ‘song’ is the default)
  • Clean the Quality (recode ‘no score’ as NA)
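The cleaning rules above could be sketched in base R roughly as follows. The column names (`File_ID`, `X`, `Y`, `Date`, `Time`, `Vocalization_type`, `Quality`) and the use of `0` to mark a missing day are assumptions made for illustration; the real metadata file may encode these differently.

```r
# Minimal sketch of the data-quality fixes; column names are hypothetical.
clean_birds <- function(df) {
  df$File_ID <- as.character(df$File_ID)           # numeric -> character
  df$X <- suppressWarnings(as.numeric(df$X))       # non-numeric becomes NA
  df$Y <- suppressWarnings(as.numeric(df$Y))
  df <- df[!is.na(df$Y), ]                         # omit records with NA Y

  # Assumption: a missing day is written as 0; impute the 1st of the month.
  fixed_date <- sub("^([0-9]+)/0/", "\\1/1/", df$Date)
  df$Date <- as.Date(fixed_date, format = "%m/%d/%Y")  # unparseable -> NA

  df$Time <- sub(":", ".", df$Time, fixed = TRUE)  # "." instead of ":"

  voc <- tolower(df$Vocalization_type)
  both <- grepl("song", voc) & grepl("call", voc)
  df$Vocalization_type <- ifelse(both, "call", voc)

  df$Quality[df$Quality == "no score"] <- NA
  df
}

# Tiny made-up example (not real challenge data).
raw <- data.frame(
  File_ID = c(402254, 406171, 405901),
  X = c("49", "125", "21"),
  Y = c("63", "133", "?"),
  Date = c("2/8/2018", "6/0/2017", "5/14/2016"),
  Time = c("10:30", "7:05", "18:45"),
  Vocalization_type = c("Song", "song, call", "CALL"),
  Quality = c("A", "no score", "B"),
  stringsAsFactors = FALSE
)
clean <- clean_birds(raw)
```

The third record is dropped because its Y coordinate cannot be parsed, mirroring the rule of omitting records with a missing Y.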


iii. Data Manipulation

  • Extract “Year” and “Month” from the date as new columns
  • Create new columns for Quarter (Q1, Q2, Q3, Q4) and Season (Spring, Summer, Fall, Winter)
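A base R sketch of these derived columns is shown below. The month-to-season mapping (March to May as Spring, and so on) is an assumption, since the page does not state which convention was used, and the `Date` column name is hypothetical.

```r
# Derive Year, Month, Quarter and Season from a Date column.
add_time_columns <- function(df, date_col = "Date") {
  d <- df[[date_col]]
  month_num <- as.integer(format(d, "%m"))
  df$Year <- as.integer(format(d, "%Y"))
  df$Month <- month_num
  # Quarter: months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4
  df$Quarter <- ifelse(is.na(month_num), NA_character_,
                       paste0("Q", (month_num - 1) %/% 3 + 1))
  # Assumed season convention, indexed by month number (Jan = 1).
  season_lookup <- c("Winter", "Winter", "Spring", "Spring", "Spring",
                     "Summer", "Summer", "Summer", "Fall", "Fall", "Fall",
                     "Winter")
  df$Season <- season_lookup[month_num]
  df
}

example <- data.frame(
  Date = as.Date(c("2017-01-15", "2017-04-01", "2017-08-20", "2017-11-30", NA))
)
example <- add_time_columns(example)
```

Records whose date was set to NA during cleaning keep NA in all four derived columns.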


iv. Geospatial File Compatibility

  • Convert CSV file (2081 birds) into the following:
    • spatial point data frame
    • sp format
    • shp format
    • st_read compatible format
    • readOGR compatible format
    • ppp format (for spatstat compatibility)
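One possible way to produce these formats is sketched below, using `sf`, `sp` and `spatstat`. The data frame and its column names are made up for illustration, and no CRS is set because the raster itself reports `NA` as its coordinate reference.

```r
library(sf)
library(spatstat)

# Hypothetical bird records on the 200 x 200 Lekagul grid.
birds <- data.frame(
  X = c(49, 125, 21),
  Y = c(63, 133, 90),
  species = c("Rose Pipit", "Ordinary Snape", "Rose Pipit")
)

# sf object (readable later with st_read once written to disk)
birds_sf <- st_as_sf(birds, coords = c("X", "Y"))

# sp SpatialPointsDataFrame (requires the sp package to be installed)
birds_sp <- as(birds_sf, "Spatial")

# Shapefile on disk, written to a temporary path for this sketch
shp_path <- tempfile(fileext = ".shp")
st_write(birds_sf, shp_path, quiet = TRUE)

# ppp object for spatstat, bounded by the 200 x 200 map window
lekagul_window <- owin(xrange = c(0, 200), yrange = c(0, 200))
birds_ppp <- ppp(birds$X, birds$Y, window = lekagul_window)
```

The same `owin` window can be reused as the boundary polygon when subsetting to a single species.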


v. Data Overview & Exploration

  • Overlay 2081 Birds, Raster Map & Dumping Site, for an integrated overview using `plot()`
  • Use `facet_wrap` to identify location of clustering across species, across time, and across season, and by call/song


vi. Segregation of Treatment & Control Groups

  • Use ‘Rose Pipits’ as Treatment Group
  • Use ‘Ordinary Snape’ and ‘Lesser Birchbeere’ as Control Groups
  • Use ‘All Birds’ as third control

3.

Geospatial Visualisation

Spatial Point Pattern Visualisation (Density-Based Measure)

i. Prepare polygon layer

  • Create a 200x200 spatial polygon to depict the boundaries of Lekagul raster map
  • Merge Raster Polygon with Rose Pipit Layer, using `owin` from spatstat package


ii. Kernel Density Plot

  • First, set sigma=bw.diggle
  • Apply the Kernel Density Plot (By Year; 2012-2017)
    • For All Birds
    • For Rose Pipits only (Treatment Group)
    • For OS & LB only (Control Groups)


iii. Adjust Parameters (sigma)

  • Re-plot all years using the sigma of the densest cluster, so that the plots are comparable
    • This is typically the largest sigma
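The bandwidth selection and the shared-sigma adjustment might look like the sketch below. The two point patterns are synthetic stand-ins for two years of sightings, and taking the maximum of the per-year `bw.diggle` values is one reading of "the largest sigma", not necessarily the exact rule used in the analysis.

```r
library(spatstat)

set.seed(1)
lekagul_window <- owin(xrange = c(0, 200), yrange = c(0, 200))

# Synthetic clustered patterns standing in for two years of sightings.
pts_2016 <- ppp(rnorm(120, 140, 15), rnorm(120, 60, 15),
                window = lekagul_window)
pts_2017 <- ppp(rnorm(40, 140, 8), rnorm(40, 60, 8),
                window = lekagul_window)

# Step 1: data-driven bandwidth per year (Diggle's cross-validation).
sigma_by_year <- c(
  "2016" = as.numeric(bw.diggle(pts_2016)),
  "2017" = as.numeric(bw.diggle(pts_2017))
)

# Step 2: reuse a single sigma so the yearly plots share one smoothing scale.
shared_sigma <- max(sigma_by_year)
kde_2016 <- density(pts_2016, sigma = shared_sigma)
kde_2017 <- density(pts_2017, sigma = shared_sigma)

plot(kde_2016, main = "Kernel density, synthetic 2016 pattern")
```

Using one sigma across years keeps differences between the plots attributable to the data rather than to the smoothing.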


iv. Fine-Tune for Clearer Visualisation

  • Add the dumping site to the plot and adjust the colour/size
  • This makes the clusters visible relative to the dumping site

4.

Statistical Confirmation

Spatial Point Pattern Analysis (Distance-Based Measure)

i. Quadrat Analysis

  • Apply Monte Carlo Simulation
  • Followed by Quadrat Test to test for clustering


ii. K-Nearest Neighbour

  • Apply Monte Carlo Simulation
  • Followed by Clark-Evans Test to test for clustering


iii. K-Function

  • Apply Monte Carlo Simulation
  • Visualise significance based on grey band
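The three checks can be sketched with `spatstat` as below. The point pattern is synthetic, and the grid size (4 x 4 quadrats) and simulation counts are illustrative choices rather than the settings used in the analysis.

```r
library(spatstat)

set.seed(42)
win <- owin(xrange = c(0, 200), yrange = c(0, 200))
# Synthetic, strongly clustered pattern standing in for one species.
pts <- ppp(rnorm(80, 140, 12), rnorm(80, 60, 12), window = win)

# i. Quadrat test with a Monte Carlo p-value
qt <- quadrat.test(pts, nx = 4, ny = 4,
                   alternative = "clustered",
                   method = "MonteCarlo", nsim = 999)

# ii. Clark-Evans nearest-neighbour test for clustering
ce <- clarkevans.test(pts, correction = "none",
                      alternative = "clustered", nsim = 999)

# iii. K-function with a simulation envelope (drawn as a grey band)
k_env <- envelope(pts, Kest, nsim = 99, verbose = FALSE)
plot(k_env, main = "K-function with simulation envelope")
```

In the envelope plot, an observed K curve lying above the grey band suggests clustering beyond what complete spatial randomness would produce.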

5.

Audio Processing

i. Data Preparation

  • Read in MP3 Files (Training & Testing Data)
  • Convert to .wav format using `writeWave()` from tuneR
  • Convert .wav files to data frame using `analyzeFolder()`
  • Read in data frame
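A minimal sketch of the conversion step with `tuneR` follows. The helper name `convert_mp3_folder` and the folder layout are hypothetical. The feature-extraction call is left as a comment because `analyzeFolder()` is the function named on this page, and it may not be available in newer releases of soundgen.

```r
library(tuneR)

# Convert every MP3 in mp3_dir to a WAV file of the same name in wav_dir.
# Returns the number of files converted.
convert_mp3_folder <- function(mp3_dir, wav_dir) {
  dir.create(wav_dir, showWarnings = FALSE, recursive = TRUE)
  mp3_files <- list.files(mp3_dir, pattern = "\\.mp3$",
                          full.names = TRUE, ignore.case = TRUE)
  for (mp3_path in mp3_files) {
    wave <- readMP3(mp3_path)
    wav_name <- sub("\\.mp3$", ".wav", basename(mp3_path),
                    ignore.case = TRUE)
    writeWave(wave, file.path(wav_dir, wav_name))
  }
  invisible(length(mp3_files))
}

# Assumed follow-up, as described above (check your soundgen version):
# features <- soundgen::analyzeFolder(wav_dir)

# Demonstration on an empty temporary folder, so nothing is converted.
demo_in <- file.path(tempdir(), "mp3_demo_in")
demo_out <- file.path(tempdir(), "wav_demo_out")
dir.create(demo_in, showWarnings = FALSE)
n_converted <- convert_mp3_folder(demo_in, demo_out)
```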


ii. Audio Extraction & Manipulation

  • Extract only 1 of 2 channels (choose left).
  • Convert each sound array to floating point values ranging from -1 to 1.
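The channel extraction and scaling can be illustrated with a tiny synthetic stereo signal. Dividing by `2^(bit - 1)` is one common way to map integer PCM samples into the range -1 to 1; it is an assumption that the analysis scaled the samples this way.

```r
library(tuneR)

# Four-sample stereo signal, 16-bit PCM (values are made up).
stereo <- Wave(left = c(0, 16384, -32768, 32767),
               right = c(0, 0, 0, 0),
               samp.rate = 44100, bit = 16)

# Keep only the left channel.
left_only <- mono(stereo, which = "left")

# Scale integer samples to floating point values in [-1, 1].
to_unit_range <- function(wave) {
  wave@left / 2^(wave@bit - 1)
}
samples <- to_unit_range(left_only)
```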


6.

Audio Visualisation

i. Amplitude Envelope Plot

  • Use `diffenv()` from the seewave package to plot the amplitude envelopes
  • Do this for all 15 test birds
  • Do this for 5 training birds per species and select the most representative plot as the ‘dictionary’


ii. Oscillogram Plot

  • Use the seewave package to plot the oscillogram
  • Do this for all 15 test birds
  • Do this for 5 training birds per species and select the most representative plot as the ‘dictionary’
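As a rough illustration of both plots, the sketch below uses seewave's `oscillo()` and `env()` on a synthetic decaying tone. The page names `diffenv()`, which compares the envelopes of two recordings; `env()` is used here instead because it works on a single recording, so this is a stand-in rather than the exact call used.

```r
library(tuneR)
library(seewave)

# Synthetic 1-second tone with decaying amplitude (stands in for a bird call).
samp_rate <- 8000
t <- seq(0, 1, length.out = samp_rate)
signal <- sin(2 * pi * 440 * t) * exp(-3 * t)
wave <- Wave(left = round(signal * 32767), samp.rate = samp_rate, bit = 16)

# Oscillogram: amplitude against time
oscillo(wave)

# Amplitude envelope (Hilbert transform); values only, then plotted
amp_env <- env(wave, envt = "hil", plot = FALSE)
env(wave, envt = "hil", plot = TRUE)
```

Repeating these two plots for each test bird and for a few training recordings per species gives the visual ‘dictionary’ described above.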


iii. Distribution of audio parameters, using Trellis Plot

  • Out of the 15 attributes available after extracting the data frame from the .wav files, the following 7 are used for analysis:
  • dom_median, HNR_median, meanFreq_median, peakFreq_median, pitch_median, pitchAutocor_median, pitchSpec_median
  • These were selected because they vary more across species than the other 8 variables
  • Use ggplot() to plot a trellis plot using the 19 training species
  • Label the mean
  • Use ggplot() to insert the 15 testing birds
  • Visualise and identify the top 3 closest species, per parameter
  • Select the species based on the number of parameters for which it is closest to the training mean
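The selection rule in the last two bullets can be expressed in base R as a small sketch. The function name `closest_species`, the species labels and the numbers are all made up, and counting only first-place finishes when picking the final species is one interpretation of the rule.

```r
# For one test bird: rank training species by distance to the species mean
# of each parameter, then pick the species that is closest most often.
closest_species <- function(train, test_row, params, top_n = 3) {
  species_means <- aggregate(train[params],
                             by = list(species = train$species),
                             FUN = mean)
  species_names <- as.character(species_means$species)

  top_per_param <- lapply(params, function(p) {
    gap <- abs(species_means[[p]] - test_row[[p]])
    species_names[order(gap)][seq_len(top_n)]
  })
  names(top_per_param) <- params

  # Tally how often each species is the single closest one.
  first_place <- vapply(top_per_param, function(s) s[1], character(1))
  votes <- sort(table(first_place), decreasing = TRUE)

  list(top_per_param = top_per_param, votes = votes, best = names(votes)[1])
}

# Made-up training data: 4 hypothetical species, 2 recordings each.
train <- data.frame(
  species = rep(c("A", "B", "C", "D"), each = 2),
  pitch_median = c(9, 11, 19, 21, 29, 31, 39, 41),
  peakFreq_median = c(0.9, 1.1, 1.9, 2.1, 2.9, 3.1, 3.9, 4.1),
  dom_median = c(90, 110, 190, 210, 290, 310, 390, 410)
)
test_bird <- data.frame(pitch_median = 12, peakFreq_median = 2.9,
                        dom_median = 110)

result <- closest_species(
  train, test_bird,
  params = c("pitch_median", "peakFreq_median", "dom_median")
)
```

In this example species A is closest on two of the three parameters, so it is selected even though species C is closest on `peakFreq_median`.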