“Mine dear rose pipits, whence did do thou vanish?”
Tools
R is the primary tool used in this analysis. The packages listed below cover the project's scope: data cleaning, data visualisation, geospatial analysis and audio processing.
- R libraries
- sp
- rgdal
- sf
- raster
- spatstat
- maptools
- gplots
- ggplot2
- ggmap
- rasterVis
- lattice
- latticeExtra
- tidyverse
- zoo
- tmap
- reshape2
- quantmod
- ggTimeSeries
- viridis
- rlang
- soundgen
- tuneR
- phonTools
- seewave
|
Approach Taken
The analysis follows 6 broad steps: Data Understanding, Data Cleaning, Geospatial Visualisation, Statistical Confirmation, Audio Processing and Audio Visualisation.
Step
|
Approach
|
Description
|
1.
|
Data Understanding
|
i. Read in Raster Layer (Lekagul Roadways Map)
- It is a single-layer raster file of 200x200 cells.
class : RasterLayer
dimensions : 200, 200, 40000 (nrow, ncol, ncell)
resolution : 1, 1 (x, y)
extent : 0, 200, 0, 200 (xmin, xmax, ymin, ymax)
coord. ref. : NA
names : Lekagul_Roadways_2018
values : 0, 255 (min, max)
ii. Find out structure of Raster Layer
Extent : 40000
CRS arguments : NA
File Size : 41078
Object Size : 14376 bytes
Layer : 1
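A minimal sketch of how such a layer could be loaded and inspected with the raster package. The file name in the comment is hypothetical, so the runnable part builds a stand-in layer with the dimensions and extent reported above; its random cell values are an assumption made only so the example runs without the map file.

```r
library(raster)

# With the real map on disk, the layer would be read with something like:
#   roads <- raster("Lekagul_Roadways_2018.bmp")   # hypothetical file name

# Stand-in layer: 200 x 200 cells, extent 0..200 on both axes, no CRS.
set.seed(42)
roads <- raster(nrows = 200, ncols = 200,
                xmn = 0, xmx = 200, ymn = 0, ymx = 200)
values(roads) <- sample(c(0L, 255L), ncell(roads), replace = TRUE)
names(roads) <- "Lekagul_Roadways_2018"

# Summary of class, dimensions, resolution, extent and value range.
print(roads)

# Individual structural properties.
dim(roads)          # rows, columns, layers
res(roads)          # cell size in x and y
extent(roads)       # xmin, xmax, ymin, ymax
ncell(roads)        # total number of cells
nlayers(roads)      # number of layers
object.size(roads)  # size of the R object in memory
```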
|
2.
|
Data Cleaning
|
i. Import two CSV Files (Birds)
- 2081 Training Birds (Metadata)
- 15 Test Birds (Provided by Kasios)
ii. Fix Data Quality Issues
- Change File ID from numeric to character
- Change coordinates to numeric
- Change Date from Character to Date
- Omit the two NA values for the Y coordinate.
- Clean the Dates (Standardise all to m/d/y. Where the month or year is missing, replace the date with NA. Where only the day is missing, impute the 1st day of the month.)
- Clean the Timing (Standardise all to 24-hour format. Use “.” instead of ":")
- Clean the Vocalisation Type (Standardise all to lower case. Values containing both ‘song’ and ‘call’ are recoded as ‘call’, which is assumed to be a sign of distress, while ‘song’ is assumed to be the default)
- Clean the Quality (Recode ‘no score’ as ‘NA’)
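The cleaning rules above can be sketched in base R as follows. The column names, the sample rows and the convention that a missing date part is written as 0 or left empty are all assumptions for illustration; the real metadata may encode gaps differently.

```r
# Hypothetical raw records (column names and values are assumptions).
birds <- data.frame(
  File_ID = c(101, 102, 103, 104),
  X = c("55", "120", "87", "140"),
  Y = c("160", "48", "72", NA),
  Date = c("3/14/2015", "6/0/2016", "0/0/2017", "11/2/2014"),
  Time = c("7:30 PM", "06:15", "12:05 AM", "19.45"),
  Vocalization_type = c("Song", "song, call", "CALL", "Call "),
  Quality = c("A", "no score", "B", "C"),
  stringsAsFactors = FALSE
)

# m/d/y text to Date: missing month or year gives NA, missing day becomes 1.
clean_date <- function(x) {
  out <- vapply(strsplit(x, "/", fixed = TRUE), function(p) {
    p <- suppressWarnings(as.integer(p))
    if (length(p) != 3 || is.na(p[1]) || is.na(p[3]) || p[1] == 0 || p[3] == 0) {
      return(NA_character_)
    }
    if (is.na(p[2]) || p[2] == 0) p[2] <- 1L
    sprintf("%04d-%02d-%02d", p[3], p[1], p[2])
  }, character(1))
  as.Date(out, format = "%Y-%m-%d")
}

# Any "h:mm", "h.mm" or 12-hour AM/PM time to 24-hour "HH.MM".
clean_time <- function(x) {
  x <- toupper(trimws(x))
  pattern <- "^([0-9]{1,2})[:.]([0-9]{2}).*$"
  hh <- suppressWarnings(as.integer(sub(pattern, "\\1", x)))
  mm <- suppressWarnings(as.integer(sub(pattern, "\\2", x)))
  pm <- grepl("PM", x, fixed = TRUE)
  am <- grepl("AM", x, fixed = TRUE)
  hh <- ifelse(pm & !is.na(hh) & hh < 12L, hh + 12L, hh)
  hh <- ifelse(am & !is.na(hh) & hh == 12L, 0L, hh)
  ifelse(is.na(hh) | is.na(mm), NA_character_, sprintf("%02d.%02d", hh, mm))
}

birds$File_ID <- as.character(birds$File_ID)       # numeric to character
birds$X <- as.numeric(birds$X)                     # coordinates to numeric
birds$Y <- as.numeric(birds$Y)
birds <- birds[!is.na(birds$Y), ]                  # omit rows with NA in Y
birds$Date <- clean_date(birds$Date)
birds$Time <- clean_time(birds$Time)

voc <- tolower(trimws(birds$Vocalization_type))
voc[grepl("song", voc) & grepl("call", voc)] <- "call"   # mixed values become call
birds$Vocalization_type <- voc

birds$Quality[which(tolower(birds$Quality) == "no score")] <- NA
```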
iii. Data Manipulation
- Extract out the “Year” and “Month” from the date, as new columns
- Create a new column for Quarter (Q1,Q2,Q3,Q4) & Season (Spring, Summer, Fall, Winter)
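A short sketch of the derived columns. The month-to-season mapping used here (March to May as Spring, June to August as Summer, September to November as Fall, December to February as Winter) is an assumption, since the write-up does not state which convention was applied.

```r
# Hypothetical cleaned dates, including one missing value.
birds <- data.frame(
  Date = as.Date(c("2015-03-14", "2016-06-01", "2014-11-02", NA))
)

# Year and Month as integer columns.
birds$Year <- as.integer(format(birds$Date, "%Y"))
birds$Month <- as.integer(format(birds$Date, "%m"))

# Quarter: months 1-3 are Q1, 4-6 are Q2, and so on; NA stays NA.
birds$Quarter <- ifelse(
  is.na(birds$Month),
  NA_character_,
  paste0("Q", (birds$Month - 1) %/% 3 + 1)
)

# Season lookup indexed by month number (assumed convention, see above).
season_lookup <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
                   "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")
birds$Season <- season_lookup[birds$Month]
```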
iv. Geospatial File Compatibility
- Convert CSV file (2081 birds) into the following:
- spatial point data frame
- sp format
- shp format
- st_read compatible format
- readOGR compatible format
- ppp format (for spatstat compatibility)
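A minimal sketch of some of these conversions, assuming hypothetical column names `X`, `Y` and `Species` and a temporary output path. It covers the sf object, a shapefile round trip through `st_write()` and `st_read()`, and the `ppp` object for spatstat. The sp conversion is only noted in a comment because it additionally needs the sp package to be installed.

```r
library(sf)
library(spatstat)

# Hypothetical cleaned bird records on the 200 x 200 map grid.
birds <- data.frame(
  File_ID = c("101", "102", "103"),
  Species = c("Rose Pipit", "Ordinary Snape", "Rose Pipit"),
  X = c(55, 120, 87),
  Y = c(160, 48, 72)
)

# sf point layer; the map has no coordinate reference system, so none is set.
birds_sf <- st_as_sf(birds, coords = c("X", "Y"), remove = FALSE)

# With sp installed, an sp object could be derived via:
#   birds_sp <- as(birds_sf, "Spatial")

# Shapefile on disk, then read back in with st_read().
shp_path <- tempfile(fileext = ".shp")
st_write(birds_sf, shp_path, quiet = TRUE)
birds_back <- st_read(shp_path, quiet = TRUE)

# ppp object for spatstat, with the map boundary as the observation window
# and the species as marks.
map_window <- owin(xrange = c(0, 200), yrange = c(0, 200))
birds_ppp <- ppp(birds$X, birds$Y, window = map_window,
                 marks = factor(birds$Species))
```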
v. Data Overview & Exploration
- Overlay 2081 Birds, Raster Map & Dumping Site, for an integrated overview using `plot()`
- Use `facet_wrap` to identify location of clustering across species, across time, and across season, and by call/song
vi. Segregation of Treatment & Control Groups
- Use ‘Rose Pipits’ as Treatment Group
- Use ‘Ordinary Snape’ and ‘Lesser Birchbeere’ as Control Groups
- Use ‘All Birds’ as third control
|
3.
|
Geospatial Visualisation
|
Spatial Point Pattern Visualisation (Density-Based Measure)
i. Prepare polygon layer
- Create a 200x200 spatial polygon to depict the boundaries of Lekagul raster map
- Merge Raster Polygon with Rose Pipit Layer, using `owin` from spatstat package
ii. Kernel Density Plot
- First, set sigma=bw.diggle
- Apply the Kernel Density Plot (By Year; 2012-2017)
- For All Birds
- For Rose Pipits only (Treatment Group)
- For OS & LB only (Control Groups)
iii. Adjust Parameters (sigma)
- Re-draw the plots using the sigma of the densest cluster for every year, so that the yearly plots are comparable
- This is typically the largest sigma
iv. Fine-Tune for Clearer Visualisation
- Add the dumping site to each plot and adjust the colour/size
- This makes it possible to visualise the clusters relative to the dumping site
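The density steps above can be sketched with spatstat as follows. The sightings are simulated and the dumping-site coordinates are hypothetical placeholders. The shared bandwidth is taken as the largest per-year `bw.diggle()` value, following the heuristic stated above.

```r
library(spatstat)

set.seed(2018)
map_window <- owin(xrange = c(0, 200), yrange = c(0, 200))
years <- 2012:2017

# Simulated sightings: one cluster per year whose centre drifts over time.
sim_year <- function(i) {
  n <- 80
  x <- pmin(pmax(rnorm(n, mean = 150 - 10 * i, sd = 12), 0), 200)
  y <- pmin(pmax(rnorm(n, mean = 150 - 5 * i, sd = 12), 0), 200)
  ppp(x, y, window = map_window)
}
pipits_by_year <- lapply(seq_along(years), sim_year)
names(pipits_by_year) <- years

# Bandwidth selected by bw.diggle for each year separately.
sigma_by_year <- sapply(pipits_by_year, function(p) as.numeric(bw.diggle(p)))

# One shared sigma (the largest) so the yearly plots are comparable.
shared_sigma <- max(sigma_by_year)
dens_by_year <- lapply(pipits_by_year, density, sigma = shared_sigma)

# Overlay the dumping site on one year's density surface.
dump_site <- ppp(148, 159, window = map_window)  # hypothetical coordinates
plot(dens_by_year[["2017"]], main = "Rose Pipits 2017 (simulated)")
plot(dump_site, add = TRUE, pch = 4, cols = "white", cex = 2)
```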
|
4.
|
Statistical Confirmation
|
Spatial Point Pattern Analysis (Distance-Based Measure)
i. Quadrat Analysis
- Apply Monte Carlo Simulation
- Followed by Quadrat Test to test for clustering
ii. K-Nearest Neighbour
- Apply Monte Carlo Simulation
- Followed by Clark-Evans Test to test for clustering
iii. K-Function
- Apply Monte Carlo Simulation
- Visualise significance based on grey band
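A minimal sketch of the three tests on a simulated, strongly clustered pattern. The quadrat grid size and the number of simulations are assumptions, not the settings used in the analysis. The grey band in the final plot is the Monte Carlo envelope of the K-function under complete spatial randomness.

```r
library(spatstat)

set.seed(2018)
map_window <- owin(xrange = c(0, 200), yrange = c(0, 200))

# Simulated sightings concentrated around a single centre.
n <- 150
pipits <- ppp(
  pmin(pmax(rnorm(n, mean = 140, sd = 15), 0), 200),
  pmin(pmax(rnorm(n, mean = 120, sd = 15), 0), 200),
  window = map_window
)

# i. Quadrat test with a Monte Carlo p-value (5 x 5 grid is an assumption).
quadrat_result <- quadrat.test(pipits, nx = 5, ny = 5,
                               alternative = "clustered",
                               method = "MonteCarlo", nsim = 999)

# ii. Clark-Evans nearest-neighbour test; R below 1 suggests clustering.
clark_evans_result <- clarkevans.test(pipits, correction = "none",
                                      alternative = "clustered",
                                      method = "MonteCarlo", nsim = 999)

# iii. K-function with a simulation envelope under randomness.
k_envelope <- envelope(pipits, Kest, nsim = 99, verbose = FALSE)
plot(k_envelope, main = "K-function with Monte Carlo envelope")

quadrat_result$p.value
clark_evans_result$statistic
clark_evans_result$p.value
```

An observed K curve lying above the grey band points to clustering at that distance, while a curve inside the band is consistent with randomness.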
|
5.
|
Audio Processing
|
i. Data Preparation
- Read in MP3 Files (Training & Testing Data)
- Convert to .wav format using `writeWave()`
- Convert .wav files to data frame using `analyzeFolder()`
- Read in data frame
ii. Audio Extraction & Manipulation
- Extract only 1 of 2 channels (choose left).
- Convert each sound array to floating point values ranging from -1 to 1.
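A minimal sketch of the channel extraction and scaling with tuneR. No recordings ship with this write-up, so a synthetic 16-bit stereo tone stands in for a converted bird recording. The file names in the comment are hypothetical. Dividing by `2^(bit - 1)` is one common way to map integer samples to the range -1 to 1 and is an assumption about how the conversion was done.

```r
library(tuneR)

# With real recordings the conversion would look roughly like:
#   mp3 <- readMP3("bird.mp3")       # hypothetical file name
#   writeWave(mp3, "bird.wav")

# Synthetic 1-second stereo signal standing in for a recording.
sample_rate <- 44100
time_axis <- seq(0, 1, length.out = sample_rate)
stereo <- Wave(
  left = as.integer(round(20000 * sin(2 * pi * 440 * time_axis))),
  right = as.integer(round(10000 * sin(2 * pi * 880 * time_axis))),
  samp.rate = sample_rate,
  bit = 16
)

# Round trip through a .wav file on disk.
wav_path <- tempfile(fileext = ".wav")
writeWave(stereo, wav_path)
wav <- readWave(wav_path)

# Keep only the left channel.
left_only <- mono(wav, which = "left")

# Scale the integer samples to floating point values between -1 and 1.
samples <- left_only@left / 2^(left_only@bit - 1)
range(samples)
```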
|
6.
|
Audio Visualisation
|
i. Amplitude Envelope Plot
- Use diffenv() to plot the amplitude envelopes
- Do this for all the 15 test birds
- Do this for 5 training birds per species and select the most representative plot as the ‘dictionary’
ii. Oscillogram Plot
- Use the seewave package to plot the oscillogram
- Do this for all the 15 test birds
- Do this for 5 training birds per species and select the most representative plot as the ‘dictionary’
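The two plot types can be sketched with seewave as below. The signals are synthetic decaying tones standing in for one test bird and one training bird, and the smoothing window passed to `msmooth` is an assumption. `diffenv()` returns a value between 0 and 1 that summarises how much two amplitude envelopes differ, which may help when picking the most representative ‘dictionary’ recording.

```r
library(tuneR)
library(seewave)

sample_rate <- 22050
time_axis <- seq(0, 1, length.out = sample_rate)

# Synthetic call: a tone whose amplitude decays at a given rate.
make_call <- function(freq, decay) {
  amplitude <- exp(-decay * time_axis)
  Wave(
    left = as.integer(round(20000 * amplitude * sin(2 * pi * freq * time_axis))),
    samp.rate = sample_rate,
    bit = 16
  )
}
test_bird <- make_call(freq = 3000, decay = 2)    # stand-in for a test bird
train_bird <- make_call(freq = 3500, decay = 6)   # stand-in for a training bird

# Oscillogram (waveform over time) of the test bird.
oscillo(test_bird)

# Smoothed amplitude envelope of the test bird.
env(test_bird, msmooth = c(512, 0))

# Envelope difference: identical recordings versus two different ones.
diff_same <- diffenv(test_bird, test_bird, msmooth = c(512, 0))
diff_other <- diffenv(test_bird, train_bird, msmooth = c(512, 0))

# Both envelopes drawn together for visual comparison.
diffenv(test_bird, train_bird, msmooth = c(512, 0), plot = TRUE)
```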
iii. Distribution of audio parameters, using Trellis Plot
- Of the 15 attributes available in the data frame extracted from the .wav files, the following 7 are used for analysis:
- dom_median, HNR_median, meanFreq_median, peakFreq_median, pitch_median, pitchAutocor_median, pitchSpec_median
- These were selected because they vary more across species than the other 8 variables
- Use ggplot() to plot a trellis plot using the 19 training species
- Label the mean
- Use ggplot() to overlay the 15 testing birds
- Visualise and identify the top 3 closest species, per parameter
- Select the species whose training mean is closest to the test bird on the largest number of parameters
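A minimal sketch of the trellis plot and the matching rule, using simulated feature values for three species and two of the seven parameters. For brevity it keeps only the single closest species per parameter rather than the top 3, and then picks the species that wins the most parameters. All values and the third species name are hypothetical.

```r
library(ggplot2)

set.seed(7)
species <- c("Rose Pipit", "Ordinary Snape", "Other species")
params <- c("pitch_median", "peakFreq_median")

# Simulated training features: 20 recordings per species.
train <- data.frame(
  species = rep(species, each = 20),
  pitch_median = c(rnorm(20, 3000, 100), rnorm(20, 2000, 100), rnorm(20, 4000, 100)),
  peakFreq_median = c(rnorm(20, 3500, 150), rnorm(20, 2500, 150), rnorm(20, 4500, 150))
)
# One simulated test bird.
test_bird <- data.frame(pitch_median = 2950, peakFreq_median = 3600)

# Long format: one row per recording and parameter.
train_long <- data.frame(
  species = rep(train$species, times = length(params)),
  parameter = rep(params, each = nrow(train)),
  value = unlist(train[params], use.names = FALSE)
)
test_long <- data.frame(
  parameter = params,
  value = unlist(test_bird[params], use.names = FALSE)
)

# Training mean per species and parameter.
species_means <- aggregate(value ~ species + parameter, data = train_long, FUN = mean)

# Trellis plot: training points, species means (red), test bird (dashed line).
trellis <- ggplot(train_long, aes(x = value, y = species)) +
  geom_jitter(height = 0.15, alpha = 0.4) +
  geom_point(data = species_means, colour = "red", size = 3) +
  geom_vline(data = test_long, aes(xintercept = value), linetype = "dashed") +
  facet_wrap(~ parameter, scales = "free_x") +
  labs(x = "Parameter value", y = "Species")
print(trellis)

# Closest species per parameter, then the species closest on most parameters.
species_means$distance <- abs(
  species_means$value -
    test_long$value[match(species_means$parameter, test_long$parameter)]
)
closest_per_param <- sapply(
  split(species_means, species_means$parameter),
  function(d) as.character(d$species[which.min(d$distance)])
)
votes <- sort(table(closest_per_param), decreasing = TRUE)
best_match <- names(votes)[1]
```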
|