Difference between revisions of "ISSS608 2016-17 T3 Assign Chan En Ying Grace Methodology"

Latest revision as of 22:42, 22 June 2018

“Mine dear rose pipits, whence did do thou vanish?”

Background

Methodology

Did Rose Pipit kicketh the bucket?

Which song belongs to thee?

Conclusion

Tools

R is the primary tool used in this analysis. The packages listed below cover the project’s scope: data cleaning, data visualisation, geospatial analysis and audio processing.

R libraries: sp, rgdal, sf, raster, spatstat, maptools, gplots, ggplot2, ggmap, rasterVis, lattice, latticeExtra, tidyverse, zoo, tmap, reshape2, quantmod, ggTimeSeries, viridis, rlang, soundgen, tuneR, phonTools, seewave

Approach Taken

The following outlines the six broad steps used for the analysis: Data Understanding, Data Cleaning, Geospatial Visualisation, Statistical Confirmation, Audio Processing and Audio Visualisation.

Step	Approach	Description
1.	Data Understanding
i. Read in Raster Layer (Lekagul Roadways Map)
* It is a single-layer raster file, 200x200.
class : RasterLayer
dimensions : 200, 200, 40000 (nrow, ncol, ncell)
resolution : 1, 1 (x, y)
extent : 0, 200, 0, 200 (xmin, xmax, ymin, ymax)
coord. ref. : NA
names : Lekagul_Roadways_2018
values : 0, 255 (min, max)
ii. Find out structure of Raster Layer
Extent : 40000
CRS arguments : NA
File Size : 41078
Object Size : 14376 bytes
Layer : 1
2.	Data Cleaning
i. Import two CSV Files (Birds)
* 2081 Training Birds (Metadata)
* 15 Test Birds (Provided by Kasios)
ii. Fix Data Quality Issues
* Change File ID from numeric to character
* Change coordinates to numeric
* Change Date from character to Date
* Omit the two NA values for the Y coordinate
* Clean the Dates (standardise all to m/d/y; replace a missing month or year with NA; impute a missing day as the 1st day of the month)
* Clean the Timing (standardise all to 24-hour format, using “.” instead of ":")
* Clean the Vocalisation Type (standardise all to lower case; recode values containing both ‘song’ and ‘call’ as ‘call’, which is assumed to be a sign of distress, while ‘song’ is assumed to be the default)
* Clean the Quality (recode ‘no score’ as ‘NA’)
iii. Data Manipulation
* Extract the “Year” and “Month” from the date as new columns
* Create new columns for Quarter (Q1, Q2, Q3, Q4) and Season (Spring, Summer, Fall, Winter)
iv. Geospatial File Compatibility
* Convert CSV file (2081 birds) into the following:
** spatial point data frame
** sp format
** shp format
** st_read compatible format
** readOGR compatible format
** ppp format (for spatstat compatibility)
v. Data Overview & Exploration
* Overlay 2081 Birds, Raster Map & Dumping Site, for an integrated overview using `plot()`
* Use `facet_wrap` to identify where clustering occurs across species, across time, across season, and by call/song
vi. Segregation of Treatment & Control Groups
* Use ‘Rose Pipits’ as Treatment Group
* Use ‘Ordinary Snape’ and ‘Lesser Birchbeere’ as Control Groups
* Use ‘All Birds’ as third control
3.	Geospatial Visualisation
Spatial Point Pattern Visualisation (Density-Based Measure)
i. Prepare polygon layer
* Create a 200x200 spatial polygon to depict the boundaries of Lekagul raster map
* Merge Raster Polygon with Rose Pipit Layer, using `owin` from spatstat package
ii. Kernel Density Plot
* First, set sigma=bw.diggle
* Apply the Kernel Density Plot (By Year; 2012-2017)
** For All Birds
** For Rose Pipits only (Treatment Group)
** For OS & LB only (Control Groups)
iii. Adjust Parameters (sigma)
* Adjust the plots by using the sigma of the most dense cluster
** This is typically the largest sigma
iv. Fine-Tune for Clearer Visualisation
* Then add in the dumping site & adjust the colour/size
* So that we can visualize the clusters relative to the dumping site
4.	Statistical Confirmation
Spatial Point Pattern Analysis (Distance-Based Measure)
i. Quadrat Analysis
* Apply Monte Carlo Simulation
* Followed by Quadrat Test to test for clustering
ii. K-Nearest Neighbour
* Apply Monte Carlo Simulation
* Followed by Clark-Evans Test to test for clustering
iii. K-Function
* Apply Monte Carlo Simulation
* Visualise significance based on grey band
5.	Audio Processing
i. Data Preparation (Density-Based Measure)
* Read in MP3 Files (Training & Testing Data)
* Convert to .wav format using `writeWave()`
* Convert .wav files to data frame using `analyzeFolder()`
* Read in data frame
ii. Audio Extraction & Manipulation
* Extract only 1 of 2 channels (choose left).
* Convert each sound array to floating point values ranging from -1 to 1.
iii. Adjust Parameters (sigma)
* Adjust the plots by using the sigma of the most dense cluster
** This is typically the largest sigma
iv. Fine-Tune for Clearer Visualisation
* Then add in the dumping site & adjust the colour/size
* So that we can visualize the clusters relative to the dumping site
6.	Audio Visualisation
i. Amplitude Envelope Plot
* Use diffenv() to plot the envelopes of the amplitude plots
* Do this for all the 15 test birds
* Do this for 5 training birds per species and select the most representative plot as your ‘dictionary’
ii. Oscillogram Plot
* Use seewave package to plot oscillogram
* Do this for all the 15 test birds
* Do this for 5 training birds per species and select the most representative plot as your ‘dictionary’
iii. Distribution of audio parameters, using Trellis Plot
* Out of the 15 attributes available after extracting the data frame from the .wav files, the following 7 will be used for analysis:
* dom_median, HNR_median, meanFreq_median, peakFreq_median, pitch_median, pitchAutocor_median, pitchSpec_median
* These were selected as they vary more across the species than the other 8 variables
* Use ggplot() to plot a trellis plot using the 19 training species
* Label the mean
* Use ggplot() to insert the 15 testing birds
* Visualise and identify the top 3 closest species, per parameter
* Select the species based on no. of parameters closest to the training mean
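The selection rule in Step 6.iii (top 3 closest species per parameter, then pick the species that appears most often) is done visually in the analysis. A numerical analogue might look like the minimal sketch below. It is an illustration only: the function name `vote_closest_species`, the species labels and all numbers are hypothetical toy values, not figures from the bird data, and only three of the seven parameters are shown.

```r
# Hypothetical per-species training means (toy values, not the real data).
train_means <- data.frame(
  species      = c("Species A", "Species B", "Species C", "Species D"),
  dom_median   = c(3200, 2500, 4100, 1800),
  pitch_median = c(2900, 2100, 3800, 1500),
  HNR_median   = c(14, 9, 18, 6),
  stringsAsFactors = FALSE
)

# One hypothetical test bird described by the same parameters.
test_bird <- list(dom_median = 3100, pitch_median = 3000, HNR_median = 13)

# For each parameter, rank species by absolute distance between the test
# value and the species' training mean, keep the top_n closest, and count
# how often each species appears across all parameters.
vote_closest_species <- function(test_row, train_means, params, top_n = 3) {
  votes <- unlist(lapply(params, function(p) {
    gap <- abs(train_means[[p]] - test_row[[p]])
    head(train_means$species[order(gap)], top_n)
  }))
  sort(table(votes), decreasing = TRUE)
}

params <- c("dom_median", "pitch_median", "HNR_median")
tally  <- vote_closest_species(test_bird, train_means, params, top_n = 2)
print(tally)              # vote counts per species, highest first
best_guess <- names(tally)[1]
```

Here `top_n = 2` is used only because the toy table has four species; the methodology above uses the top 3. Ties between species are not resolved by this sketch and would still need a visual check of the trellis plot.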

@@ Line 9: / Line 9: @@
 [[ISSS608_2017-18_T3_Assign_Chan_En_Ying_Grace|<b><font size="3"><font color="#000000">Background</font></font></b>]]
-| style="font-family:Arial; font-size:100%; solid #000000; background:#E6E6FA; text-align:center;" width="15%" |
+| style="font-family:Arial; font-size:100%; solid #1B338F; background:#E6E6FA; text-align:center;" width="15%" |
 ;
-[[ISSS608_2016-17_T3_Assign_Chan_En_Ying_Grace_Methodology|<b><font size="3"><font color="#000000">Methodology</font></font></b>]]
+[[ISSS608_2017-18_T3_Assign_Chan_En_Ying_Grace_Methodology|<b><font size="3"><font color="#000000">Methodology</font></font></b>]]
 | style="font-family:Arial; font-size:100%; solid #1B338F; background:#E6E6FA; text-align:center;" width="25%" |
 ;
-[[ISSS608_2016-17_T3_Assign_Chan_En_Ying_Grace_Did Rose Pipit kick the bucket?|<b><font size="3"><font color="#000000">Did Rose Pipit kicketh the bucket?</font></font></b>]]
+[[ISSS608_2017-18_T3_Assign_Chan_En_Ying_Grace_Did Rose Pipit kick the bucket?|<b><font size="3"><font color="#000000">Did Rose Pipit kicketh the bucket?</font></font></b>]]
 | style="font-family:Arial; font-size:100%; solid #1B338F; background:#E6E6FA; text-align:center;" width="25%" |
 ;
-[[ISSS608_2016-17_T3_Chan_En_Ying_Grace_Which song belongs to thee?| <b><font size="3"><font color="#000000">Which song belongs to thee?</font></font></b>]]
+[[ISSS608_2017-18_T3_Chan_En_Ying_Grace_Which song belongs to thee?| <b><font size="3"><font color="#000000">Which song belongs to thee?</font></font></b>]]
 | style="font-family:Arial; font-size:100%; solid #1B338F; background:#E6E6FA; text-align:center;" width="20%" |
 ;
-[[ISSS608_2016-17_T3_Chan_En_Ying_Grace_Conclusion| <b><font size="3"><font color="#000000">Conclusion</font></font></b>]]
+[[ISSS608_2017-18_T3_Chan_En_Ying_Grace_Conclusion| <b><font size="3"><font color="#000000">Conclusion</font></font></b>]]
 |  &nbsp;
@@ Line 63: / Line 63: @@
 |}
 <!-- END OF TOOLS-->
-<br/>
 ==<font size="5"><font color="#000000">'''Approach Taken'''</font></font>==
-The following outlines the approach used for the analysis.
+The following outlines the six broad steps used for the analysis: Data Understanding, Data Cleaning, Geospatial Visualisation, Statistical Confirmation, Audio Processing and Audio Visualisation.
 <div style="margin:0px; padding: 2px; background: #E6E6FA; font-family: Arial; border-radius: 1px; text-align:left">
@@ Line 86: / Line 84: @@
 <b>Data Understanding</b>
 ||
-<b>1. Read in Raster Layer (Lekagul Roadways Map)</b>
+<b>i. Read in Raster Layer (Lekagul Roadways Map)</b>
 * It is a single layer raster file. 200x200.
@@ Line 98: / Line 96: @@
-<b>2. Find out structure of Raster Layer</b>
+<b>ii. Find out structure of Raster Layer</b>
 <br> Extent          : 40000
 <br> CRS arguments   : NA
@@ Line 111: / Line 109: @@
 <b>Data Cleaning</b>
 ||
-<b>1. Import two CSV Files (Birds)</b>
+<b>i. Import two CSV Files (Birds)</b>
 * 2081 Training Birds (Metadata)
 * 15 Test Birds (Provided by Kasios)
-<b>2. Fix Data Quality Issues</b>
+<b>ii. Fix Data Quality Issues</b>
 * Change File ID from numeric to character
 * Change coordinates to numeric
@@ Line 128: / Line 126: @@
-<b>3. Data Manipulation</b>
+<b>iii. Data Manipulation</b>
 * Extract out the “Year” and “Month” from the date, as new columns
 * Create a new column for Quarter (Q1,Q2,Q3,Q4) & Season (Spring, Summer, Fall, Winter)
-<b>4. Geospatial File Compatibility</b>
+<b>iv. Geospatial File Compatibility</b>
 * Convert CSV file (2081 birds) into the following:
 ** spatial point data frame
@@ Line 143: / Line 141: @@
-<b>5. Data Overview & Exploration</b>
+<b>v. Data Overview & Exploration</b>
 * Overlay 2081 Birds, Raster Map & Dumping Site, for an integrated overview using `plot()`
 * Use `facet_wrap` to identify location of clustering across species, across time, and across season, and by call/song
-<b>6. Segregation of Treatment & Control Groups</b>
+<b>vi. Segregation of Treatment & Control Groups</b>
 * Use ‘Rose Pipits’ as Treatment Group
 * Use ‘Ordinary Snape’ and ‘Lesser Birchbeere’ as Control Groups
@@ Line 159: / Line 157: @@
 <b>Geospatial Visualisation </b>
 ||
-<b>Spatial Point Pattern Visualisation (Density-Based Measure) </b>
+<b><u>Spatial Point Pattern Visualisation (Density-Based Measure) </u></b>
-<b>1. Prepare polygon layer </b>
+<b>i. Prepare polygon layer </b>
 * Create a 200x200 spatial polygon to depict the boundaries of Lekagul raster map
 * Merge Raster Polygon with Rose Pipit Layer, using `owin` from spatstat package
-<b>2. Kernel Density Plot </b>
+<b>ii. Kernel Density Plot </b>
 * First, set sigma=bw.diggle
 * Apply the Kernel Density Plot (By Year; 2012-2017)
@@ Line 172: / Line 171: @@
 ** For OS & LB only (Control Groups)
-<b>3. Adjust Parameters (sigma) </b>
+<b>iii. Adjust Parameters (sigma) </b>
 * Adjust the plots by using the sigma of the most dense cluster
 ** This is typically the largest sigma
+<b>iv. Fine-Tune for Clearer Visualisation </b>
+* Then add in the dumping site & adjust the colour/size
+* So that we can visualize the clusters relative to the dumping site
+|-
+|
+.
+||
+<b>Statistical Confirmation </b>
+||
+<b><u>Spatial Point Pattern Analysis (Distance-Based Measure)</u></b>
+<b>i. Quadrat Analysis  </b>
+* Apply Monte Carlo Simulation
+* Followed by Quadrat Test to test for clustering
+<b>ii. K-Nearest Neighbour  </b>
+* Apply Monte Carlo Simulation
+* Followed by Clark-Evans Test to test for clustering
+<b>iii. K-Function  </b>
+* Apply Monte Carlo Simulation
+* Visualise significance based on grey band
+|-
+|
+.
+||
+<b>Audio Processing</b>
+||
+<b>i. Data Preparation  (Density-Based Measure)</b>
+* Read in MP3 Files (Training & Testing Data)
+* Convert to .wav format using `writeWave()`
+* Convert .wav files to data frame using `analyzeFolder()`
+* Read in data frame
+<b>ii. Audio Extraction & Manipulation </b>
+* Extract only 1 of 2 channels (choose left).
+* Convert each sound array to floating point values ranging from -1 to 1.
+<b>iii. Adjust Parameters (sigma) </b>
+* Adjust the plots by using the sigma of the most dense cluster
+** This is typically the largest sigma
+<b>iv. Fine-Tune for Clearer Visualisation </b>
+* Then add in the dumping site & adjust the colour/size
+* So that we can visualize the clusters relative to the dumping site
+|-
+|
+.
+||
+<b>Audio Visualisation</b>
+||
+<b>i. Amplitude Envelope Plot</b>
+* Use diffenv() to plot the envelopes of the amplitude plots
+* Do this for all the 15 test birds
+* Do this for 5 training birds per species and select most representative plot as your ‘dictionary’
+<b>ii. Oscillogram Plot</b>
+* Use seewave package to plot oscillogram
+* Do this for all the 15 test birds
+* Do this for 5 of the training birds, per species and select most representative plot as your ‘dictionary’
+<b>iii. Distribution of audio parameters, using Trellis Plot</b>
+* Out of the 15 attributes available after extracting the dataframe from the .wav files, the following 7 will be used for analysis:
+* dom_median,HNR_median, meanFreq_median, peakFreq_median, pitch_median, pitchAutocor_median, pitchSpec_median
+* These were selected as they vary more across the species than the other 8 variables
+* Use ggplot() to plot a trellis plot using the 19 training species
+* Label the mean
+* Use ggplot() to insert the 15 testing birds
+* Visualise and identify the top 3 closest species, per parameter
+* Select the species based on no. of parameters closest to the training mean
 |}
 </div>
 <br>
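The spatial steps in the table above (Geospatial Visualisation and Statistical Confirmation) could be sketched with spatstat roughly as follows. This is a minimal sketch under stated assumptions: the points are simulated around an arbitrary centre, not the Lekagul bird records, and the quadrat grid size and simulation counts are illustrative choices rather than the settings used in the analysis.

```r
library(spatstat)

set.seed(1)  # simulated coordinates; NOT the real bird data

# 200 x 200 observation window matching the raster extent (0 to 200 on both axes)
win <- owin(xrange = c(0, 200), yrange = c(0, 200))

# Hypothetical clustered sightings, clamped to the window
n <- 150
x <- pmin(pmax(rnorm(n, mean = 140, sd = 15), 0), 200)
y <- pmin(pmax(rnorm(n, mean = 120, sd = 15), 0), 200)
birds <- ppp(x, y, window = win)

# Kernel density plot with the bandwidth selected by bw.diggle
sigma <- bw.diggle(birds)
dens  <- density(birds, sigma = sigma)
plot(dens, main = "Kernel density (simulated points)")

# Quadrat test with a Monte Carlo p-value (5 x 5 grid is an assumption)
qt <- quadrat.test(birds, nx = 5, ny = 5, method = "MonteCarlo", nsim = 999)

# Clark-Evans nearest-neighbour test against the alternative of clustering
ce <- clarkevans.test(birds, alternative = "clustered", nsim = 99)

# K-function with a simulation envelope; the envelope is the grey band
k_env <- envelope(birds, Kest, nsim = 99)
plot(k_env, main = "K-function envelope (simulated points)")

print(qt)
print(ce)
```

To compare years or groups on a common scale, the same `sigma` value can be passed to every `density()` call instead of re-estimating it per subset, which is one way to read the "Adjust Parameters (sigma)" step.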
