Difference between revisions of "ISSS608 2016-17 T3 Assign Chan En Ying Grace Methodology"
[[ISSS608_2017-18_T3_Assign_Chan_En_Ying_Grace|<b><font size="3"><font color="#000000">Background</font></font></b>]]
| style="font-family:Arial; font-size:100%; solid #1B338F; background:#E6E6FA; text-align:center;" width="15%" |
;
[[ISSS608_2017-18_T3_Assign_Chan_En_Ying_Grace_Methodology|<b><font size="3"><font color="#000000">Methodology</font></font></b>]]
| style="font-family:Arial; font-size:100%; solid #1B338F; background:#E6E6FA; text-align:center;" width="25%" |
;
[[ISSS608_2017-18_T3_Assign_Chan_En_Ying_Grace_Did Rose Pipit kick the bucket?|<b><font size="3"><font color="#000000">Did Rose Pipit kicketh the bucket?</font></font></b>]]
| style="font-family:Arial; font-size:100%; solid #1B338F; background:#E6E6FA; text-align:center;" width="25%" |
;
[[ISSS608_2017-18_T3_Chan_En_Ying_Grace_Which song belongs to thee?| <b><font size="3"><font color="#000000">Which song belongs to thee?</font></font></b>]]
| style="font-family:Arial; font-size:100%; solid #1B338F; background:#E6E6FA; text-align:center;" width="20%" |
;
[[ISSS608_2017-18_T3_Chan_En_Ying_Grace_Conclusion| <b><font size="3"><font color="#000000">Conclusion</font></font></b>]]
|
<!-- END OF TOOLS-->
==<font size="5"><font color="#000000">'''Approach Taken'''</font></font>==

The following outlines the 6 broad steps used for the analysis: Data Understanding, Data Cleaning, Geospatial Visualisation, Statistical Confirmation, Audio Processing and Audio Visualisation.

<div style="margin:0px; padding: 2px; background: #E6E6FA; font-family: Arial; border-radius: 1px; text-align:left">
{| class="wikitable" style="background-color:#FFFFFF;" width="100%"
|-
|
<b>Step</b>
||
<b>Approach</b>
||
<b>Description</b>

|-
|
1.
||
<b>Data Understanding</b>
||
<b>i. Read in Raster Layer (Lekagul Roadways Map)</b>
* It is a single-layer raster file of 200x200 cells.

class : RasterLayer
<br> dimensions : 200, 200, 40000 (nrow, ncol, ncell)
<br> resolution : 1, 1 (x, y)
<br> extent : 0, 200, 0, 200 (xmin, xmax, ymin, ymax)
<br> coord. ref. : NA
<br> names : Lekagul_Roadways_2018
<br> values : 0, 255 (min, max)


<b>ii. Inspect the structure of the Raster Layer</b>
<br> No. of cells : 40000
<br> CRS arguments : NA
<br> File Size : 41078 bytes
<br> Object Size : 14376 bytes
<br> Layer : 1
|-
|

2.
||
<b>Data Cleaning</b>
||
<b>i. Import two CSV Files (Birds)</b>
* 2081 Training Birds (Metadata)
* 15 Test Birds (Provided by Kasios)


<b>ii. Fix Data Quality Issues</b>
* Change File ID from numeric to character
* Change coordinates to numeric
* Change Date from character to Date
* Omit the two records with NA values for the Y coordinate
* Clean the Dates (standardise all to m/d/y; replace dates with a missing month or year with NA; impute a missing day as the 1st day of the month)
* Clean the Timing (standardise all to 24-hour format, using “.” instead of ":" as the separator)
* Clean the Vocalisation Type (standardise all to lower case; recode values containing both ‘song’ and ‘call’ as ‘call’, which is assumed to signal distress, while ‘song’ is assumed to be the default)
* Clean the Quality (recode ‘no score’ as NA)



<b>iii. Data Manipulation</b>
* Extract the “Year” and “Month” from the date as new columns
* Create new columns for Quarter (Q1, Q2, Q3, Q4) and Season (Spring, Summer, Fall, Winter)


<b>iv. Geospatial File Compatibility</b>
* Convert the CSV file (2081 birds) into the following:
** spatial points data frame
** sp format
** shp format
** st_read compatible format
** readOGR compatible format
** ppp format (for spatstat compatibility)


<b>v. Data Overview & Exploration</b>
* Overlay the 2081 birds, the raster map and the dumping site using `plot()`, for an integrated overview
* Use `facet_wrap` to locate clusters by species, over time, by season and by vocalisation type (call/song)


<b>vi. Segregation of Treatment & Control Groups</b>
* Use ‘Rose Pipits’ as the Treatment Group
* Use ‘Ordinary Snape’ and ‘Lesser Birchbeere’ as Control Groups
* Use ‘All Birds’ as a third control group

|-
|
3.
||
<b>Geospatial Visualisation </b>
||
<b><u>Spatial Point Pattern Visualisation (Density-Based Measure) </u></b>

<b>i. Prepare polygon layer </b>
* Create a 200x200 spatial polygon to depict the boundaries of the Lekagul raster map
* Combine the raster polygon with the Rose Pipit layer as an observation window, using `owin` from the spatstat package


<b>ii. Kernel Density Plot </b>
* First, set the bandwidth with sigma=bw.diggle
* Plot the kernel density estimate by year (2012-2017):
** For All Birds
** For Rose Pipits only (Treatment Group)
** For Ordinary Snape & Lesser Birchbeere only (Control Groups)


<b>iii. Adjust Parameters (sigma) </b>
* Re-plot using the sigma of the densest cluster
** This is typically the largest sigma


<b>iv. Fine-Tune for Clearer Visualisation </b>
* Add the dumping site and adjust its colour/size, so that the clusters can be seen relative to the dumping site
|-
|
4.
||
<b>Statistical Confirmation </b>
||
<b><u>Spatial Point Pattern Analysis (Distance-Based Measure)</u></b>

<b>i. Quadrat Analysis </b>
* Apply Monte Carlo simulation
* Follow with a quadrat test for clustering


<b>ii. K-Nearest Neighbour </b>
* Apply Monte Carlo simulation
* Follow with a Clark-Evans test for clustering


<b>iii. K-Function </b>
* Apply Monte Carlo simulation
* Judge significance from the grey simulation envelope: an observed K-function above the band indicates significant clustering, and one below it indicates significant regularity

|-
|
5.
||
<b>Audio Processing</b>
||
<b>i. Data Preparation</b>
* Read in the MP3 files (Training & Testing Data)
* Convert them to .wav format using `writeWave()`
* Extract the audio parameters from the .wav files into a data frame using `analyzeFolder()`
* Read in the data frame


<b>ii. Audio Extraction & Manipulation </b>
* Extract only 1 of the 2 channels (the left)
* Convert each sound array to floating-point values ranging from -1 to 1


|-
|
6.
||
<b>Audio Visualisation</b>
||
<b>i. Amplitude Envelope Plot</b>
* Use `diffenv()` to plot the amplitude envelopes
* Do this for all 15 test birds
* Do this for 5 training birds per species, and select the most representative plot as the reference ‘dictionary’


<b>ii. Oscillogram Plot</b>
* Use the seewave package to plot the oscillogram
* Do this for all 15 test birds
* Do this for 5 training birds per species, and select the most representative plot as the reference ‘dictionary’


<b>iii. Distribution of audio parameters, using Trellis Plot</b>
* Of the 15 attributes extracted from the .wav files into the data frame, the following 7 will be used for analysis:
** dom_median, HNR_median, meanFreq_median, peakFreq_median, pitch_median, pitchAutocor_median, pitchSpec_median
** These were selected because they vary more across species than the remaining 8 attributes
* Use ggplot() to draw a trellis plot of the parameter distributions for the 19 training species
* Label the mean of each species
* Use ggplot() to overlay the 15 test birds
* Visualise and identify the 3 closest species per parameter
* Select the species whose training mean is closest to the test bird on the largest number of parameters

|}
</div>
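The Step 2 cleaning rules for dates, vocalisation type, quarter and season can be sketched outside R as below. This is a minimal Python sketch, not the project's code. It rests on three assumptions. Raw dates arrive as m/d/y strings with a four-digit year. A missing month, day or year is written as 0 or left blank. Seasons are meteorological (December to February as Winter), because the methodology does not state its season boundaries. The function names are hypothetical.

```python
from datetime import date

# Assumption: meteorological seasons; the methodology does not state its mapping.
SEASONS = {12: "Winter", 1: "Winter", 2: "Winter",
           3: "Spring", 4: "Spring", 5: "Spring",
           6: "Summer", 7: "Summer", 8: "Summer",
           9: "Fall", 10: "Fall", 11: "Fall"}


def clean_date(raw):
    """Parse a hypothetical 'm/d/y' string with a four-digit year.

    A missing month or year returns None (NA); a missing day is imputed as the 1st.
    Missing parts are assumed to be written as '0' or left blank.
    """
    parts = (raw or "").strip().split("/")
    if len(parts) != 3:
        return None
    month, day, year = (int(p) if p.strip().isdigit() else 0 for p in parts)
    if month == 0 or year == 0:
        return None                      # missing month/year -> NA
    return date(year, month, day if day else 1)  # missing day -> 1st of month


def derive_fields(d):
    """Return (year, month, quarter, season) for a cleaned date, or all None for NA."""
    if d is None:
        return (None, None, None, None)
    quarter = "Q%d" % ((d.month - 1) // 3 + 1)
    return (d.year, d.month, quarter, SEASONS[d.month])


def clean_vocalisation(raw):
    """Lower-case the label; a label containing both 'song' and 'call' becomes 'call'."""
    label = raw.strip().lower()
    if "song" in label and "call" in label:
        return "call"
    return label


if __name__ == "__main__":
    for raw in ["6/15/2017", "3/0/2014", "0/0/2012"]:
        cleaned = clean_date(raw)
        print(raw, cleaned, derive_fields(cleaned))
    print(clean_vocalisation("Song and Call"))
```

Imputing the 1st of the month keeps a record with a known month and year in the Year, Quarter and Season breakdowns. Dropping it instead would shrink the yearly kernel density plots.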
<br>
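Step 4 relies on Monte Carlo tests from spatstat in R (spatstat itself provides `clarkevans.test()`). To show what the Clark-Evans test measures, here is a minimal Python sketch with hypothetical function names. It compares the mean nearest-neighbour distance with the value 0.5/sqrt(λ) expected under complete spatial randomness (CSR) in the 200 x 200 window. It then estimates a one-sided p-value from simulated CSR patterns. It assumes no edge correction, which spatstat can apply and this sketch omits.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour (brute force, needs >= 2 points)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)


def clark_evans_index(points, width=200.0, height=200.0):
    """Clark-Evans ratio R with no edge correction; R < 1 suggests clustering, R > 1 regularity."""
    intensity = len(points) / (width * height)
    expected = 0.5 / math.sqrt(intensity)  # mean nearest-neighbour distance under CSR
    return mean_nn_distance(points) / expected


def clark_evans_mc_test(points, nsim=99, width=200.0, height=200.0, seed=0):
    """One-sided Monte Carlo test of clustering against complete spatial randomness."""
    rng = random.Random(seed)
    observed = clark_evans_index(points, width, height)
    n = len(points)
    as_extreme = 0
    for _ in range(nsim):
        # Simulate the same number of points uniformly over the window.
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        if clark_evans_index(sim, width, height) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (nsim + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical sightings packed around one spot, standing in for a cluster.
    clustered = [(rng.gauss(150, 3), rng.gauss(50, 3)) for _ in range(50)]
    r_obs, p = clark_evans_mc_test(clustered)
    print("R = %.3f, Monte Carlo p = %.3f" % (r_obs, p))
```

With nsim = 99 the smallest attainable p-value is 1/100 = 0.01, so more simulations are needed to resolve smaller p-values.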
Latest revision as of 22:42, 22 June 2018
Tools
R is the primary tool used in this analysis. The following lists the packages used for the project’s scope - for data cleaning, data visualisation, geospatial analysis and audio processing.
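Among the audio processing steps, Step 5 ii (keep the left channel and scale samples to the -1 to 1 range) can be illustrated with Python's standard `wave` and `struct` modules. This minimal sketch assumes 16-bit PCM WAV input, where dividing by 32768 maps the integer range onto [-1, 1). It does not reproduce the R conversion from MP3. The helper that builds a test file in memory is hypothetical; it exists only so the sketch runs without audio files.

```python
import io
import struct
import wave


def left_channel_as_float(wav_bytes):
    """Return the left channel of a 16-bit PCM WAV as floats in [-1, 1).

    For mono files the single channel is returned. Other bit depths are not handled.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)  # little-endian int16
    left = samples[0::n_channels]        # frames are interleaved as L, R, L, R, ...
    return [s / 32768.0 for s in left]   # int16 range [-32768, 32767] -> [-1, 1)


def make_stereo_wav(pairs, rate=44100):
    """Build an in-memory stereo 16-bit WAV from (left, right) integer pairs (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"".join(struct.pack("<hh", left, right) for left, right in pairs))
    return buf.getvalue()


if __name__ == "__main__":
    data = make_stereo_wav([(16384, -1), (-32768, 5), (0, 0)])
    print(left_channel_as_float(data))
```

Scaling every recording to the same range keeps amplitude envelopes and oscillograms comparable across birds recorded at different gain levels.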
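The final selection rule in Step 6 iii can be read as a vote across parameters. The Python sketch below implements one such reading, with hypothetical function names. Each species is scored by the number of parameters on which its training mean is nearest to the test bird. Appearances among the 3 nearest species break ties. The training values and "Species X" are hypothetical and cover only two of the seven parameters.

```python
from collections import Counter


def species_means(training):
    """Per-species mean of each parameter; training maps species -> list of parameter dicts."""
    means = {}
    for species, rows in training.items():
        means[species] = {p: sum(r[p] for r in rows) / len(rows) for p in rows[0]}
    return means


def vote_species(test_bird, means, top_k=3):
    """Rank species by |test value - training mean| separately for each parameter.

    The species that is nearest on the most parameters wins; appearances in the
    top_k shortlist break ties.
    """
    closest = Counter()      # parameters on which the species is the nearest mean
    shortlisted = Counter()  # parameters on which it is among the top_k nearest
    for param, value in test_bird.items():
        ranked = sorted(means, key=lambda s: abs(means[s][param] - value))
        closest[ranked[0]] += 1
        shortlisted.update(ranked[:top_k])
    best = max(means, key=lambda s: (closest[s], shortlisted[s]))
    return best, closest, shortlisted


if __name__ == "__main__":
    # Hypothetical training values for two of the seven parameters.
    training = {
        "Rose Pipit": [{"pitch_median": 4000, "HNR_median": 10},
                       {"pitch_median": 4200, "HNR_median": 12}],
        "Ordinary Snape": [{"pitch_median": 2000, "HNR_median": 5}],
        "Lesser Birchbeere": [{"pitch_median": 2900, "HNR_median": 19},
                              {"pitch_median": 3100, "HNR_median": 21}],
        "Species X": [{"pitch_median": 6000, "HNR_median": 2}],
    }
    means = species_means(training)
    best, closest, shortlisted = vote_species({"pitch_median": 4050, "HNR_median": 12}, means)
    print(best, dict(closest), dict(shortlisted))
```

The parameters sit on very different scales (hertz for pitch, decibels for HNR). For that reason the sketch ranks species within each parameter rather than summing raw distances across parameters.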