ISSS608 2017-18 T3 Assign Wang Runyu Data Preparation

From Visual Analytics and Applications

Revision as of 23:45, 8 July 2018

VAST Mini Challenge 1: "Cheep" Shots?



Data Description

“ALL BIRDS.zip” contains calls and songs from the known birds in the Boonsong Lekagul Wildlife Preserve. These files are in MP3 format and are of varying lengths. Each file name contains an integer that refers to the metadata about the particular bird and audio file in the file “AllBirdsv4.csv”.

“AllBirdsv4.csv” is the metadata file for the bird sounds files. The File ID field is the index to the file names in the ALL BIRDS file collection. The English_name is the common English name for the particular bird. The Vocalization_type is the kind of bird sound it is: a call, a song, or some other particular sound. The researchers did not provide additional descriptions of the differences between the vocalizations, so you will have to manage this data as best you can. Quality is a score A, B, C, D, or E. These provide a qualitative measure of the quality of the bird sound, e.g., purity, lack of background noise, and so on. Again, the researchers did not provide additional descriptions of this field. Time and Date are for the capture of the sound. Finally, X and Y are the coordinates on the enclosed map of where the sound was recorded (see the map information below).

Example:

File ID,English_name,Vocalization_type,Quality,Time,Date,X,Y
402254,Rose-crested Blue Pipit,call,no score,13:30,2/8/2018,49,63
406171,Rose-crested Blue Pipit,call,A,7:48,6/7/2017,125,133
405901,Rose-crested Blue Pipit,call,A,12:00,2/8/2018,58,76

“Lekagul Roadways 2018” is a 200 x 200 pixel map of the Preserve, with general indications of roadways through the site. The coordinates from AllBirdsv4 should be considered from bottom left to top right, (0,0) to (199,199). The alleged dumping site for the Kasios waste products was centered around coordinates (148,159). Its extent has not been thoroughly studied.

Lekagul Roadways 2018.jpg

“Test Birds from Kasios” are the bird sounds Kasios claims as Pipits from across the Preserve. All of these recordings were taken over the past couple of months. “Test Bird Locations” indicates where in the pixel map the bird sounds were recorded.

Example:

ID, X, Y
1,140,119
2,63,153

Tools

The following tools were used in this assignment:

1. R - audio file processing, audio file visualization, and machine learning classification of the audio files. The following packages are used: corrplot, ggpubr, GGally, tidyverse, nnet, caret, MLmetrics, rpart.plot, ggplot2, soundgen, tuneR, seewave

2. JMP Pro - csv data preparation

3. Tableau - csv data visualization

Text File Preparation

The text file AllBirdsv4.csv contains inconsistently formatted values and missing data. It is cleaned as follows:

1. Format the Date field as MM/DD/YYYY

2. Omit records with no date value, as the Date field is important for identifying the existence of a given bird species at a given time

3. Recode the time to a 24-hour scale in hh:mm format. Recode empty times to 12:00, "early morning" to 8:00 and "am" to 9:00
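The three cleaning steps above were done in JMP Pro. As a rough illustration only, the same logic can be sketched in base R. The toy rows, the File_ID column name, the assumption that raw dates arrive as month/day/year, and the recode_time helper are all hypothetical and are not taken from the original workflow.

```r
# Toy rows standing in for AllBirdsv4.csv (values are illustrative assumptions).
birds <- data.frame(
  File_ID = c(402254, 406171, 405901, 100001, 100002, 100003),
  Date    = c("2/8/2018", "6/7/2017", "", "11/23/2016", "1/5/2017", "3/9/2017"),
  Time    = c("13:30", "7:48", "12:00", "am", "", "early morning"),
  stringsAsFactors = FALSE
)

# Step 1: parse the date (assumed month/day/year) and re-emit it as MM/DD/YYYY.
parsed <- as.Date(birds$Date, format = "%m/%d/%Y")
birds$Date <- format(parsed, "%m/%d/%Y")

# Step 2: omit rows whose date is missing or could not be parsed.
birds <- birds[!is.na(parsed), ]

# Step 3: recode free-text times, then zero-pad h:mm to hh:mm.
# Hypothetical helper; values it does not recognise are left unchanged.
recode_time <- function(x) {
  x <- tolower(trimws(x))
  x[is.na(x) | x == ""]   <- "12:00"
  x[x == "early morning"] <- "8:00"
  x[x == "am"]            <- "9:00"
  ok <- grepl("^[0-9]{1,2}:[0-9]{2}$", x)
  parts <- strsplit(x[ok], ":", fixed = TRUE)
  x[ok] <- vapply(
    parts,
    function(p) sprintf("%02d:%s", as.integer(p[1]), p[2]),
    character(1)
  )
  x
}
birds$Time <- recode_time(birds$Time)

print(birds)
```

The sketch only covers the cases named in the list. Other irregular time strings that may occur in the real file would need their own recoding rules.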

Audio File Processing

1. To process the audio files, the following R packages are loaded: soundgen, tuneR, seewave

2. The function analyzeFolder(), which converts audio files to a dataframe, can only read WAV format, so the first step is to convert all the MP3 audio files to WAV format

3. Not all the audio files are of good quality. Some contain noise that would interfere with the audio classification task, so only the audio files whose quality is 'A' are extracted

4. Call analyzeFolder() to read all the WAV files into a dataframe, and store the dataframe in CSV format. The audio files in both All Birds and Test Birds from Kasios must be processed as above

The actual code for audio processing can be found at https://github.com/runyu/B6_Visual_Analytics/blob/master/Assignment_MC1_audio_preparation.Rmd
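The audio steps above can be sketched roughly as follows. This is not the author's code (see the linked Rmd for that), only a minimal outline under several assumptions: the helper names, folder paths and output file name are hypothetical, the metadata column is assumed to be called File_ID, and the integer ID is assumed to sit at the end of each file name before the extension. tuneR::readMP3() and tuneR::writeWave() handle the conversion. analyzeFolder() comes from the soundgen version used at the time; later releases appear to fold it into analyze(), so the call may need adjusting.

```r
# Hypothetical helper: convert every MP3 in mp3_dir to a WAV file in wav_dir.
convert_mp3_folder <- function(mp3_dir, wav_dir) {
  dir.create(wav_dir, showWarnings = FALSE, recursive = TRUE)
  mp3_files <- list.files(mp3_dir, pattern = "\\.mp3$",
                          full.names = TRUE, ignore.case = TRUE)
  for (f in mp3_files) {
    wave <- tuneR::readMP3(f)
    out  <- file.path(wav_dir,
                      paste0(tools::file_path_sans_ext(basename(f)), ".wav"))
    tuneR::writeWave(wave, out)
  }
  invisible(length(mp3_files))
}

# Hypothetical helper: keep only files whose ID has Quality "A" in the metadata.
# Assumes the ID is the run of digits at the end of the name, before the extension.
select_quality_a <- function(file_names, meta) {
  base <- tools::file_path_sans_ext(basename(file_names))
  ids  <- suppressWarnings(as.integer(sub("^.*[^0-9]([0-9]+)$", "\\1", base)))
  good <- meta$File_ID[meta$Quality == "A"]
  file_names[ids %in% good]
}

# Hypothetical driver tying the steps together; defined here but not run.
prepare_audio <- function(mp3_dir, wav_dir, clean_dir, meta, out_csv) {
  convert_mp3_folder(mp3_dir, wav_dir)
  wavs <- list.files(wav_dir, pattern = "\\.wav$", full.names = TRUE)
  dir.create(clean_dir, showWarnings = FALSE, recursive = TRUE)
  file.copy(select_quality_a(wavs, meta), clean_dir)
  features <- soundgen::analyzeFolder(clean_dir)  # one row of acoustic features per file
  write.csv(features, out_csv, row.names = FALSE)
  invisible(features)
}
```

A call might look like prepare_audio("ALL BIRDS", "wav_all", "wav_quality_a", meta, "all_birds_features.csv"), where meta is the cleaned metadata table. The Test Birds files have no quality score in the metadata described above, so for them the filtering step would presumably be skipped.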