Difference between revisions of "ISSS608 2017-18 T3 Assign Tan Yong Ying Data Overview and Cleaning"
Yy.tan.2017 (talk | contribs) |
Revision as of 14:28, 7 July 2018
Data Overview
For this challenge, we were provided with the following data:
Data Cleaning
Out of the 5 pieces of data listed above, only AllBirdsv4.csv requires data cleaning to remove values that cannot be imputed or replaced manually through guessing or inference. The data cleaning outcome for each variable in AllBirdsv4.csv is as follows:
- File ID
This variable has no invalid values: every File ID is a valid integer.
[Figure: FileIDSummary.png]
- English_name
This variable has no invalid values. The summary shows that the data provided to us contains recordings from 19 unique known species in the Preserve.
[Figure: English_nameSummary.png]
- Vocalization_Type
Some rows have the invalid value “?”. In any analysis of bird sounds, it is important to differentiate between songs and calls because they play different roles in bird communication. Songs are usually used by male birds to establish their territories and attract females during the breeding season. Calls, on the other hand, are functional and are used to coordinate behavior between pairs or among birds in a flock. Records with an unknown vocalization type are therefore excluded from my analysis.
[Figure: VocalizationScreenshot.png]
There are 10 unique values in total (see screenshot below). Since bird sounds are commonly differentiated as a "song" or a "call", I reduced the number of levels in this variable to the three most common values: "call", "song" and "call,song" (case sensitive). Records whose Vocalization_Type does not contain the word "call" or "song" are removed from further analysis.
[Figure: VocalizationSummary.png]
- Quality
The summary shows six levels in this variable: “A”, “B”, “C”, “D”, “E” and “no score”. Although the value “no score” says nothing about the quality of the sound file, this is not vital to my analysis because I prioritized files of “A” quality when comparing known files against the Kasios files. I therefore did not delete any records based on their Quality value. Instead, the six levels can be used as a filter during analysis and application development later on.
[Figure: QualitySummary.png]
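The Vocalization_Type and Quality rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the assignment: the sample rows, the species names and the raw value "drumming" are made up, the helper names (normalize_vocalization, clean_rows, filter_by_quality) are hypothetical, and the matching is done on lower-cased text, which is one possible reading of how the 10 raw values collapse into three levels.

```python
import csv
import io

def normalize_vocalization(raw):
    """Collapse a raw Vocalization_Type value to 'call', 'song' or 'call,song'.

    Returns None when the value mentions neither word (for example '?').
    Assumption: matching is done on lower-cased text.
    """
    text = raw.strip().lower()
    has_call = "call" in text
    has_song = "song" in text
    if has_call and has_song:
        return "call,song"
    if has_call:
        return "call"
    if has_song:
        return "song"
    return None

def clean_rows(rows):
    """Drop rows with an unusable Vocalization_Type and normalize the rest."""
    cleaned = []
    for row in rows:
        label = normalize_vocalization(row["Vocalization_Type"])
        if label is None:
            continue  # unknown vocalization type: excluded from analysis
        cleaned.append({**row, "Vocalization_Type": label})
    return cleaned

def filter_by_quality(rows, wanted=("A",)):
    """Keep rows whose Quality is in `wanted`; no row is deleted at the source."""
    return [row for row in rows if row["Quality"] in wanted]

# Hypothetical sample in the shape of AllBirdsv4.csv (not real data).
SAMPLE = """\
File ID,English_name,Vocalization_Type,Quality
1,Species A,Song,A
2,Species A,call,B
3,Species B,"Call, Song",no score
4,Species B,?,A
5,Species C,drumming,C
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = clean_rows(rows)          # rows 4 and 5 are dropped
best = filter_by_quality(cleaned)   # only "A" quality rows remain
```

Keeping the Quality filter as a separate step mirrors the decision above: “no score” records stay in the cleaned data and are only screened out when a particular analysis asks for specific quality levels.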
Banner image credit to: Marshal Hedin (https://www.flickr.com/photos/23660854@N07/24385545393)