ISSS608 2017-18 T3 Assign NEVIL BRUNO Data Prep

From Visual Analytics and Applications

Data

For our analysis, the following datasets are at our disposal:

Nevil Data Table.JPG

Tools Used

For the data prep, analysis, and visualizations, the following tools will be used:
1. SAS JMP Pro 13
2. Microsoft Excel
3. Tableau Desktop: Professional Edition
4. RStudio, with the following R packages:

  • plotly
  • tuneR

5. Microsoft Paint
6. Photo to GIF: GIF Maker

Data Prep

On initial exploratory data analysis, we find no issues with the MP3 files or the map. In the CSV files, the vocalization type, date, and time fields are formatted inconsistently. We use SAS JMP to recode these fields: vocalization types are mapped to a standard nomenclature, dates are converted to the dd/mm/yyyy format, and times are converted to the 24-hour format.
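
The recoding itself was done in SAS JMP. Purely as an illustration of the same idea, here is a minimal Python sketch of normalising the three fields. The input formats in `DATE_FORMATS` and `TIME_FORMATS`, the mapping in `VOCALIZATION_MAP`, and all function names are assumptions made for this example; the actual variants found in the CSV files are not listed in the text.

```python
from datetime import datetime

# Hypothetical input variants; the real inconsistent formats are not listed in the text.
DATE_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d-%m-%Y"]
TIME_FORMATS = ["%H:%M", "%H:%M:%S", "%I:%M %p", "%I:%M%p"]
# Hypothetical mapping to a standard vocalization nomenclature.
VOCALIZATION_MAP = {"song": "Song", "songs": "Song", "call": "Call", "calls": "Call"}


def _parse(text, formats):
    """Return the first successful datetime parse of text, or None."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def normalize_date(text):
    """Recode a date string to dd/mm/yyyy; None if no assumed format matches."""
    parsed = _parse(text, DATE_FORMATS)
    return parsed.strftime("%d/%m/%Y") if parsed else None


def normalize_time(text):
    """Recode a time string to 24-hour HH:MM; None if no assumed format matches."""
    parsed = _parse(text, TIME_FORMATS)
    return parsed.strftime("%H:%M") if parsed else None


def normalize_vocalization(text):
    """Map a vocalization label to the assumed standard nomenclature."""
    cleaned = text.strip().lower()
    return VOCALIZATION_MAP.get(cleaned, cleaned.title())
```

Values that match none of the assumed formats come back as `None`, so they can be handled in the same way as other missing values.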

Date and Time Recoding


There are no format issues with the File IDs, bird names, quality scores, or X and Y coordinates in either file. Records with missing values are excluded from the analysis.
Preliminary EDA gives us the following insights:

1. There are 19 unique bird species. Out of the 2081 records, 186 (about 9%) are of the Rose-Crested Blue Pipit.
Nevil eda1.jpg

2. The vocalization types are mainly ‘Songs’ and ‘Calls’.
Nevil eda2.jpg

3. There are 5 levels for audio quality: A, B, C, D, E. In the next section, we will analyse the audio samples to determine what each grade signifies.
Nevil eda3.jpg

4. There are very few recordings before 2007. There is also a lack of data for 2018.
Nevil eda4.jpg

5. Most of the recordings took place during the morning (06:00 to 12:00).
Nevil eda5.jpg

6. Pipit recordings become significant in number only after 2009.
Nevil eda6.jpg

7. At monthly granularity, the number of recordings is very low for most months. May has, on average, more recordings than the other months, especially between 2011 and 2017. Many months have no recordings at all.
Nevil eda7.jpg
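
As a small illustration of the time-of-day finding above, the following Python sketch labels recoded 24-hour times as morning or not and counts them. The function name, the sample times, and the choice to treat 12:00 itself as outside the morning window are assumptions made for this example; they are not taken from the dataset.

```python
from collections import Counter


def time_of_day(time_text):
    """Label an "HH:MM" time as 'morning' (06:00 up to, not including, 12:00) or 'other'."""
    hour = int(time_text.split(":")[0])
    return "morning" if 6 <= hour < 12 else "other"


# Made-up sample times, for illustration only.
sample_times = ["06:15", "07:40", "11:59", "12:00", "18:30"]
bucket_counts = Counter(time_of_day(t) for t in sample_times)
```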

Map Prep

The map provided is a 200 X 200 .bmp image of the Preserve. To analyse patterns and anomalies across the Preserve, the map is imported into Tableau for use in our analysis:

Uploading the Map on Tableau Desktop


We are also provided with the coordinates of the dumping ground, (148,159). To analyze the dumping site together with its immediate surroundings, we define a square area centered on this point that extends 20 units in each direction (40 X 40 overall). The square is created in Tableau by plotting its corner coordinates and shading the enclosed area. The corner coordinates are as follows:

  • TOP LEFT: (128,179)
  • TOP RIGHT: (168,179)
  • BOTTOM LEFT: (128,139)
  • BOTTOM RIGHT: (168,139)

Nevil area.jpg
This area can be used to determine whether the dumping site affects the scatter of birds across the Preserve.
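
The corner coordinates listed above follow from the dumping-ground point by adding and subtracting 20 on each axis. The Python sketch below reproduces them and adds a simple test of whether a recording location falls inside the square. The function names and the `half_side` parameter are assumptions made for this example; the shading itself was done in Tableau.

```python
def square_around(cx, cy, half_side):
    """Return the corner coordinates of a square centered on (cx, cy)."""
    return {
        "top_left": (cx - half_side, cy + half_side),
        "top_right": (cx + half_side, cy + half_side),
        "bottom_left": (cx - half_side, cy - half_side),
        "bottom_right": (cx + half_side, cy - half_side),
    }


def in_square(x, y, cx, cy, half_side):
    """True if (x, y) lies inside or on the edge of the square."""
    return abs(x - cx) <= half_side and abs(y - cy) <= half_side


# Dumping ground at (148,159); 20 units on either side gives the listed corners.
corners = square_around(148, 159, 20)
```

With these inputs, `corners` contains (128,179), (168,179), (128,139), and (168,139), matching the list above.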

Final Decisions & Assumptions before Analysis

1. The low numbers in 2018 are not a sign of decline; they reflect the lack of data recorded during this time.
2. Because there are so few recordings before 2007, those records are excluded. The analysis period is 2007 to 2017 inclusive.
3. Because of irregularities in the monthly data, the analysis is kept at a yearly level.
4. The square area marked on the map, extending 20 units in each direction from the dumping point (148,159), is assumed to be the zone affected by the dumping.
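
The time-period and granularity decisions above can be sketched in Python as a filter followed by a yearly count. The record layout (a `date` key holding a dd/mm/yyyy string), the function name, and the sample records are assumptions made for this example and do not reflect the actual column names in the CSV files.

```python
from collections import Counter
from datetime import datetime


def yearly_counts(records, first_year=2007, last_year=2017):
    """Count records per year, keeping only first_year..last_year inclusive."""
    counts = Counter()
    for rec in records:
        date_text = rec.get("date")
        if not date_text:
            continue  # records with a missing date are excluded
        year = datetime.strptime(date_text, "%d/%m/%Y").year
        if first_year <= year <= last_year:
            counts[year] += 1
    return dict(sorted(counts.items()))


# Made-up records, for illustration only.
sample_records = [
    {"date": "12/05/2006"},  # before 2007: dropped
    {"date": "03/05/2011"},
    {"date": "21/05/2011"},
    {"date": "09/09/2017"},
    {"date": "14/02/2018"},  # 2018: dropped
    {"date": ""},            # missing: dropped
]
counts_by_year = yearly_counts(sample_records)
```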