Difference between revisions of "ISSS608 2017-18 T3 Assign NEVIL BRUNO Data Prep"

From Visual Analytics and Applications
 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
[[File:Nevil_banner.jpg|1000px|centre]]
+
[[File:Nevil_banner.jpg|1030px|centre]]
  
 
<!--MAIN HEADER -->  
 
<!--MAIN HEADER -->  
Line 13: Line 13:
 
| style="font-family:Trebuchet MS; font-size:100%; solid #000000; background:#ffcc00; text-align:center;" width="14.3%" |   
 
| style="font-family:Trebuchet MS; font-size:100%; solid #000000; background:#ffcc00; text-align:center;" width="14.3%" |   
 
;  
 
;  
[[ISSS608_2017-18_T3_Assign_NEVIL_BRUNO_Q1| <font color="#000000">Q1: PATTERNS EVERYWHERE</font>]]  
+
[[ISSS608_2017-18_T3_Assign_NEVIL_BRUNO_Q1| <font color="#000000">Q1: PATTERNS</font>]]  
 
   
 
   
 
| style="font-family:Trebuchet MS; font-size:100%; solid #000000; background:#ffcc00; text-align:center;" width="14.3%" |   
 
| style="font-family:Trebuchet MS; font-size:100%; solid #000000; background:#ffcc00; text-align:center;" width="14.3%" |   
Line 25: Line 25:
 
| style="font-family:Trebuchet MS; font-size:100%; solid #000000; background:#ffcc00; text-align:center;" width="14.3%" |   
 
| style="font-family:Trebuchet MS; font-size:100%; solid #000000; background:#ffcc00; text-align:center;" width="14.3%" |   
 
;  
 
;  
[[ISSS608_2017-18_T3_Assign_NEVIL_BRUNO_References| <font color="#000000">ACKNOWLEDGMENTS & REFERENCES</font>]]  
+
[[ISSS608_2017-18_T3_Assign_NEVIL_BRUNO_References| <font color="#000000">REFERENCES</font>]]  
 
    
 
    
 
| style="font-family:Trebuchet MS; font-size:100%; solid #000000; background:#ffcc00; text-align:center;" width="14.3%" |   
 
| style="font-family:Trebuchet MS; font-size:100%; solid #000000; background:#ffcc00; text-align:center;" width="14.3%" |   
Line 56: Line 56:
 
<font size = 3; color="#000000"><span style="font-family:Trebuchet MS; font-weight:bold;"><h2>Data Prep</h2></span></font>  
 
<font size = 3; color="#000000"><span style="font-family:Trebuchet MS; font-weight:bold;"><h2>Data Prep</h2></span></font>  
 
</div>
 
</div>
On initial exploratory data analysis, we find that there are no issues with the MP3 files and map. For the CSV files, there are format issues in the date and time fields. We can use SAS JMP to recode these values to a standard format. For the dates, we will be using the dd/mm/yyyy format. For time, we use the 24-hour format.<br>
+
On initial exploratory data analysis, we find no issues with the MP3 files or the map. In the CSV files, the vocalization type, date, and time fields are inconsistently formatted. We use SAS JMP to recode these values to standard formats: a standard nomenclature for the vocalization type, the dd/mm/yyyy format for dates, and the 24-hour format for times.<br>
 
[[File:Nevil_dataprep1.jpg|800px|center|thumb|float|Date and Time Recoding]]<br>
 
[[File:Nevil_dataprep1.jpg|800px|center|thumb|float|Date and Time Recoding]]<br>
 
There are no format issues with the File IDs, Bird names, Quality scores, X&Y coordinates for both files. For missing values, we will be excluding them from the analysis.
 
There are no format issues with the File IDs, Bird names, Quality scores, X&Y coordinates for both files. For missing values, we will be excluding them from the analysis.
Line 73: Line 73:
 
6. A significant number of the pipit recordings were made after 2009. <br>
 
6. A significant number of the pipit recordings were made after 2009. <br>
 
[[File:nevil_eda6.jpg|700px||]]<br><br>
 
[[File:nevil_eda6.jpg|700px||]]<br><br>
7. Looking at the overall data at monthly level granularity, it can be observed that the number of birds for most of the months is very low. Apart from this, there are a lot of months which have no recordings.<br>
+
7. Looking at the overall data at monthly granularity, the number of birds recorded in most months is very low. May has, on average, more recordings than the other months, especially between 2011 and 2017. In addition, many months have no recordings at all.<br>
 
[[File:Nevil eda7.jpg|800px||]]<br><br>
 
[[File:Nevil eda7.jpg|800px||]]<br><br>
 
<div style=" text-align:left">  
 
<div style=" text-align:left">  
 
<font size = 3; color="#000000"><span style="font-family:Trebuchet MS; font-weight:bold;"><h2>Map Prep</h2></span></font>  
 
<font size = 3; color="#000000"><span style="font-family:Trebuchet MS; font-weight:bold;"><h2>Map Prep</h2></span></font>  
 
</div>
 
</div>
The map provided is a 200 X 200 .bmp image of the reserve. For analysing patterns and anomalies on across the Preserve, the map can be imported into tableau:<br>
+
The map provided is a 200 X 200 .bmp image of the Preserve. To analyse patterns and anomalies across the Preserve, we import the map into Tableau and use it in our analysis:<br>
+
[[File:Nevil map upload.jpg|350px|center|thumb|float|Uploading the Map on Tableau Desktop]]<br>
We are also provided with coordinates for the dumping ground <b>(148,159)</b>. For our analysis, we would require an area around the point. A square 20 X 20 area centred around the dumping ground can be created. The coordinates for the square are as follows:<br>
+
We are also given the coordinates of the dumping ground, <b>(148,159)</b>. To analyze the dumping site together with its immediate surroundings, we need an area around this point. We create a square extending 20 units in each direction from the dumping ground (40 X 40 overall) by plotting its corner coordinates and shading the enclosed area in Tableau. The coordinates of the square's corners are as follows:<br>
 
*TOP LEFT: (128,179)<br>
 
*TOP LEFT: (128,179)<br>
 
*TOP RIGHT: (168,179)<br>
 
*TOP RIGHT: (168,179)<br>
 
*BOTTOM LEFT: (128,139)<br>
 
*BOTTOM LEFT: (128,139)<br>
*BOTTOM RIGHT: (168,139)<br>
+
*BOTTOM RIGHT: (168,139)<br><br>
 +
[[File:Nevil area.jpg|550px|]]<br>  
 
This area can be used to determine whether the dumping site affects the bird scatter across the Preserve. <br>
 
This area can be used to determine whether the dumping site affects the bird scatter across the Preserve. <br>
 
<div style=" text-align:left">  
 
<div style=" text-align:left">  

Latest revision as of 18:21, 8 July 2018


Data

For our analysis, the following datasets are at our disposal:

Nevil Data Table.JPG

Tools Used

For the data prep, analysis, and visualizations, the following tools will be used:
1. SAS JMP Pro 13
2. Microsoft Excel
3. Tableau Desktop: Professional Edition
4. R Studio

  • plotly
  • tuneR

5. Microsoft Paint
6. Photo to GIF: GIF Maker

Data Prep

On initial exploratory data analysis, we find no issues with the MP3 files or the map. In the CSV files, the vocalization type, date, and time fields are inconsistently formatted. We use SAS JMP to recode these values to standard formats: a standard nomenclature for the vocalization type, the dd/mm/yyyy format for dates, and the 24-hour format for times.

Date and Time Recoding


There are no format issues with the File IDs, bird names, quality scores, or X and Y coordinates in either file. Records with missing values are excluded from the analysis.
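
The recoding itself was done in SAS JMP. As a rough illustration only, the same idea can be sketched in Python. The input layouts in DATE_FORMATS and TIME_FORMATS and the column names "Date" and "Time" are assumptions made for this sketch, not the actual values found in the CSV files.

```python
from datetime import datetime

# Assumed input layouts; extend these tuples to match the real CSV values.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y")
TIME_FORMATS = ("%H:%M", "%H:%M:%S", "%I:%M %p", "%I:%M%p", "%I%p")


def _normalize(value, formats, out_format):
    """Return value re-rendered in out_format, or None if no layout matches."""
    text = (value or "").strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).strftime(out_format)
        except ValueError:
            continue
    return None


def normalize_date(value):
    """Recode a date string to dd/mm/yyyy."""
    return _normalize(value, DATE_FORMATS, "%d/%m/%Y")


def normalize_time(value):
    """Recode a time string to the 24-hour HH:MM format."""
    return _normalize(value, TIME_FORMATS, "%H:%M")


def clean_rows(rows):
    """Recode date and time; drop rows where either is missing or unparseable."""
    cleaned = []
    for row in rows:
        # "Date" and "Time" are hypothetical column names.
        date = normalize_date(row.get("Date"))
        time = normalize_time(row.get("Time"))
        if date is not None and time is not None:
            cleaned.append({**row, "Date": date, "Time": time})
    return cleaned
```

Rows whose date or time cannot be parsed are dropped here, which mirrors the decision to exclude missing values; a real workflow might instead log them for manual review.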
Preliminary EDA gives us the following insights:

1. There are 19 unique bird species. Of the 2081 records, 186 are of the Rose-Crested Blue Pipit.
Nevil eda1.jpg

2. The Vocalization type is mainly ‘Songs’ and ‘Calls’
Nevil eda2.jpg

3. There are 5 levels for audio quality: A, B, C, D, E. In the next section, we will analyse the audio samples to determine what each grade signifies.
Nevil eda3.jpg

4. There are very few recordings from before 2007. There is also a lack of data for 2018.
Nevil eda4.jpg

5. Most of the recordings took place during the morning (06:00 to 12:00)
Nevil eda5.jpg

6. A significant number of the pipit recordings were made after 2009.
Nevil eda6.jpg

7. Looking at the overall data at monthly granularity, the number of birds recorded in most months is very low. May has, on average, more recordings than the other months, especially between 2011 and 2017. In addition, many months have no recordings at all.
Nevil eda7.jpg

Map Prep

The map provided is a 200 X 200 .bmp image of the Preserve. To analyse patterns and anomalies across the Preserve, we import the map into Tableau and use it in our analysis:

Uploading the Map on Tableau Desktop


We are also given the coordinates of the dumping ground, (148,159). To analyze the dumping site together with its immediate surroundings, we need an area around this point. We create a square extending 20 units in each direction from the dumping ground (40 X 40 overall) by plotting its corner coordinates and shading the enclosed area in Tableau. The coordinates of the square's corners are as follows:

  • TOP LEFT: (128,179)
  • TOP RIGHT: (168,179)
  • BOTTOM LEFT: (128,139)
  • BOTTOM RIGHT: (168,139)

Nevil area.jpg
This area can be used to determine whether the dumping site affects the bird scatter across the Preserve.
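
The corner coordinates above can be reproduced with a small calculation. The following is a minimal sketch, assuming a half-side of 20 map units (the distance from the dumping ground to each edge implied by the listed corners); the function names are hypothetical and are not part of the Tableau workflow.

```python
DUMP_SITE = (148, 159)  # dumping ground coordinates given with the data
HALF_SIDE = 20          # assumed distance from the centre to each edge


def square_corners(center, half_side):
    """Return the four corners of an axis-aligned square around center."""
    cx, cy = center
    return {
        "top_left": (cx - half_side, cy + half_side),
        "top_right": (cx + half_side, cy + half_side),
        "bottom_left": (cx - half_side, cy - half_side),
        "bottom_right": (cx + half_side, cy - half_side),
    }


def in_square(point, center, half_side):
    """True if point lies inside or on the edge of the square."""
    x, y = point
    cx, cy = center
    return abs(x - cx) <= half_side and abs(y - cy) <= half_side


corners = square_corners(DUMP_SITE, HALF_SIDE)
```

A check such as in_square((150, 160), DUMP_SITE, HALF_SIDE) could then be used to flag recordings that fall within the area around the dumping site.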

Final Decisions & Assumptions before Analysis

1. The low numbers for 2018 reflect the lack of data recorded for that year, not a decline in bird numbers.
2. Because there are so few records before 2007, we exclude them and restrict the analysis to the period 2007-2017 inclusive.
3. Because of the irregularities in the monthly data, the analysis is kept at a yearly level.
4. The square area marked on the map (extending 20 units in each direction from the dumping point at (148,159)) is assumed to be the zone affected by the dumping.
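
Decisions 2 and 3 amount to a filter followed by a yearly aggregation. The sketch below illustrates this under the assumption that dates have already been recoded to dd/mm/yyyy strings; the function name and the sample dates are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

START_YEAR, END_YEAR = 2007, 2017  # analysis period, inclusive


def yearly_counts(dates):
    """Count records per year, keeping only years in the analysis period."""
    counts = Counter()
    for date in dates:
        # Dates are assumed to be dd/mm/yyyy, so the year is the last field.
        year = int(date.rsplit("/", 1)[1])
        if START_YEAR <= year <= END_YEAR:
            counts[year] += 1
    return dict(sorted(counts.items()))


# Made-up sample dates, for illustration only.
sample = ["12/05/2006", "03/05/2011", "21/06/2011", "09/01/2018", "30/05/2017"]
result = yearly_counts(sample)
```

With this sample, the 2006 and 2018 records are dropped and the remaining records are grouped by year.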