Difference between revisions of "ISSS608 2017-18 T3 Assign Lu Yanzhang Data Preparation Methodology"

From Visual Analytics and Applications
 
(13 intermediate revisions by the same user not shown)
Line 42: Line 42:
 
==Timestamp Transformation in Python==
 
The raw timestamp is recorded as the number of seconds elapsed since '2015-05-11 14:00:00'.
For further use of timestamp data, the format needs to be transformed to MMMM-YY-DD rather than raw second format.
 
  
1. Format Date Field to MM/DD/YYYY
+
For further use, the timestamp needs to be converted from the raw seconds value to a '''YYYY/MM/DD''' date.
  
2. Omit records with no date value, since the date field is needed to identify the existence of a given bird species.
+
==Join operation among diverse tables in JMP==
  
3. Recode the time to a 24-hour scale in hh:mm format. Recode empty times to 12:00, "early morning" to 8:00, and "am" to 9:00.
+
Join the tables on the records whose source or target is “suspicious”, then extract the suspicious transactions for visualization in Tableau and social network analysis in Gephi.
  
==Audio File Processing==
+
==Social network modeling in Gephi==
1. To process the audio files, the following R packages are loaded in this assignment: soundgen, tuneR, seewave.
+
Import the suspicious data file into Gephi and model the data with two measures:
  
2. Because analyzeFolder(), which converts audio files to a dataframe, can only read WAV files, all the MP3 audio files are first converted to WAV format.
+
1. Eigenvector centrality for vertex importance calculation.
  
3. Not all the audio files are of good quality; some contain noise that would interfere with the audio classification task. Extract only the audio files whose quality is 'A'.
+
2. Modularity for clustering calculation.
  
4. Call analyzeFolder() to read all the WAV files into a dataframe, and store the dataframe in CSV format. Audio files in both All Birds and Test Birds from Kasios must be processed as above.
+
==Visualization in Tableau==
  
The actual code for audio processing can be found at [https://github.com/runyu/B6_Visual_Analytics/blob/master/Assignment_MC1_audio_preparation.Rmd here]
+
1. Visualize the communication table by day and by month to interpret the growth from 2015 to 2017.
 +
 
 +
2. Visualize the suspicious staff members' activities.

Latest revision as of 14:52, 10 July 2018

MC3 2018.jpg

VAST Challenge 2018 MC3:
Who hurts the bird?

 


Tools

The following tools were used in this assignment:

1. Python - Timestamp transformation and new data source generation.

The following packages are used in this assignment: pandas, numpy, glob, datetime.

2. JMP Pro - Data preparation

3. Tableau - Visualization

4. Gephi - Social network modeling and visualization

Timestamp Transformation in Python

The raw timestamp is recorded as the number of seconds elapsed since '2015-05-11 14:00:00'.

For further use, the timestamp needs to be converted from the raw seconds value to a YYYY/MM/DD date.
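
A minimal sketch of this conversion, using only the standard datetime module. The function name, the sample offsets and the variable names are illustrative assumptions, not the code actually used in the assignment.

```python
from datetime import datetime, timedelta

# Reference instant quoted in the text: raw values count seconds from here.
EPOCH = datetime(2015, 5, 11, 14, 0, 0)


def seconds_to_date(offset_seconds, fmt="%Y/%m/%d"):
    """Convert a raw offset in seconds from EPOCH to a formatted date string."""
    return (EPOCH + timedelta(seconds=int(offset_seconds))).strftime(fmt)


# Hypothetical raw values, for illustration only.
raw_offsets = [0, 36000, 86400]
dates = [seconds_to_date(s) for s in raw_offsets]
print(dates)
```

With pandas, the same idea can likely be expressed on a whole column as pd.to_datetime(df["timestamp"], unit="s", origin=pd.Timestamp("2015-05-11 14:00:00")), where the column name "timestamp" is an assumption.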

Join operation among diverse tables in JMP

Join the tables on the records whose source or target is “suspicious”, then extract the suspicious transactions for visualization in Tableau and social network analysis in Gephi.
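
The join itself was done in JMP. As a rough illustration of the selection logic only, the sketch below keeps every record whose source or target appears in a set of flagged IDs. The field names (source, target, etype), the IDs and the sample records are all hypothetical.

```python
# Hypothetical IDs flagged as suspicious; the real list comes from the challenge data.
suspicious_ids = {"A", "C"}

# Hypothetical transaction records; the real tables and column names may differ.
records = [
    {"source": "A", "target": "B", "etype": "call"},
    {"source": "B", "target": "D", "etype": "email"},
    {"source": "D", "target": "C", "etype": "purchase"},
    {"source": "E", "target": "F", "etype": "meeting"},
]


def select_suspicious(rows, flagged):
    """Keep rows whose source or target is in the flagged set."""
    return [r for r in rows if r["source"] in flagged or r["target"] in flagged]


suspicious_rows = select_suspicious(records, suspicious_ids)
print(len(suspicious_rows))
```

The filtered rows correspond to the "suspicious data file" that would then be exported for Tableau and Gephi.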

Social network modeling in Gephi

Import the suspicious data file into Gephi and model the data with two measures:

1. Eigenvector centrality for vertex importance calculation.

2. Modularity for clustering calculation.
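
To make the two measures concrete, here is a small pure-Python sketch, assuming an undirected, unweighted graph with no duplicate edges or self-loops. It is not Gephi's implementation: the centrality uses a plain power iteration, and the modularity function only scores a partition that is already given, whereas Gephi also searches for the partition. The toy edge list and community labels are made up.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def eigenvector_centrality(edges, iterations=200):
    """Power iteration on A + I, rescaled so the largest score is 1."""
    adj = build_adjacency(edges)
    score = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # Adding the node's own score (the identity shift) avoids
        # oscillation on bipartite graphs without changing the eigenvector.
        nxt = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        top = max(nxt.values())
        score = {n: v / top for n, v in nxt.items()}
    return score


def modularity(edges, community_of):
    """Modularity Q of a given partition: sum of L_c/m - (d_c/2m)^2."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in community c
    degree_sum = defaultdict(int)  # total degree of nodes in community c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by the single edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
communities = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

scores = eigenvector_centrality(edges)
q = modularity(edges, communities)
print(scores, q)
```

In the toy graph the two bridge nodes c and d receive the highest centrality, and splitting the graph into its two triangles gives a positive modularity score.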

Visualization in Tableau

1. Visualize the communication table by day and by month to interpret the growth from 2015 to 2017.

2. Visualize the suspicious staff members' activities.
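
The by-day and by-month views rest on simple counts of records per period, which Tableau computes internally. A sketch of the equivalent aggregation in Python, using made-up timestamps, could look like this:

```python
from collections import Counter
from datetime import datetime

# Hypothetical event times, already converted from raw seconds to datetimes.
events = [
    datetime(2015, 5, 11, 14, 0),
    datetime(2015, 5, 11, 18, 30),
    datetime(2015, 6, 2, 9, 15),
    datetime(2017, 1, 3, 8, 0),
]

# Count communications per calendar day and per calendar month.
per_day = Counter(e.strftime("%Y/%m/%d") for e in events)
per_month = Counter(e.strftime("%Y/%m") for e in events)

for month in sorted(per_month):
    print(month, per_month[month])
```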