ISSS608 2017-18 T3 Assign Lu Yanzhang Data Preparation Methodology

From Visual Analytics and Applications
Latest revision as of 14:52, 10 July 2018


VAST Challenge 2018 MC3:
Who hurts the bird?


Tools

The following tools were used in this assignment:

1. Python - Timestamp transformation and generation of new data sources.

The following Python packages were used: pandas, numpy, glob, datetime.

2. JMP Pro - Data preparation

3. Tableau - Visualization

4. Gephi - Social network modeling and visualization

Timestamp Transformation in Python

The raw timestamps are recorded as the number of seconds elapsed since '2015-05-11 14:00:00'.

For further use, each timestamp needs to be converted from the raw seconds format to a calendar date in YYYY/MM/DD format.
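The conversion can be sketched in Python as follows. This is a minimal illustration, not the code used in the assignment: the column names Etime and Date and both function names are assumptions made for the example.

```python
from datetime import datetime, timedelta

import pandas as pd

# Reference point from which the raw timestamps count seconds.
EPOCH = datetime(2015, 5, 11, 14, 0, 0)


def seconds_to_date(seconds, epoch=EPOCH):
    """Convert one raw seconds offset to a YYYY/MM/DD string."""
    return (epoch + timedelta(seconds=int(seconds))).strftime("%Y/%m/%d")


def add_date_column(df, seconds_col="Etime", epoch=EPOCH):
    """Return a copy of df with a Date column derived from seconds_col.

    The column names are hypothetical; adjust them to the real files.
    """
    out = df.copy()
    stamps = pd.Timestamp(epoch) + pd.to_timedelta(out[seconds_col], unit="s")
    out["Date"] = stamps.dt.strftime("%Y/%m/%d")
    return out
```

For example, an offset of 36000 seconds (10 hours) falls at midnight of the following day, so seconds_to_date(36000) gives '2015/05/12'.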

Join operation among diverse tables in JMP

Join the tables and keep the records whose source or target is flagged as “suspicious”. The selected suspicious transactions are then used for further visualization in Tableau and social network analysis in Gephi.
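The join itself was done in JMP. For readers who prefer code, the same selection can be approximated in pandas as a filter on both endpoints. The column names Source and Target and the function name are assumptions for illustration only.

```python
import pandas as pd


def select_suspicious(transactions, suspicious_ids,
                      source_col="Source", target_col="Target"):
    """Keep rows whose source or target appears in suspicious_ids.

    Column names are hypothetical placeholders for the real table headers.
    """
    ids = list(set(suspicious_ids))
    mask = transactions[source_col].isin(ids) | transactions[target_col].isin(ids)
    return transactions[mask].reset_index(drop=True)
```

The result can be written out with to_csv() and loaded into Tableau or Gephi.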

Social network modeling in Gephi

Import the suspicious data file into Gephi and analyze the network with two measures:

1. Eigenvector centrality to measure vertex importance.

2. Modularity to detect clusters (communities) in the network.
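To make the two measures concrete, the sketch below computes them with numpy for a small undirected graph given as an adjacency matrix. It is an illustration under simplifying assumptions, not Gephi's implementation: the centrality uses plain power iteration, and the modularity function only scores a partition that is already given rather than searching for one.

```python
import numpy as np


def eigenvector_centrality(adj, iterations=1000, tol=1e-10):
    """Power iteration on an undirected adjacency matrix.

    Scores are scaled so that the most central vertex has value 1.
    """
    a = np.asarray(adj, dtype=float)
    x = np.ones(a.shape[0])
    for _ in range(iterations):
        # Adding x (a shift by the identity matrix) keeps the same
        # eigenvectors but avoids oscillation on bipartite graphs.
        nxt = a @ x + x
        nxt /= np.linalg.norm(nxt)
        if np.abs(nxt - x).max() < tol:
            x = nxt
            break
        x = nxt
    return x / x.max()


def modularity(adj, communities):
    """Modularity Q of a given community assignment (one label per vertex)."""
    a = np.asarray(adj, dtype=float)
    degrees = a.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges
    labels = np.asarray(communities)
    same = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m
    return float(((a - expected) * same).sum() / two_m)
```

On a graph of two triangles joined by a single edge, the two bridge vertices receive the highest centrality, and splitting the graph into the two triangles gives a modularity of 5/14.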

Visualization in Tableau

1. Visualize the communication table by day and by month to show how communication volume grew from 2015 to 2017.

2. Visualize the activities of the suspicious staff members.
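The day and month views above were built in Tableau. As a rough cross-check, the same counts can be produced in pandas. This sketch assumes a Date column in YYYY/MM/DD format; the column and function names are hypothetical.

```python
import pandas as pd


def count_by_period(df, date_col="Date", freq="D"):
    """Count records per period; freq="D" for days, "M" for months."""
    dates = pd.to_datetime(df[date_col], format="%Y/%m/%d")
    return dates.dt.to_period(freq).value_counts().sort_index()
```

Calling count_by_period(df, freq="M") returns one count per calendar month, which can be compared against the monthly chart.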