ISSS608 2017-18 T3 Assign Lu Yanzhang Data Preparation Methodology
Revision as of 14:01, 10 July 2018
Tools
The following tools have been used in this assignment:
1. Python - Timestamp transformation and new data source generation.
The following packages are used in this assignment: pandas, numpy, glob, datetime.
2. JMP Pro - Data preparation
3. Tableau - Visualization
4. Gephi - Social network modeling and visualization
Timestamp Transformation in Python
In the raw data, each timestamp is recorded as the number of seconds elapsed since '2015-05-11 14:00:00'.
For further use of the timestamp data, this raw second format needs to be transformed into the date format YYYY/MM/DD.
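A minimal sketch of this conversion, using the datetime package listed above. It assumes each raw value is a number of seconds counted from 2015-05-11 14:00:00; the function names are hypothetical, and this is an illustration rather than the original script.

```python
from datetime import datetime, timedelta

# Reference time given in the assignment: raw values count seconds from here.
BASE_TIME = datetime(2015, 5, 11, 14, 0, 0)


def seconds_to_date(raw_seconds):
    """Convert one raw second offset into a 'YYYY/MM/DD' date string."""
    moment = BASE_TIME + timedelta(seconds=float(raw_seconds))
    return moment.strftime("%Y/%m/%d")


def transform_column(raw_values):
    """Apply seconds_to_date() to every value of a column (any iterable)."""
    return [seconds_to_date(v) for v in raw_values]
```

For example, `transform_column([0, 36000, 86400 * 30])` returns `['2015/05/11', '2015/05/12', '2015/06/10']`. Because the base time is 14:00, an offset of only 36000 s (10 hours) already falls on the next calendar day. With pandas, a whole column could instead be converted in one call with `pd.to_datetime(df[col], unit="s", origin=pd.Timestamp("2015-05-11 14:00:00")).dt.strftime("%Y/%m/%d")`.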
Audio File Processing
1. To process the audio files, the following R packages are loaded in this assignment: soundgen, tuneR, seewave.
2. The function analyzeFolder(), which converts audio files into a dataframe, can only read WAV format, so the MP3 files must be converted first. In this first step, I convert all the MP3 audio files to WAV format.
3. Not all the audio files are of good quality. Some recordings contain noise that would interfere with the audio classification task, so only the audio files whose quality rating is 'A' are extracted.
4. Call analyzeFolder() to read all the WAV files into a dataframe, and store the dataframe in CSV format. The audio files in both the All Birds and the Test Birds sets from Kasios need to be processed as described above.
The actual code for audio processing can be found here.
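The audio steps above were carried out in R. As a rough illustration of the file-handling part of steps 3 and 4, the sketch below uses only the Python standard library. It is a sketch under stated assumptions, not the author's code: the metadata column names (`File ID`, `Quality`), the `<species>-<ID>.wav` file naming scheme, and all function names are hypothetical. Step 2 (MP3 to WAV) is left out because it needs an external decoder. `describe_wav()` reads only header properties (channels, sample rate, length); it does not compute the acoustic features that analyzeFolder() extracts and only mirrors the "one row per file, saved as CSV" pattern.

```python
import csv
import shutil
import wave
from pathlib import Path

# Hypothetical metadata column names; adjust to the real metadata file.
ID_COL = "File ID"
QUALITY_COL = "Quality"
FIELDS = ["file", "channels", "sample_rate", "n_frames", "duration_s"]


def quality_a_ids(rows):
    """Step 3: IDs of the recordings whose quality grade is 'A'."""
    return {str(r[ID_COL]).strip() for r in rows
            if str(r[QUALITY_COL]).strip().upper() == "A"}


def recording_id(filename):
    """Assumed naming scheme: '<species>-<ID>.wav' or '<ID>.wav'."""
    return Path(filename).stem.rsplit("-", 1)[-1]


def select_files(filenames, keep_ids):
    """Keep only the file names whose recording ID is in keep_ids."""
    return [f for f in filenames if recording_id(f) in keep_ids]


def copy_quality_a(metadata_csv, wav_dir, out_dir):
    """Copy every grade-'A' WAV file from wav_dir into out_dir."""
    with open(metadata_csv, newline="", encoding="utf-8") as fh:
        keep = quality_a_ids(csv.DictReader(fh))
    src, dst = Path(wav_dir), Path(out_dir)
    dst.mkdir(parents=True, exist_ok=True)
    chosen = select_files(sorted(p.name for p in src.glob("*.wav")), keep)
    for name in chosen:
        shutil.copy2(src / name, dst / name)
    return chosen


def describe_wav(source, name=None):
    """Step 4 stand-in: basic header properties of one WAV file."""
    with wave.open(source, "rb") as w:
        rate, frames = w.getframerate(), w.getnframes()
        return {
            "file": name if name is not None else str(source),
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "n_frames": frames,
            "duration_s": frames / rate if rate else 0.0,
        }


def folder_to_csv(wav_dir, out_csv):
    """Describe every .wav file in wav_dir and write one row per file."""
    rows = [describe_wav(str(p), p.name)
            for p in sorted(Path(wav_dir).glob("*.wav"))]
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For example, `copy_quality_a("metadata.csv", "wav", "wav_A")` followed by `folder_to_csv("wav_A", "wav_A_summary.csv")` would run both steps on one folder (file and folder names are placeholders); the same calls would be repeated for the All Birds and the Test Birds sets.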