[METADATA]
template: "post"
draft: false
category: "Deployment"
tags: "Dataset","Parse"
description: "How to download the large AudioSet dataset with Python, then validate and post-process the data."
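I only cover a few points worth discussing here; everything else is handled in the complete code.
The sample below is the terminal output from one run of the program, and it also shows the overall download flow.
In mP0bdHdXLyo_1000, mP0bdHdXLyo is the YouTube VideoID and 1000 means the clip starts at millisecond 1000 of the video and runs for the following 10 seconds.
The download produces three files: the downloaded file mP0bdHdXLyo_1000.m4a, the full-length audio file mP0bdHdXLyo_1000_temp.wav, and the trimmed audio file mP0bdHdXLyo_1000.wav.
If the trimmed clip is shorter than ten seconds (here the check finds it 6892 samples short), I zero-pad it, although you can skip this step if you do not need fixed-length clips.
To keep track of which files were zero-padded, padded files go into a separate strong_label_train_padded folder.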
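The padding step can be sketched as follows. This is a minimal illustration, not the code from the repo: it uses only Python's standard `wave` module, assumes 16-bit PCM WAV input (where zero bytes are silence), and the function name `pad_wav_with_zeros` and its parameters are hypothetical.

```python
import wave
from pathlib import Path
from typing import Tuple

TARGET_SECONDS = 10  # clip length described in this post


def pad_wav_with_zeros(src: Path, padded_dir: Path,
                       seconds: int = TARGET_SECONDS) -> Tuple[Path, int]:
    """Zero-pad `src` up to `seconds` if it is short.

    Returns (final path, number of frames that were missing).
    Hypothetical helper; assumes 16-bit PCM, where zero bytes mean silence.
    """
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())

    frame_bytes = params.sampwidth * params.nchannels
    have = len(frames) // frame_bytes          # frames actually read
    diff = seconds * params.framerate - have   # the "Diff" value in the log
    if diff <= 0:
        return src, 0                          # long enough, leave it alone

    padded_dir.mkdir(parents=True, exist_ok=True)
    dst = padded_dir / src.name
    with wave.open(str(dst), "wb") as writer:
        writer.setparams(params)               # frame count is fixed up on close
        writer.writeframes(frames + b"\x00" * (diff * frame_bytes))

    src.unlink()                               # keep only the padded copy
    return dst, diff
```

Calling `pad_wav_with_zeros(Path("strong_label_train/mP0bdHdXLyo_1000.wav"), Path("strong_label_train_padded"))` on a clip that is 6892 frames short would write the padded copy into the second folder and delete the original, which matches the `Removed:` line in the padding step of the log.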
July 06, 21:48:12 @35396 INFO =======mP0bdHdXLyo_1000=======
July 06, 21:48:12 @35396 INFO > download_video
July 06, 21:48:20 @35396 INFO Downloaded: mP0bdHdXLyo_1000.m4a
July 06, 21:48:20 @35396 INFO Duration: converted audio: 11s
July 06, 21:48:20 @35396 INFO > convert_to_audio
July 06, 21:48:20 @35396 INFO Converted: mP0bdHdXLyo_1000.m4a to mP0bdHdXLyo_1000_temp.wav
July 06, 21:48:20 @35396 WARNING Removed: mP0bdHdXLyo_1000.m4a
July 06, 21:48:20 @35396 INFO Duration: converted audio: 11s
July 06, 21:48:20 @35396 INFO > trim_audio
July 06, 21:48:20 @35396 INFO Trimmed: mP0bdHdXLyo_1000_temp.wav to mP0bdHdXLyo_1000.wav
July 06, 21:48:20 @35396 WARNING Removed: mP0bdHdXLyo_1000_temp.wav
July 06, 21:48:20 @35396 WARNING > padding
July 06, 21:48:20 @35396 WARNING Diff: 6892
July 06, 21:48:20 @35396 WARNING Removed: /AudioSet/strong_label_train/mP0bdHdXLyo_1000.wav
July 06, 21:48:20 @35396 INFO > checking
July 06, 21:48:20 @35396 INFO Checking: /AudioSet/strong_label_train_padded/mP0bdHdXLyo_1000.wav
July 06, 21:48:20 @35396 INFO Checking: channels: 1
July 06, 21:48:20 @35396 INFO Checking: samplerate: 44100
July 06, 21:48:20 @35396 INFO Checking: precision: 16-bit
July 06, 21:48:20 @35396 INFO Checking: duration: 00:00:10.00
July 06, 21:48:20 @35396 INFO Finished: mP0bdHdXLyo_1000
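The `checking` step at the end of the log reports channels, sample rate, precision, and duration. A rough sketch of such a check is shown here, assuming plain PCM WAV files and using only the standard `wave` module; the repo may use a different tool, and the names `read_wav_info`, `check_wav`, and `EXPECTED` are hypothetical.

```python
import wave
from pathlib import Path
from typing import Dict, List

# Expected values copied from the "Checking:" lines in the log above.
EXPECTED = {
    "channels": 1,
    "samplerate": 44100,
    "precision_bits": 16,
    "duration_s": 10.0,
}


def read_wav_info(path: Path) -> Dict[str, float]:
    """Read the basic format fields from a WAV header."""
    with wave.open(str(path), "rb") as reader:
        rate = reader.getframerate()
        return {
            "channels": reader.getnchannels(),
            "samplerate": rate,
            "precision_bits": reader.getsampwidth() * 8,
            "duration_s": reader.getnframes() / rate,
        }


def check_wav(path: Path, expected: Dict[str, float] = EXPECTED,
              tolerance_s: float = 0.005) -> List[str]:
    """Return a list of mismatches; an empty list means the file passed."""
    info = read_wav_info(path)
    problems = []
    for key, want in expected.items():
        got = info[key]
        if key == "duration_s":
            ok = abs(got - want) <= tolerance_s  # allow rounding to 10.00 s
        else:
            ok = got == want
        if not ok:
            problems.append(f"{key}: expected {want}, got {got}")
    return problems
```

Returning a list of mismatches instead of raising makes it easy to log every problem for a clip and move on to the next one during a long download run.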
All the code is in the github.com/km4sh/fetch-audioset repo.