[METADATA]
template: "post"
draft: false
category: "Deployment"
tags: "Dataset","Parse"
description: "How to use Python to download the massive AudioSet dataset, then validate and post-process the data."

Before we begin

Technical points covered

Only a few points worth discussing are covered here; everything else can be found in the complete code.

Sample terminal output

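The sample in this section is the terminal output of one run of the program for a single clip, and it also shows the overall download flow.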

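In the clip name mP0bdHdXLyo_1000, mP0bdHdXLyo is the YouTube video ID, and 1000 means the clip starts at millisecond 1000 of the video and covers the 10 seconds of audio that follow.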
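
The naming scheme above can be sketched as a small parser. This is a minimal illustration, not code taken from the repo: `Clip`, `parse_clip_id` and `CLIP_SECONDS` are hypothetical names, and the sketch assumes every clip name has the form `<video_id>_<start_ms>`.

```python
from typing import NamedTuple

CLIP_SECONDS = 10  # clip length stated in the text


class Clip(NamedTuple):
    video_id: str
    start_ms: int
    end_ms: int


def parse_clip_id(clip_id: str) -> Clip:
    """Split '<video_id>_<start_ms>' into its parts (hypothetical helper)."""
    # YouTube video IDs may themselves contain '_' or '-',
    # so split only on the last underscore.
    video_id, start = clip_id.rsplit("_", 1)
    start_ms = int(start)
    return Clip(video_id, start_ms, start_ms + CLIP_SECONDS * 1000)


clip = parse_clip_id("mP0bdHdXLyo_1000")
print(clip.video_id, clip.start_ms / 1000, clip.end_ms / 1000)
# prints: mP0bdHdXLyo 1.0 11.0
```

Splitting from the right matters because a video ID can itself contain an underscore, in which case a plain `split("_")` would cut the ID in the wrong place.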

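Three files are produced along the way: the downloaded media file mP0bdHdXLyo_1000.m4a, the full-length audio file mP0bdHdXLyo_1000_temp.wav, and the trimmed audio file mP0bdHdXLyo_1000.wav. As the Removed lines in the log show, the .m4a and _temp.wav intermediates are deleted once they are no longer needed.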

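If the trimmed clip is shorter than ten seconds (in this example the check finds it 6892 samples short, roughly 0.16 s at the 44100 Hz sample rate shown in the log), I zero-pad it to full length. You can skip this step if your use case does not need it.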
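
The padding step can be sketched on raw PCM bytes as follows. This is an illustrative sketch under stated assumptions, not the repo's implementation: `pad_with_zeros` is a hypothetical helper, and the audio is assumed to be mono, 16-bit, 44100 Hz, matching the values reported in the sample log.

```python
from typing import Tuple

SAMPLE_RATE = 44100   # from the "samplerate" line in the sample log
CLIP_SECONDS = 10     # clip length stated in the text
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit (assumed mono)
TARGET_SAMPLES = SAMPLE_RATE * CLIP_SECONDS  # 441000


def pad_with_zeros(pcm: bytes,
                   target_samples: int = TARGET_SAMPLES,
                   sample_width: int = SAMPLE_WIDTH) -> Tuple[bytes, int]:
    """Return (padded_pcm, diff); diff is the number of samples appended."""
    n_samples = len(pcm) // sample_width
    diff = max(target_samples - n_samples, 0)
    # Silence in signed 16-bit PCM is all-zero bytes.
    return pcm + b"\x00" * (diff * sample_width), diff


# A clip that is 6892 samples short, as in the sample log.
short_clip = bytes(SAMPLE_WIDTH * (TARGET_SAMPLES - 6892))
padded, diff = pad_with_zeros(short_clip)
print(diff, len(padded) // SAMPLE_WIDTH)
# prints: 6892 441000
```

Returning `diff` alongside the data makes it easy to log the shortfall and to decide afterwards whether the file counts as padded.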

To keep track of which files were zero-padded, we put the padded files in a separate strong_label_train_padded folder.
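
One way to route padded files into the separate folder is sketched here. The helper names `padded_destination` and `relocate_padded` are hypothetical, and the rule "append `_padded` to the parent directory name" is an assumption inferred from the two paths that appear in the sample log.

```python
from pathlib import Path


def padded_destination(wav_path: Path) -> Path:
    """Map <dir>/<name>.wav to <dir>_padded/<name>.wav (assumed layout rule)."""
    parent = wav_path.parent
    return parent.with_name(parent.name + "_padded") / wav_path.name


def relocate_padded(wav_path: Path, padded_bytes: bytes) -> Path:
    """Write the padded file into the *_padded folder and delete the original."""
    dest = padded_destination(wav_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(padded_bytes)
    wav_path.unlink()  # mirrors the "Removed:" line in the sample log
    return dest
```

Keeping padded clips in a sibling directory means the number of padded files can be read off with a directory listing, without a separate bookkeeping file.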

July 06, 21:48:12 @35396 INFO     =======mP0bdHdXLyo_1000=======
July 06, 21:48:12 @35396 INFO      > download_video
July 06, 21:48:20 @35396 INFO     Downloaded: 	mP0bdHdXLyo_1000.m4a
July 06, 21:48:20 @35396 INFO     Duration: 	converted audio: 11s
July 06, 21:48:20 @35396 INFO      > convert_to_audio
July 06, 21:48:20 @35396 INFO     Converted: 	mP0bdHdXLyo_1000.m4a to mP0bdHdXLyo_1000_temp.wav
July 06, 21:48:20 @35396 WARNING  Removed: 	mP0bdHdXLyo_1000.m4a
July 06, 21:48:20 @35396 INFO     Duration: 	converted audio: 11s
July 06, 21:48:20 @35396 INFO      > trim_audio
July 06, 21:48:20 @35396 INFO     Trimmed: 	mP0bdHdXLyo_1000_temp.wav to mP0bdHdXLyo_1000.wav
July 06, 21:48:20 @35396 WARNING  Removed: 	mP0bdHdXLyo_1000_temp.wav
July 06, 21:48:20 @35396 WARNING   > padding
July 06, 21:48:20 @35396 WARNING  Diff: 6892
July 06, 21:48:20 @35396 WARNING  Removed: 	/AudioSet/strong_label_train/mP0bdHdXLyo_1000.wav
July 06, 21:48:20 @35396 INFO      > checking
July 06, 21:48:20 @35396 INFO     Checking: 	/AudioSet/strong_label_train_padded/mP0bdHdXLyo_1000.wav
July 06, 21:48:20 @35396 INFO     Checking: 	channels:	1
July 06, 21:48:20 @35396 INFO     Checking: 	samplerate:	44100
July 06, 21:48:20 @35396 INFO     Checking: 	precision:	16-bit
July 06, 21:48:20 @35396 INFO     Checking: 	duration:	00:00:10.00
July 06, 21:48:20 @35396 INFO     Finished: 	mP0bdHdXLyo_1000
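
The final checking stage in the log reports channels, sample rate, precision and duration. The tool the program uses for this is not shown in this section, so the sketch below swaps in Python's standard `wave` module to do an equivalent check. `check_wav` and its default expected values are assumptions taken from the numbers in the sample log.

```python
import wave


def check_wav(path, channels=1, samplerate=44100, bits=16, seconds=10):
    """Read a WAV header and compare it with the expected clip format."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        info = {
            "channels": wav.getnchannels(),
            "samplerate": wav.getframerate(),
            "precision": wav.getsampwidth() * 8,   # bits per sample
            "duration": frames / wav.getframerate(),  # seconds
        }
    # Compare frame counts rather than float durations to avoid rounding issues.
    info["ok"] = (
        info["channels"] == channels
        and info["samplerate"] == samplerate
        and info["precision"] == bits
        and frames == seconds * samplerate
    )
    return info
```

A clip that passes has `info["ok"]` set to `True`; any mismatch leaves the individual fields in `info` available for logging.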

Overall flow

All of the code is in the github.com/km4sh/fetch-audioset repo.