VGGSound

Code and results for the ICASSP 2020 paper "VGGSound: A Large-scale Audio-Visual Dataset".

The repo contains the dataset file and our best audio classification model.

Dataset

To download VGGSound, we provide a csv file. For each YouTube video, it gives the YouTube ID, start time, audio label, and train/test split. Each line of the csv file has the following columns:

# YouTube ID, start seconds, label, train/test split

A helpful link for data download!
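As a sketch of how the four columns described above might be read, the snippet below parses rows with Python's `csv` module and builds a YouTube URL that starts at the given offset. The `Clip` type, the `read_clips` helper, and the sample rows are hypothetical and not part of this repo. It also assumes that the file has no header row, that start seconds are integers, and that the split column holds the literal strings `train` and `test`.

```python
import csv
import io
from typing import Dict, Iterator, List, NamedTuple


class Clip(NamedTuple):
    """One csv row (hypothetical type, not defined in this repo)."""
    youtube_id: str
    start_seconds: int
    label: str
    split: str  # assumed to be "train" or "test"

    @property
    def url(self) -> str:
        # Standard YouTube watch URL with a start offset in seconds.
        return "https://www.youtube.com/watch?v={}&t={}s".format(
            self.youtube_id, self.start_seconds
        )


def read_clips(handle) -> Iterator[Clip]:
    # csv.reader handles quoted labels that themselves contain commas.
    for row in csv.reader(handle):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        youtube_id, start, label, split = (field.strip() for field in row)
        yield Clip(youtube_id, int(start), label, split)


# Made-up rows for illustration only; these are not entries from the real file.
SAMPLE = io.StringIO(
    'abc123XYZ_0,30,"people whistling",train\n'
    'def456UVW-1,120,"dog barking, howling",test\n'
)

clips = list(read_clips(SAMPLE))

# Group clips by their train/test split.
by_split: Dict[str, List[Clip]] = {}
for clip in clips:
    by_split.setdefault(clip.split, []).append(clip)
```

To use it on the real file, pass an open file handle to `read_clips` in place of `SAMPLE`.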

Audio classification

We detail the audio classification results here.

Pretrain indicates whether the model was pretrained on the YouTube-8M dataset.
Dataset (common) denotes the subset of that dataset restricted to the classes shared by AudioSet and VGGSound (listed here).
ASTest is the intersection of the AudioSet and VGGSound test sets.

	Model	Aggregation	Pretrain	Finetune/Train	Test	mAP	AUC	d-prime
A	VGGish	\	✔️	AudioSet (common)	ASTest	0.286	0.899	1.803
B	VGGish	\	✔️	VGGSound (common)	ASTest	0.326	0.916	1.950
C	VGGish	\	❌	VGGSound (common)	ASTest	0.301	0.910	1.900
D	ResNet18	AveragePool	❌	VGGSound (common)	ASTest	0.328	0.923	2.024
E	ResNet18	NetVLAD	❌	VGGSound (common)	ASTest	0.369	0.927	2.058
F	ResNet18	AveragePool	❌	VGGSound	ASTest	0.404	0.944	2.253
G	ResNet18	NetVLAD	❌	VGGSound	ASTest	0.434	0.950	2.327
H	ResNet18	AveragePool	❌	VGGSound	VGGSound	0.516	0.968	2.627
I	ResNet18	NetVLAD	❌	VGGSound	VGGSound	0.512	0.970	2.660
J	ResNet34	AveragePool	❌	VGGSound	ASTest	0.409	0.947	2.292
K	ResNet34	AveragePool	❌	VGGSound	VGGSound	0.529	0.972	2.703
L	ResNet50	AveragePool	❌	VGGSound	ASTest	0.412	0.949	2.309
M	ResNet50	AveragePool	❌	VGGSound	VGGSound	0.532	0.973	2.735
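The d-prime column can be related to the AUC column. The sketch below assumes d-prime follows the convention common in the AudioSet literature, d' = sqrt(2) * Φ⁻¹(AUC), where Φ⁻¹ is the inverse standard normal CDF. The scripts in this repo may compute it differently, and the `d_prime` helper is hypothetical. It uses `statistics.NormalDist`, which requires Python 3.8 or later, newer than the version listed under Environment.

```python
import math
from statistics import NormalDist  # available from Python 3.8


def d_prime(auc: float) -> float:
    """Convert an AUC in (0, 1) to d-prime under the assumed convention."""
    # inv_cdf is the inverse of the standard normal CDF (mean 0, std 1).
    return math.sqrt(2.0) * NormalDist().inv_cdf(auc)


# Row A of the table reports AUC 0.899 and d-prime 1.803.
row_a = d_prime(0.899)
print(round(row_a, 3))
```

Because the table's AUC values are rounded to three decimals, the recomputed d-prime should only agree with the reported column approximately.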

Environment

Python 3.6.8
PyTorch 1.3.0

Pretrained model and evaluation

We provide the pretrained models H and I here:

wget http://www.robots.ox.ac.uk/~vgg/data/vggsound/models/H.pth.tar
wget http://www.robots.ox.ac.uk/~vgg/data/vggsound/models/I.pth.tar

To test the model and generate prediction files:

python test.py --data_path "directory to audios/" --result_path "directory to predictions/" --summaries "path to pretrained models" --pool "avgpool"

To evaluate the model performance using the generated prediction files:

python eval.py --result_path "directory to predictions/"

Citation

@InProceedings{Chen20,
  author       = "Honglie Chen and Weidi Xie and Andrea Vedaldi and Andrew Zisserman",
  title        = "VGGSound: A Large-scale Audio-Visual Dataset",
  booktitle    = "International Conference on Acoustics, Speech, and Signal Processing (ICASSP)",
  year         = "2020",
}

License

The VGG-Sound dataset is available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here.
