audio/ includes all sound examples for the datasets used in the paper. Some of these sound examples are presented on the accompanying web page. cfg/ includes configuration files for experiments. src/ ...
In this work, we automatically generate large-scale video question answering data from narrated videos, leverage contrastive learning to train on large vocabularies of answers, and show the first zero ...