Deep Learning for Video Analysis
Deep Learning for Video Classification and Captioning
https://arxiv.org/pdf/1609.06782.pdf
Large-scale Video Classification with Convolutional Neural Networks
https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/42455.pdf
Learning Spatiotemporal Features with 3D Convolutional Networks
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Tran_Learning_Spatiotemporal_Features_ICCV_2015_paper.pdf
Two-Stream Convolutional Networks for Action Recognition in Videos
https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf
Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Wang_Action_Recognition_With_2015_CVPR_paper.pdf
AENet: Learning Deep Audio Features for Video Analysis
https://arxiv.org/pdf/1701.00599.pdf
Look, Listen and Learn
https://arxiv.org/pdf/1705.08168.pdf
Objects that Sound
https://arxiv.org/pdf/1712.06651
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
https://arxiv.org/pdf/1804.02516.pdf
Learning to Separate Object Sounds by Watching Unlabeled Video
https://arxiv.org/pdf/1804.01665.pdf