Optics and Precision Engineering, 2020, 28(5): 1177. Published online: 2020-11-06

Special video classification based on multitask learning and multimodal feature fusion
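Author affiliations
1 School of Information and Communication Engineering, Communication University of China, Beijing 100024
2 Department of Electronic Engineering, Tsinghua University, Beijing 100084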
Abstract
Intelligent classification of special videos (in this paper, specifically violent videos) supports intelligent monitoring of online content security. Existing methods that fuse multimodal features for special video classification do not account for semantic consistency between the audio and visual modalities. To address this, a special video recognition method based on audio-visual multimodal feature fusion and multitask learning is proposed. First, audio semantic features and spatio-temporal visual semantic features, covering both appearance and motion information, are extracted. Next, a shared feature subspace that preserves semantic information is constructed to fuse the audio and visual features. Finally, a multitask learning framework is proposed that jointly learns an audio-visual semantic consistency measure and special video classification. Its loss function combines a correspondence loss, derived from the measured audio-visual semantic consistency, with the cross-entropy loss of special video classification, which enables end-to-end recognition of special videos. Experimental results show that the proposed algorithm achieves an accuracy of 97.97% on the Violent Flow dataset and an average precision of 39.76% on the MediaEval VSD 2015 dataset, outperforming existing methods. These results demonstrate the effectiveness of the algorithm, which can help raise the level of intelligence in special video surveillance.
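The multitask objective described in the abstract can be illustrated with a minimal sketch. The abstract states only that a correspondence loss is combined with a cross-entropy classification loss; it does not give the exact form. The contrastive form of the correspondence term, the `margin` and `weight` parameters, and all function names below are therefore assumptions made for illustration, not the authors' implementation.

```python
import math


def contrastive_loss(audio_emb, visual_emb, same_semantics, margin=1.0):
    """Correspondence loss for one audio-visual pair.

    Assumed contrastive form: matched pairs are pulled together,
    mismatched pairs are pushed at least `margin` apart.
    """
    dist = math.sqrt(sum((a - v) ** 2 for a, v in zip(audio_emb, visual_emb)))
    if same_semantics:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(audio_emb, visual_emb, same_semantics, logits, label, weight=0.5):
    """Classification loss plus a weighted correspondence loss.

    `weight` is a hypothetical balancing factor between the two tasks.
    """
    corr = contrastive_loss(audio_emb, visual_emb, same_semantics)
    return cross_entropy(logits, label) + weight * corr


# Example: embeddings in a shared 2-D subspace and two-class logits.
loss = multitask_loss([0.2, 0.9], [0.1, 1.0], True, [2.0, -1.0], 0)
```

In this sketch the audio and visual embeddings are assumed to already lie in the shared subspace. In an end-to-end system the projections into that subspace and the classifier would be trained jointly by minimizing this combined loss.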

WU Xiao-yu, GU Chao-nan, WANG Sheng-jin. Special video classification based on multitask learning and multimodal feature fusion[J]. Optics and Precision Engineering, 2020, 28(5): 1177.