Optics and Precision Engineering, 2020, 28(5): 1177. Published online: 2020-11-06

Special video classification based on multitask learning and multimodal feature fusion
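Author affiliations
1 School of Information and Communication Engineering, Communication University of China, Beijing 100024, China
2 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Cite this paper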

WU Xiao-yu, GU Chao-nan, WANG Sheng-jin. Special video classification based on multitask learning and multimodal feature fusion[J]. Optics and Precision Engineering, 2020, 28(5): 1177. (in Chinese)

References

[1] MA X CH, WEI SH K, JIANG X, et al.. Early warning of illegal video chats based on camera source identification [J]. Opt. Precision Eng., 2018, 26 (11): 2785-2794. (in Chinese)

[2] CLAIRE H D, CEDRIC P, MOHAMMAD S, et al.. VSD, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation [J]. Multimedia Tools and Applications, 2014, 74 (17): 7379-7404.

[3] MOREIRA D, AVILA S, PEREZ M, et al.. Multimodal data fusion for sensitive scene localization [J]. Information Fusion, 2019, 45: 307-323.

[4] WANG H M, YANG L, WU X Y, et al.. A review of bloody violence in video classification [C]. International Conference on the Frontiers & Advances in Data Science, 2017: 86-91.

[5] YI Y, WANG H, ZHANG B, et al.. MIC-TJU at affective impact of movies task [C]. MediaEval Workshop, 2015, 7.

[6] LAM, LE S P, DO T, et al.. Computational optimization for violent scenes detection [C]. International Conference on Computer, Control, Informatics and its Applications, 2016: 141-146.

[7] DAI Q, ZHAO R, WU Z, et al.. Fudan-Huawei at MediaEval 2015: Detecting violent scenes and affective impact in movies with deep learning [C]. MediaEval Workshop, 2015, 5.

[8] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos [C]. NeurIPS, 2014: 568-576.

[9] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks [C]. NeurIPS, 2014: 3104-3112.

[10] SWATHIKIRAN S, OSWALD L. Learning to detect violent videos using convolutional long short-term memory [C]. IEEE International Conference on Advanced Video and Signal Based Surveillance, 2017: 1-6.

[11] BALTRUSAITIS T, AHUJA C, MORENCY L P. Multimodal machine learning: a survey and taxonomy [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2): 423-443.

[12] CUI X, PENG Z J, CHEN F. Joint multi-feature fast coding for future video coding [J]. Opt. Precision Eng., 2019, 27 (4): 990-999. (in Chinese)

[13] WU Z, JIANG Y G, WANG X, et al.. Multi-stream multi-class fusion of deep networks for video classification [C]. ACM International Conference on Multimedia, 2016: 791-800.

[14] PAN X ZH, ZHANG SH Q, GUO W P. Video-based facial expression recognition using multimodal deep convolutional neural networks [J]. Opt. Precision Eng., 2019, 27 (4): 963-970. (in Chinese)

[15] ATREY P K, HOSSAIN M A, SADDIK A E, et al.. Multimodal fusion for multimedia analysis: a survey [J]. Multimedia Systems, 2010, 16(6): 345-379.

[16] QIU Z, YAO T, TAO M. Learning spatial-temporal representation with pseudo-3d residual networks [C]. IEEE International Conference on Computer Vision, 2017: 5534-5542.

[17] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the Kinetics dataset [C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6299-6308.

[18] HERSHEY S, CHAUDHURI S, ELLIS D P W, et al.. CNN architectures for large-scale audio classification [C]. International Conference on Acoustics, Speech and Signal Processing, 2017: 131-135.

[19] WU Z, JIANG Y G, WANG J, et al.. Exploring inter-feature and inter-class relationships with deep neural networks for video classification [C]. ACM International Conference on Multimedia, 2014: 167-176.

[20] HASSNER T, ITCHER Y, KLIPER C O. Violent flows: Real-time detection of violent crowd behavior [C]. IEEE Conference on Computer Vision and Pattern Recognition, 2012: 1-6.

[21] BILINSKI P, BREMOND F. Human violence recognition and detection in surveillance videos [C]. IEEE International Conference on Advanced Video and Signal Based Surveillance, 2016: 30-36.

[22] ZHANG T, JIA W, HE X, et al.. Discriminative dictionary learning with motion weber local descriptor for violence detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 27(3): 696-709.

[23] MATS S, YOANN B, HANLI W, et al.. The MediaEval 2015 affective impact of movies task [C]. MediaEval Workshop, 2015: 1.

[24] ESRA A, FRANK H, SAHIN A. Breaking down violence detection: combining divide-et-impera and coarse-to-fine strategies [J]. Neurocomputing, 2016, 208: 225-237.
