Opto-Electronic Engineering, 2020, 47 (2): 190139, Published online: 2020-03-06

Efficient 3D dense residual network and its application in human action recognition
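Author affiliation
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China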
Abstract
3D convolutional neural networks (3D-CNNs) extract spatio-temporal features from video well, but they demand a large amount of computation and memory. To address this, this paper designs an efficient 3D convolutional block to replace the computationally expensive 3×3×3 convolutional layer, and then proposes 3D efficient dense residual networks (3D-EDRNs), which integrate these blocks, for human action recognition. The efficient 3D convolutional block is composed of 1×3×3 convolutional layers, which capture the spatial features of video, and 3×1×1 convolutional layers, which capture its temporal features. Placing efficient 3D convolutional blocks at multiple locations in the dense residual network exploits the ease of optimization of residual blocks and the feature reuse of densely connected networks, while also shortening training time and improving the efficiency and performance of spatio-temporal feature extraction. Experiments on the classical datasets UCF101 and HMDB51 and on the dynamic multi-view complicated 3D database of human activity (DMV action3D) verify that 3D-EDRNs with the efficient 3D convolutional blocks significantly reduce model complexity and effectively improve classification performance, while requiring fewer computational resources, fewer parameters, and less training time.
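As a rough illustration of why factorizing a 3×3×3 convolution reduces model size, the sketch below counts the weights of a full 3×3×3 layer against a 1×3×3 layer followed by a 3×1×1 layer. The abstract does not specify the layer ordering, the intermediate channel width, or the use of bias terms, so the sequential arrangement, the `c_mid` parameter, and the function names here are assumptions made for illustration only, not the authors' implementation.

```python
def conv3d_params(c_in, c_out, kernel, bias=False):
    """Weight count of a 3D convolution with kernel (time, height, width)."""
    kt, kh, kw = kernel
    n = c_in * c_out * kt * kh * kw
    return n + (c_out if bias else 0)


def full_block_params(c_in, c_out):
    """Weights of a single standard 3x3x3 convolutional layer."""
    return conv3d_params(c_in, c_out, (3, 3, 3))


def factorized_block_params(c_in, c_out, c_mid=None):
    """Weights of a 1x3x3 (spatial) layer followed by a 3x1x1 (temporal) layer.

    c_mid is a hypothetical intermediate channel width; it defaults to c_out
    because the source does not state it.
    """
    c_mid = c_out if c_mid is None else c_mid
    spatial = conv3d_params(c_in, c_mid, (1, 3, 3))
    temporal = conv3d_params(c_mid, c_out, (3, 1, 1))
    return spatial + temporal


if __name__ == "__main__":
    c = 64  # illustrative channel width, not taken from the paper
    full = full_block_params(c, c)
    fact = factorized_block_params(c, c)
    print(f"3x3x3 layer:       {full} weights")
    print(f"1x3x3 + 3x1x1:     {fact} weights")
    print(f"factorized / full: {fact / full:.3f}")
```

With equal input, intermediate, and output widths, the factorized pair uses 9 + 3 = 12 weights per channel pair instead of 27, that is, 4/9 of the original. The actual savings in 3D-EDRNs depend on the channel widths and block placement chosen in the paper.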

Li Lianghua, Wang Yongxiong. Efficient 3D dense residual network and its application in human action recognition[J]. Opto-Electronic Engineering, 2020, 47(2): 190139.
