Semiconductor Optoelectronics, 2020, 41(3): 414, published online: 2020-06-18

Chinese Sign Language Recognition Based on Spatial-Temporal Attention Network
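LUO Yuan 1,*, LI Dan 1, ZHANG Yi 2
Author affiliations
1 College of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065
2 Engineering Research Center for Information Accessibility and Service Robots, Chongqing University of Posts and Telecommunications, Chongqing 400065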
Abstract
Sign language recognition is widely used in communication between deaf-mute people and hearing people. Inadequate extraction of spatial-temporal features in the sign language recognition task leads to a low recognition rate. To address this problem, this paper proposes a novel sign language recognition model based on spatial-temporal attention, which can learn more discriminative spatial-temporal features. Specifically, a new spatial attention module based on a residual 3D convolutional neural network (Res3DCNN) is proposed, which automatically focuses on the salient regions in space. Then, to measure the importance of video frames, a new temporal attention module based on convolutional long short-term memory (ConvLSTM) is introduced. The key idea of the proposed model is to focus on salient regions spatially and to select key video frames automatically in time. Finally, experiments on the Chinese sign language (CSL) dataset verify the effectiveness of the proposed method.
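The two attention ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' Res3DCNN or ConvLSTM implementation, which the abstract does not specify. It only shows softmax-based reweighting of spatial positions and of video frames, in plain Python. The function names, the use of raw activations as saliency scores, and the externally supplied `frame_scores` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(feature_map):
    """Reweight one frame's H x W feature map by a softmax over all positions.

    Assumption: the raw activation itself is used as the saliency score;
    a real model would learn these scores.
    """
    width = len(feature_map[0])
    flat = [v for row in feature_map for v in row]
    weights = softmax(flat)
    return [
        [feature_map[r][c] * weights[r * width + c] for c in range(width)]
        for r in range(len(feature_map))
    ]


def temporal_attention(frame_features, frame_scores):
    """Pool per-frame feature vectors into one vector, weighted by frame importance.

    Assumption: frame_scores are given; a real model would predict them
    from the recurrent hidden states.
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


# Hypothetical toy input: three frames with 2-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 2.0]  # the third frame is scored as the key frame
pooled, weights = temporal_attention(frames, scores)
```

In this toy input the third frame receives the largest weight, so the pooled vector is dominated by its features, which is the sense in which temporal attention "selects key frames".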

LUO Yuan, LI Dan, ZHANG Yi. Chinese Sign Language Recognition Based on Spatial-Temporal Attention Network[J]. Semiconductor Optoelectronics, 2020, 41(3): 414.
