

Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control





A long short-term memory (LSTM) recurrent neural network based on i-vector features is presented for speech control of a laparoscopic supporter. It recognizes short isolated-word commands spoken by a specific doctor and requires only a small number of training samples. The LSTM recurrent neural network serves as the base model, with Mel-frequency cepstral coefficients (MFCCs) as the input features. The i-vector feature is supplied as deep input information and is spliced with the deep features produced after the LSTM layer, which fuses the two sets of parameters. This allows the network to accurately recognize the voice commands of the specific surgeon and to reject voice commands from non-surgeons, offering a secure and intelligent speech recognition scheme for laparoscopic surgery. A self-built speech database is used as the training library to verify the recognition performance of the proposed algorithm and its rejection performance for speech not included in the training library. In experiments that compare the proposed model with dynamic time warping (DTW) and the Gaussian mixture model-hidden Markov model (GMM-HMM), the proposed model achieves a 99.6% correct recognition rate on voice commands from the specific speakers recorded in the training library with a false acceptance rate of 0%, and an average false acceptance rate of 2.5% for voices not included in the training library. The proposed model meets the accuracy and safety requirements of laparoscopic supporter control.
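The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the random untrained weights, the helper names (`lstm_step`, `classify`), the omission of bias terms, and the use of a softmax confidence threshold for rejection are all assumptions made for illustration, since the abstract does not specify them. The sketch runs one LSTM cell over a sequence of MFCC frames, splices the final hidden state with an utterance-level i-vector, and rejects the utterance when the top class probability falls below the threshold.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lstm_step(params, x, h, c):
    # One LSTM time step; each gate has one weight matrix over [x; h].
    z = x + h  # list concatenation of input frame and previous hidden state
    i = [sigmoid(a) for a in matvec(params["Wi"], z)]    # input gate
    f = [sigmoid(a) for a in matvec(params["Wf"], z)]    # forget gate
    o = [sigmoid(a) for a in matvec(params["Wo"], z)]    # output gate
    g = [math.tanh(a) for a in matvec(params["Wg"], z)]  # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def classify(mfcc_frames, ivector, params, threshold=0.5):
    # Returns (command index or None if rejected, class probabilities).
    hidden = len(params["Wi"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in mfcc_frames:
        h, c = lstm_step(params, frame, h, c)
    fused = h + ivector  # splice LSTM output with the i-vector
    probs = softmax(matvec(params["Wout"], fused))
    best = max(range(len(probs)), key=probs.__getitem__)
    # Assumed rejection rule: refuse the command when confidence is low.
    return (best if probs[best] >= threshold else None), probs

# Hypothetical sizes: 13 MFCCs, 8 hidden units, 4-dim i-vector, 3 commands.
N_MFCC, N_HIDDEN, N_IVEC, N_CMDS = 13, 8, 4, 3
rng = random.Random(0)
params = {name: rand_matrix(N_HIDDEN, N_MFCC + N_HIDDEN, rng)
          for name in ("Wi", "Wf", "Wo", "Wg")}
params["Wout"] = rand_matrix(N_CMDS, N_HIDDEN + N_IVEC, rng)

# Random stand-ins for real MFCC frames and a real i-vector.
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_MFCC)] for _ in range(20)]
ivec = [rng.gauss(0.0, 1.0) for _ in range(N_IVEC)]

label, probs = classify(frames, ivec, params, threshold=0.5)
print("decision:", label, "probabilities:", probs)
```

Because the weights here are untrained, the decision itself carries no meaning; the point is only the data flow, in which the speaker-specific i-vector joins the classifier input after the recurrent layer rather than being fed frame by frame.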








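Author affiliations

Ren Kailong: School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China
Wang Yi: School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China
Chen Xiaodong: School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China
Cai Huaiyu: School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072, China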




Ren Kailong, Wang Yi, Chen Xiaodong, Cai Huaiyu. Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181702.

