Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control

Kailong Ren; Yi Wang*; Xiaodong Chen; Huaiyu Cai

doi:10.3788/LOP57.181702

Laser & Optoelectronics Progress, 2020, 57 (18): 181702, published online: 2020-09-02

Author affiliation

School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin 300072

Keywords: medical optics; laparoscope; i-vector; long short-term memory; speaker-dependent speech recognition


Abstract

A long short-term memory (LSTM) recurrent neural network fused with i-vector features is proposed for voice control of a laparoscopic supporter. It recognizes short, isolated-word commands spoken by a specific doctor using only a small number of training samples. The model uses an LSTM recurrent neural network as its base and Mel-frequency cepstral coefficients (MFCC) as its input features. The i-vector is supplied as a deep-layer input and concatenated with the deep features that follow the LSTM layer, which fuses the two sets of parameters. This allows the model to accurately recognize voice commands from the designated chief surgeon and to reject voice commands from anyone else, providing a safe and intelligent speech recognition scheme for laparoscopic operation. Experiments on a self-built speech database verify both the recognition performance on speech included in the training library and the rejection performance on speech outside it. The results show that, compared with dynamic time warping (DTW) and the Gaussian mixture model-hidden Markov model (GMM-HMM), the proposed model reaches a recognition accuracy of 99.6% on the specific speaker's voice commands in the training library while keeping the false acceptance rate at 0%, and its average false acceptance rate on speech outside the training library is 2.5%. These results meet the accuracy and safety requirements of laparoscopic supporter control.
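The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no layer sizes, weights, or training details, so every dimension below is a toy assumption, the weights are random rather than trained, biases are omitted for brevity, and the helper names (`lstm_last_hidden`, `classify`) are hypothetical. The sketch only shows the data flow: MFCC frames pass through an LSTM, the resulting deep features are concatenated with a speaker i-vector, and the fused vector is scored against the command words.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def lstm_last_hidden(frames, gate_weights, hidden_dim):
    """Run a single-layer LSTM (no biases) and return the final hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in frames:
        z = list(x) + h  # gate input: current frame joined with previous h
        i = [sigmoid(a) for a in matvec(gate_weights["i"], z)]    # input gate
        f = [sigmoid(a) for a in matvec(gate_weights["f"], z)]    # forget gate
        o = [sigmoid(a) for a in matvec(gate_weights["o"], z)]    # output gate
        g = [math.tanh(a) for a in matvec(gate_weights["g"], z)]  # candidate cell
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frames, ivector, gate_weights, out_weights, hidden_dim):
    """Fuse the LSTM output with the i-vector, then score each command."""
    deep_features = lstm_last_hidden(frames, gate_weights, hidden_dim)
    fused = deep_features + list(ivector)  # concatenation is the fusion step
    return fused, softmax(matvec(out_weights, fused))


# Toy sizes, all assumed for illustration: 13 MFCCs per frame, 8 hidden units,
# a 4-dimensional i-vector, 5 command words, 20 frames per utterance.
N_MFCC, HIDDEN, IVEC_DIM, N_COMMANDS, N_FRAMES = 13, 8, 4, 5, 20

rng = random.Random(0)
gate_weights = {k: rand_matrix(HIDDEN, N_MFCC + HIDDEN, rng) for k in "ifog"}
out_weights = rand_matrix(N_COMMANDS, HIDDEN + IVEC_DIM, rng)

# Random stand-ins for real MFCC frames and a real speaker i-vector.
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_MFCC)] for _ in range(N_FRAMES)]
ivector = [rng.gauss(0.0, 1.0) for _ in range(IVEC_DIM)]

fused, probs = classify(frames, ivector, gate_weights, out_weights, HIDDEN)
print(len(fused), [round(p, 3) for p in probs])
```

Because the i-vector joins the network after the LSTM layer, the output layer sees both what was said (the LSTM features) and who said it (the i-vector). How the paper turns these scores into a rejection decision for non-surgeon speech is not stated in the abstract, so the sketch stops at the class probabilities.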


Kailong Ren, Yi Wang, Xiaodong Chen, Huaiyu Cai. Speaker-Dependent Speech Recognition Algorithm for Laparoscopic Supporter Control[J]. Laser & Optoelectronics Progress, 2020, 57(18): 181702.
