Chinese Journal of Liquid Crystals and Displays, 2018, 33(2): 165. Published online: 2018-03-21

Voice conversion based on quantum particle swarm optimization of generalized regression neural network
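Author affiliation
School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, Shaanxi, China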
Abstract
When a neural network for voice conversion is optimized with particle swarm optimization (PSO), the search tends to converge slowly and prematurely. To address this, this paper optimizes a generalized regression neural network (GRNN) voice conversion model with a new quantum particle swarm optimization (QPSO) algorithm. The algorithm updates each particle's position vector by changing the phase of its quantum bits and applies a quantum NOT gate as the mutation operation. First, QPSO is used to find the optimal smoothing factor of the network, which establishes the spectral mapping rule. Next, the correlation between the spectral parameters and the fundamental frequency is exploited to convert the fundamental frequency, a prosodic feature, as well. The converted spectral and fundamental frequency parameters are then combined, and the target speech is synthesized with the STRAIGHT model. Finally, the method is assessed with subjective and objective evaluations. Experimental results show that, compared with a GRNN optimized by conventional PSO, the proposed method improves the naturalness and similarity of the converted speech and lowers the spectral distortion rate by 2.1%. The proposed method achieves better voice conversion performance than a radial basis function neural network, a GRNN, and a GRNN optimized by PSO.
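The abstract does not give the update equations, so the Python sketch below only illustrates the general idea under stated assumptions. It is not the authors' implementation. A GRNN is written here as Gaussian-kernel regression with a single smoothing factor `sigma`. Each particle carries one qubit phase `theta`, whose cosine amplitude is mapped linearly onto a search interval for `sigma`. The mutation applies a NOT gate, which swaps the cosine and sine amplitudes (equivalent to `theta -> pi/2 - theta`). The function names, the search interval, the swarm coefficients, the mutation rate, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    # Squared Euclidean distance between every query and every training vector.
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    # Shift by the row minimum for numerical stability; the shift cancels in the ratio.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1, keepdims=True)

def wrap(angle):
    """Wrap phase differences into [-pi, pi)."""
    return (angle + np.pi) % (2.0 * np.pi) - np.pi

def phase_to_sigma(theta, lo, hi):
    """Map the qubit amplitude cos(theta) in [-1, 1] linearly onto [lo, hi]."""
    a = np.cos(theta)
    return 0.5 * (hi * (1.0 + a) + lo * (1.0 - a))

def qpso_sigma(fitness, lo=0.05, hi=2.0, n_particles=12, n_iter=40,
               w=0.7, c1=1.5, c2=1.5, p_mut=0.05, seed=0):
    """Phase-encoded swarm search for the sigma that minimizes fitness(sigma)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n_particles)  # qubit phases
    delta = np.zeros(n_particles)                       # phase increments
    cost = np.array([fitness(phase_to_sigma(t, lo, hi)) for t in theta])
    pbest, pbest_cost = theta.copy(), cost.copy()
    g = pbest_cost.argmin()
    gbest, gbest_cost = pbest[g], pbest_cost[g]
    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        # Move each phase toward its personal best and the global best phase.
        delta = (w * delta + c1 * r1 * wrap(pbest - theta)
                 + c2 * r2 * wrap(gbest - theta))
        theta = theta + delta
        # NOT-gate mutation: swap cos and sin amplitudes of a few particles.
        mutate = rng.random(n_particles) < p_mut
        theta[mutate] = np.pi / 2.0 - theta[mutate]
        cost = np.array([fitness(phase_to_sigma(t, lo, hi)) for t in theta])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = theta[improved], cost[improved]
        g = pbest_cost.argmin()
        gbest, gbest_cost = pbest[g], pbest_cost[g]
    return phase_to_sigma(gbest, lo, hi), gbest_cost

# Synthetic stand-in for paired source/target spectral features (assumption).
rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=(80, 1))
y = np.sin(x) + 0.1 * rng.standard_normal((80, 1))
x_tr, y_tr, x_va, y_va = x[:60], y[:60], x[60:], y[60:]

def validation_mse(sigma):
    """Held-out mean squared error of the GRNN for a given smoothing factor."""
    return float(np.mean((grnn_predict(x_tr, y_tr, x_va, sigma) - y_va) ** 2))

best_sigma, best_mse = qpso_sigma(validation_mse)
print(f"sigma = {best_sigma:.3f}, validation MSE = {best_mse:.4f}")
```

In a voice conversion setting, `x_tr` and `y_tr` would be time-aligned source and target spectral feature vectors, and the fitness could be a spectral distortion measure on held-out frames instead of the plain mean squared error used here.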

WANG Min, ZHAO Yuan, LIU Li, XU Juan. Voice conversion based on quantum particle swarm optimization of generalized regression neural network [J]. Chinese Journal of Liquid Crystals and Displays, 2018, 33(2): 165.
