Spectroscopy and Spectral Analysis, 2018, 38 (9): 2763, published online: 2018-10-02

Near Infrared Spectroscopy Analysis Based Machine Learning to Identify Haploids in Maize
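Author affiliations
1 National Maize Improvement Center of China, Engineering Research Center for Maize Breeding of the Ministry of Education, China Agricultural University, Beijing 100193
2 Laboratory of High-Speed Circuits and Neural Networks, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083
3 Beijing Tunyu Seed Industry Co., Ltd., Beijing 100193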
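Abstract (translated from Chinese)
Haploid identification is a key step in maize haploid technology. This study analysed the near-infrared transmission spectra of a large number of maize haploid and heterozygous diploid kernels, with the aim of building a haploid identification model that is practical in production. Kernel spectra of haploids and heterozygous diploids were collected from three groups with different genetic backgrounds and used to compare machine learning algorithms, to compare the modelling performance of spectral preprocessing methods, and to analyse how dataset size affects model construction. Comparing the average spectra of all haploids and heterozygous diploids showed that the two have essentially the same absorption peak positions, but that haploid absorbance is slightly higher than that of heterozygous diploids, with the larger differences in the 940~1 120 nm and 1 180~1 316 nm regions. Among the models built, those using partial least squares and a neural network algorithm gave the higher haploid identification accuracies, 93.26% and 95.42% respectively. Test-set validation results were consistent with the model accuracies, indicating that both algorithms are suitable for large-scale haploid screening. When the partial least squares model was used to compare spectral preprocessing methods, the highest modelling accuracy was obtained by applying moving-window smoothing alone to the raw spectra. Comparing models built on datasets of different sizes showed that, within a certain range, enlarging the dataset helps improve model accuracy. Moreover, when haploids made up a relatively high proportion of the data, the recall of haploid prediction reached 100%. In addition, haploids and heterozygous diploids that are hard to distinguish by the kernel color marker were selected; a machine learning model built on them with partial least squares reached a prediction accuracy of 93.39%, showing the advantage of near-infrared haploid identification, namely that accurate identification may be possible without relying on kernel color. The machine-learning-based near-infrared haploid identification method has high accuracy and can be further optimised as more data accumulate; theoretical research on it is expected to create the conditions for automated, intelligent haploid identification.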
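The abstract reports that moving-window smoothing of the raw spectra gave the most accurate partial least squares model, but it does not state the window width or the exact filter. As a minimal sketch only, assuming a centred moving average with a hypothetical odd window and edge truncation (the function name, the window size and the sample values below are illustrative and not taken from the paper), the preprocessing step could look like this:

```python
def moving_window_smooth(spectrum, window=5):
    """Smooth a 1-D spectrum with a centred moving average.

    Near the edges the window is truncated, so the output has the
    same length as the input. The window width is an assumption;
    the abstract does not give the value used by the authors.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        lo = max(0, i - half)                  # clip window at the left edge
        hi = min(len(spectrum), i + half + 1)  # clip window at the right edge
        segment = spectrum[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical absorbance readings with one noisy spike (not data from the paper).
raw = [0.50, 0.52, 0.51, 0.90, 0.53, 0.54, 0.55]
smooth = moving_window_smooth(raw, window=3)
```

The smoothed list keeps one value per wavelength point, so it can be passed to any downstream classifier in place of the raw spectrum.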
Abstract
Haploid identification is a key step in doubled haploid technology in maize. In this research, we studied the near-infrared transmission spectra of a large number of haploid and heterozygous diploid seeds to establish an accurate model for haploid identification. Comparing the average spectra of all haploids and heterozygous diploids, we found that the absorption peak positions of the two were almost the same, but that haploid absorbance was slightly higher than that of heterozygous diploids, with the larger differences at wavelengths of 940~1 120 and 1 180~1 316 nm. Based on the near-infrared spectra of haploids and heterozygous diploids from three different source germplasms, different machine learning algorithms were used to construct haploid selection models, the accuracies of models developed with different spectral preprocessing methods were compared, and the effect of dataset size on the models was studied. Among the models compared, the partial least squares method and the neural network algorithm reached high haploid identification accuracies of 95.42% and 93.26%, respectively. The results on the testing set were consistent with the model accuracies, indicating that the two algorithms are suitable for large-scale screening of haploids. With the partial least squares model, spectra preprocessed by smoothing gave the most accurate model. Comparing models built on datasets of different sizes showed that enlarging the dataset within a certain range could improve model accuracy. When the proportion of haploids was high enough, the recall of haploid prediction reached 100%. In addition, haploids and heterozygous diploids that were difficult to identify by R1-nj color were selected to form a new dataset. The partial least squares model trained on this dataset had an accuracy of 93.39%. This shows the advantage of the NIR machine learning method for haploid identification: it may achieve accurate identification independently of R1-nj color expression. NIR haploid identification based on machine learning has high accuracy and efficiency, and the method can be further optimized as data accumulate. This research paves the way for the intelligent identification of haploids.
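The abstract quotes two different metrics, overall accuracy and haploid recall, and a recall of 100% does not imply an accuracy of 100%. The following sketch uses the standard definitions of both; the function name, the label coding (1 = haploid, 0 = heterozygous diploid) and the sample labels are assumptions made for illustration and are not results from the paper.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall of the positive class).

    accuracy = correct predictions / all samples
    recall   = correctly predicted positives / actual positives
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    accuracy = correct / len(y_true)
    # Recall is undefined when the data contain no positive samples.
    recall = true_pos / actual_pos if actual_pos else float("nan")
    return accuracy, recall


# Hypothetical labels: 1 = haploid, 0 = heterozygous diploid.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 1, 0, 0, 1]
acc, rec = accuracy_and_recall(y_true, y_pred)
```

In this toy case every haploid is found, so recall is 1.0, while one diploid is misclassified as a haploid, so accuracy is 0.875.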

LI Wei, LI Jin-long, LI Wei-jun, LIU Li-wei, LI Hao-guang, CHEN Chen, CHEN Shao-jiang. Near Infrared Spectroscopy Analysis Based Machine Learning to Identify Haploids in Maize[J]. Spectroscopy and Spectral Analysis, 2018, 38(9): 2763.
