Spectroscopy and Spectral Analysis, 2018, 38(6): 1766, published online: 2018-06-29

Near-Infrared Spectroscopic Measurement of Fruit Sugar Content Using the Random Forest Algorithm

Fast Measurement of Sugar in Fruits Using Near Infrared Spectroscopy Combined with Random Forest Algorithm
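Author affiliations
1 Taiyuan University of Technology, Taiyuan, Shanxi 030024, China
2 Beijing Research Center of Intelligent Equipment for Agriculture, Beijing 100097, China
3 National Research Center of Intelligent Equipment for Agriculture, Beijing 100097, China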
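Abstract (translated from the Chinese)
In recent years, near-infrared (NIR) spectroscopy has been widely studied for measuring internal quality attributes of fruit such as sugar content, and some commercial instruments have appeared. However, because NIR spectra are complex and variable, calibration models transfer poorly, and a model often works only for a specific variety or even a specific growing region. Random forest (RF) is an ensemble algorithm based on decision trees that improves prediction accuracy by combining classification and regression tree (CART) models. Compared with methods such as partial least squares (PLS) and multiple linear regression (MLR), RF regression handles nonlinear data better. Because RF models are inherently random, the best model was selected by tuning parameters such as the number of decision trees (ntree) and the number of variables tried at each split (mtry). RF was used to predict the sugar content of different kinds of fruit (apple and pear). The experiments showed that, for a single kind of fruit, both RF and PLS gave good calibration and prediction results. For different kinds of fruit, however, RF clearly improved the model's predictive ability: the calibration R² rose from 0.878 (PLS) to 0.999, and the RMSEC fell from 0.453 to 0.015. When the optimal RF model was checked against an independent prediction set, the prediction R² rose from 0.731 (PLS) to 0.888, and the RMSEP fell from 1.148 to 0.334. RF therefore has a clear advantage when predicting sugar content across several kinds of fruit. This study shows that RF is a promising approach for NIR measurement of sugar content in multiple kinds of fruit, and hence for addressing the problems of model generality and transferability.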
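The tuning step described above (choosing ntree and mtry) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the tree depth limit, the minimum leaf size, the synthetic data, and every function name (`build_tree`, `fit_forest`, `tune`, and so on) are assumptions made for the example, and a real analysis would normally use an established RF library on measured spectra.

```python
import random


def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (regression node impurity)."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, mtry, rng, depth=0, max_depth=5, min_leaf=2):
    """Grow one CART-style regression tree; each split tries `mtry` random features."""
    if depth >= max_depth or len(y) < 2 * min_leaf or min(y) == max(y):
        return avg(y)  # leaf node: predict the mean target value
    best = None
    for j in rng.sample(range(len(X[0])), mtry):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return avg(y)
    _, j, t, left, right = best
    kids = [build_tree([X[i] for i in part], [y[i] for i in part],
                       mtry, rng, depth + 1, max_depth, min_leaf)
            for part in (left, right)]
    return (j, t, kids[0], kids[1])  # internal node


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, ntree, mtry, seed=0):
    """Train `ntree` trees, each on a bootstrap resample of the calibration set."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest


def predict_forest(forest, x):
    """Forest prediction: average of the individual tree predictions."""
    return avg([predict_tree(tree, x) for tree in forest])


def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def tune(X_cal, y_cal, X_val, y_val, ntree_grid, mtry_grid):
    """Return (validation RMSE, ntree, mtry) for the best grid point."""
    results = []
    for ntree in ntree_grid:
        for mtry in mtry_grid:
            forest = fit_forest(X_cal, y_cal, ntree, mtry)
            preds = [predict_forest(forest, x) for x in X_val]
            results.append((rmse(y_val, preds), ntree, mtry))
    return min(results)


# Synthetic stand-in for spectra (NOT the paper's data): 4 made-up features
# and a nonlinear target loosely playing the role of sugar content.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(4)] for _ in range(60)]
y = [10 + 4 * abs(r[0] - 0.5) + 2 * r[1] * r[2] for r in X]
best_rmse, best_ntree, best_mtry = tune(X[:40], y[:40], X[40:], y[40:], (5, 20), (1, 2, 4))
print("best ntree:", best_ntree, "best mtry:", best_mtry, "RMSE:", round(best_rmse, 3))
```

The grid here is deliberately tiny so the sketch runs quickly. Because the bootstrap and feature sampling are random, fixing the seed keeps a grid point reproducible, which is one simple way to compare ntree and mtry settings fairly.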
Abstract
In recent years, many researchers have studied near-infrared (NIR) spectroscopy for measuring sugar content and other internal quality attributes of fruit, and some commercial instruments have been produced. However, because NIR spectra are complex and variable, models built on them often transfer poorly: a model is usually valid only for a particular species or even a single variety. Random forest (RF) is an ensemble algorithm based on decision trees that improves prediction accuracy by combining classification and regression tree (CART) models. Compared with partial least squares (PLS), multiple linear regression (MLR) and similar methods, RF handles nonlinear data well. To account for the randomness of the RF model, the model was optimized by tuning the number of decision trees (ntree) and the number of split variables (mtry). In this study, we used RF to predict the sugar content of different types of fruit (apple and pear). For a single kind of fruit, both RF and PLS gave good calibration and prediction results. For different types of fruit, however, RF significantly improved the predictive ability of the model: the calibration R² rose from 0.878 (PLS) to 0.999, and the RMSEC fell from 0.453 to 0.015. When the optimal RF model was tested on an independent test set, the prediction R² rose from 0.731 (PLS) to 0.888, and the RMSEP fell from 1.148 to 0.334. RF thus showed a clear advantage in predicting sugar content across several kinds of fruit. These results indicate that RF could be applied to measuring sugar content in multiple kinds of fruit by NIR spectroscopy, which may help address the problems of model generality and transferability.
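For readers unfamiliar with the figures of merit quoted above, the sketch below shows how R² and the root-mean-square error are commonly computed. It assumes that R² denotes the coefficient of determination (the paper may instead use the squared correlation coefficient) and that RMSEC and RMSEP are the same RMSE formula applied to the calibration set and the prediction set respectively. The sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error; called RMSEC on calibration data, RMSEP on prediction data."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition of R²)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical reference and predicted sugar values, for illustration only.
measured = [11.2, 12.5, 13.1, 10.8, 12.0]
predicted = [11.0, 12.9, 12.8, 11.1, 12.2]
print("R2:", round(r_squared(measured, predicted), 3))
print("RMSE:", round(rmse(measured, predicted), 3))
```

A very high calibration R² together with a noticeably lower prediction R², as in the values reported above, is the usual reason for quoting both sets of metrics: only the independent prediction set indicates how the model behaves on unseen samples.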

LI Sheng-fang (李盛芳), JIA Min-zhi (贾敏智), DONG Da-ming (董大明). Fast Measurement of Sugar in Fruits Using Near Infrared Spectroscopy Combined with Random Forest Algorithm[J]. Spectroscopy and Spectral Analysis, 2018, 38(6): 1766.
