Spectroscopy and Spectral Analysis, 2019, 39(7): 2271, published online: 2019-07-23

Discriminant Analysis of Millet from Different Origins Based on Hyperspectral Imaging Technology
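Author affiliations
1 China Agricultural University, Key Laboratory of Modern Precision Agriculture System Integration Research, Ministry of Education, Beijing 100083
2 China Agricultural University, Key Laboratory of Agricultural Information Acquisition Technology, Ministry of Agriculture, Beijing 100083
3 College of Science, China Agricultural University, Beijing 100083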
Abstract
Hyperspectral imaging technology has been widely used for the inspection of agricultural products. This paper studies the non-destructive identification of millet samples from different regions using hyperspectral imaging combined with machine learning algorithms. Millet samples from seven provinces were grouped by geographical region into five categories: Dongbei (Northeast China), Hebei, Shaanxi, Shandong, and Shanxi. A total of 23 samples were collected: 6 from Dongbei, 5 from Shanxi, and 4 each from Hebei, Shaanxi, and Shandong. Each sample was divided into 10 equal portions, and hyperspectral data of the millet were acquired over the 900 to 1 700 nm band with a hyperspectral imager. To reduce the effects of uneven illumination and dark current, black-and-white calibration was applied to the collected hyperspectral data. Regions of interest (ROIs) were selected in the millet hyperspectral images using ENVI software, with 9 ROIs per portion. The average spectrum within each ROI was calculated and used as one spectral record of the sample. In total, 2 070 spectral curves were collected: 540 from Dongbei, 450 from Shanxi, and 360 each from Hebei, Shandong, and Shaanxi. Because the uneven sample surface causes scattering that distorts the true spectral information of the millet, multiplicative scatter correction (MSC) was applied to the raw spectra. The corrected spectral data were then randomly divided into a training set and a test set, with the test set accounting for 30% of the data. Linear discriminant analysis (LDA) was used to visualize the spectral data of millet from different origins; the test set was then fed into the trained LDA model, and a confusion matrix of the prediction results was constructed.
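The preprocessing steps described above can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation (they used ENVI to select ROIs): the function names are hypothetical, ROIs are assumed to be rectangular, and MSC is assumed to use the mean spectrum as its reference, which is the usual convention. The last lines check the reported spectrum counts as samples × 10 portions × 9 ROIs per portion.

```python
import numpy as np


def black_white_calibrate(raw, white, dark):
    """Relative reflectance R = (I - D) / (W - D) from raw, white-reference and dark-current frames."""
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    return (raw - dark) / (white - dark)


def roi_mean_spectrum(cube, rows, cols):
    """Mean spectrum of a rectangular ROI (given as row/column slices) in an (H, W, bands) cube."""
    roi = cube[rows, cols, :]
    return roi.reshape(-1, cube.shape[-1]).mean(axis=0)


def msc(spectra, reference=None):
    """Multiplicative scatter correction of an (n_spectra, n_bands) array.

    Each spectrum x is fitted by least squares as x = a + b * ref, where ref is
    the mean spectrum unless a reference is given, and replaced by (x - a) / b.
    """
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        corrected[i] = (x - a) / b
    return corrected


# Spectrum count per class: samples x 10 portions x 9 ROIs per portion.
samples_per_class = {"Dongbei": 6, "Shanxi": 5, "Hebei": 4, "Shaanxi": 4, "Shandong": 4}
spectra_per_class = {name: n * 10 * 9 for name, n in samples_per_class.items()}
print(spectra_per_class, sum(spectra_per_class.values()))
# {'Dongbei': 540, 'Shanxi': 450, 'Hebei': 360, 'Shaanxi': 360, 'Shandong': 360} 2070
```

Fitting each spectrum against a common reference removes the additive offset and multiplicative scaling that surface scattering introduces, so after MSC spectra that differ only by scatter collapse onto the reference.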
The results showed that LDA achieved prediction accuracies of 0.84 and 0.99 for Shaanxi and Shanxi, respectively, but only 0.68, 0.68, and 0.40 for Dongbei, Hebei, and Shandong. Recursive feature elimination (RFE) was therefore used to select informative spectral features and remove redundant information in order to improve prediction accuracy. RFE was combined with a support vector machine (SVM) and with logistic regression (LR), and the two combinations were compared for discriminating millet from different regions. The training set was fed into the SVM-RFE and LR-RFE models, and for each model the feature subset that maximized the micro-averaged F-score under 3-fold cross-validation was selected. LR-RFE selected 74 wavelengths, with a Micro_F of 0.59; SVM-RFE selected 220 wavelengths, with a Micro_F of 0.66. The selected feature subsets were then applied to the test set, which was fed into the SVM and LR models respectively; the confusion matrices of the predictions and the receiver operating characteristic (ROC) curves were used for evaluation. For Dongbei, Hebei, Shaanxi, Shandong, and Shanxi, the prediction accuracies of SVM-RFE were 1, 0.37, 0.72, 0, and 1, with areas under the ROC curve (AUC) of 0.82, 0.92, 0.93, 0.70, and 0.99, respectively. The prediction accuracies of LR-RFE were 0.92, 0, 0.97, 0, and 0.80, with AUCs of 0.72, 0.74, 0.94, 0.66, and 0.88, respectively. Overall, SVM-RFE classified better than LR-RFE, although LR-RFE discriminated the Shaanxi class better than SVM-RFE. Neither model could effectively discriminate the Hebei or Shandong class. Compared with LDA, both models improved the prediction accuracy to some extent.
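The RFE workflow above can be sketched with scikit-learn. This is not the authors' code, and several choices are assumptions: the library itself, a linear SVM kernel (RFE ranks bands by model coefficients, which `SVC` exposes only for a linear kernel), removal of one band per step, stratified splitting, and the hypothetical helper name `rfe_select_and_evaluate`. The per-class "prediction accuracy" is assumed to be the confusion-matrix diagonal divided by the row sum (per-class recall), and the per-class AUC is computed one-vs-rest from decision scores. For single-label multiclass data, the micro-averaged F1 used for selection equals overall accuracy.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC


def rfe_select_and_evaluate(estimator, X, y, test_size=0.3, seed=0):
    """Select bands by RFE + 3-fold CV (micro-F1) on the training split, then
    report per-class accuracy and one-vs-rest AUC on the held-out test split."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)

    # Drop one band per step; keep the subset with the best mean micro-F1 over 3 folds.
    selector = RFECV(estimator, step=1, cv=StratifiedKFold(n_splits=3),
                     scoring="f1_micro")
    selector.fit(X_tr, y_tr)

    # estimator_ is refitted on the selected bands of the whole training split.
    model = selector.estimator_
    X_te_sel = selector.transform(X_te)
    classes = model.classes_

    # Per-class accuracy: confusion-matrix diagonal over row sums (per-class recall).
    cm = confusion_matrix(y_te, model.predict(X_te_sel), labels=classes)
    per_class_acc = cm.diagonal() / cm.sum(axis=1)

    # One-vs-rest AUC per class; decision scores have shape (n_test, n_classes) for >2 classes.
    scores = model.decision_function(X_te_sel)
    auc = [roc_auc_score(y_te == c, scores[:, k]) for k, c in enumerate(classes)]
    return selector.n_features_, dict(zip(classes, per_class_acc)), dict(zip(classes, auc))


if __name__ == "__main__":
    # Hypothetical synthetic spectra standing in for the millet data.
    rng = np.random.default_rng(0)
    labels = ["Dongbei", "Hebei", "Shaanxi", "Shandong", "Shanxi"]
    centers = rng.normal(scale=3.0, size=(len(labels), 20))
    X = np.vstack([c + rng.normal(size=(30, 20)) for c in centers])
    y = np.repeat(labels, 30)
    for name, est in [("SVM-RFE", SVC(kernel="linear")),
                      ("LR-RFE", LogisticRegression(max_iter=1000))]:
        n_bands, acc, auc = rfe_select_and_evaluate(est, X, y)
        print(name, n_bands, acc, auc)
```

Because band selection happens only inside the training split, the test set stays unseen until the final evaluation, which keeps the reported per-class accuracy and AUC free of selection bias.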

JI Hai-yan, REN Zhan-qi, RAO Zhen-hong. Discriminant Analysis of Millet from Different Origins Based on Hyperspectral Imaging Technology[J]. Spectroscopy and Spectral Analysis, 2019, 39(7): 2271.

This article has been cited by 1 paper (citation statistics from China Optical Journal Network).
