Spectroscopy and Spectral Analysis, 2019, 39(7): 2176, published online: 2019-07-23

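Combination Weight COD Concentration Prediction Model Based on BiPLS and SiPLS

Author affiliations
1 School of Electrical Engineering, Yanshan University, Hebei Province Key Laboratory of Test/Measurement Technology and Instrument, Qinhuangdao 066004, Hebei, China
2 School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018, Hebei, China
3 Hebei Sailhero Environmental Protection High-tech Co., Ltd., Shijiazhuang 050000, Hebei, China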
Abstract
Excessive concentrations of organic matter in water are highly harmful: they cause serious environmental pollution and endanger human health. The traditional chemical method for measuring chemical oxygen demand (COD) in water involves tedious steps and is slow, which hinders rapid quantitative detection of COD. To address these problems, this paper proposes a rapid quantitative COD detection method that combines ultraviolet (UV) spectroscopy with a combination weight model. The model uses backward interval partial least squares (BiPLS) together with synergy interval partial least squares (SiPLS) to screen and combine characteristic subintervals of the UV spectrum, and then builds a prediction model from the weights of those subintervals. First, 45 COD standard solution samples were prepared along a concentration gradient and their UV spectra were acquired experimentally. The spectra were preprocessed with a first derivative and Savitzky-Golay (S-G) filtering to remove baseline drift and environmental noise, and the SPXY (sample set partitioning based on joint X-Y distances) algorithm was used to split the samples into a calibration set and a prediction set. The BiPLS algorithm was then used to screen wavelengths over the full spectral range. Because the number of subintervals has a large effect on the BiPLS model, it was optimized: the spectrum was divided into 15 to 25 subintervals, a partial least squares (PLS) model was built for each division, and the best number of subintervals was chosen by the root mean square error of cross-validation (RMSECV). The model performed best with 18 intervals. Six characteristic wavelength subintervals were selected from the 18; the selected subintervals are 2, 1, 3, 11, 7 and 6, corresponding to 234~240, 262~268, 269~275, 290~296, 297~303 and 304~310 nm.
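The backward interval elimination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `split_intervals`, `backward_interval_elimination` and the `cv_error` callback are hypothetical names. In a real BiPLS run, `cv_error` would fit a PLS model on the kept intervals and return its RMSECV; here a toy error function stands in so that the sketch runs with the standard library alone.

```python
from typing import Callable, List, Tuple


def split_intervals(n_points: int, n_intervals: int) -> List[range]:
    """Split indices 0..n_points-1 into contiguous, near-equal blocks."""
    base, extra = divmod(n_points, n_intervals)
    blocks, start = [], 0
    for i in range(n_intervals):
        size = base + (1 if i < extra else 0)  # first `extra` blocks get one more point
        blocks.append(range(start, start + size))
        start += size
    return blocks


def backward_interval_elimination(
    n_intervals: int, cv_error: Callable[[List[int]], float]
) -> Tuple[float, List[int]]:
    """Remove one interval per step (the one whose removal gives the lowest
    error) and return the lowest error seen with the intervals kept then."""
    kept = list(range(n_intervals))
    best = (cv_error(kept), list(kept))
    while len(kept) > 1:
        # Try dropping each remaining interval; keep the best-scoring removal.
        err, drop = min((cv_error([j for j in kept if j != i]), i) for i in kept)
        kept.remove(drop)
        if err < best[0]:
            best = (err, list(kept))
    return best


# Toy stand-in for RMSECV (assumption for illustration only): intervals
# 1, 2 and 3 are "informative"; every other kept interval adds a small penalty.
INFORMATIVE = {1, 2, 3}


def toy_cv_error(kept: List[int]) -> float:
    missing = len(INFORMATIVE - set(kept))
    noise = len(set(kept) - INFORMATIVE)
    return missing + 0.1 * noise


best_err, best_kept = backward_interval_elimination(6, toy_cv_error)
print(best_kept)  # [1, 2, 3]
```

To mirror the optimization of the interval count, the same routine could be called once for each candidate count from 15 to 25, keeping the count with the lowest returned error.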
These six characteristic wavelength intervals carry a large share of the spectral information and contribute strongly to the final prediction model. The six preselected intervals were then further screened and combined with the SiPLS algorithm: PLS models were built on different interval combinations for each combination number, the best combination was selected for each combination number, and the errors and correlations of the resulting models were compared across combination numbers. This merged the six intervals into three characteristic wavelength intervals, 234~240, 262~275 and 290~310 nm, whose optimal numbers of PLS factors are 4, 4 and 3, respectively. The interval combination step of traditional SiPLS was improved: instead of combining the characteristic intervals directly, the three intervals are combined linearly according to their weights. The weights of the three intervals, calculated with the weight formula, are 0.509, 0.318 and 0.173, and a linear combination weight COD concentration prediction model was finally established. To verify the accuracy of the combination weight model, a full-spectrum PLS model, PLS models on each single characteristic interval, and a PLS model on the directly combined characteristic intervals were also built. The models were evaluated with the squared correlation coefficient (R2), the root mean square error of prediction (RMSEP) between predicted and true concentrations, and the prediction recovery rate (T).
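The weighted linear combination can be illustrated with the three weights reported above. The abstract does not state the weight formula, so this sketch takes the weights as given. It also assumes that each interval has its own PLS model and that the final prediction is the weighted sum of their outputs. The function name `combine_predictions` and the per-interval predictions are hypothetical.

```python
from typing import Sequence

# Interval weights reported in the abstract, in the order
# 234~240 nm, 262~275 nm, 290~310 nm.
WEIGHTS = (0.509, 0.318, 0.173)


def combine_predictions(
    per_interval: Sequence[float], weights: Sequence[float] = WEIGHTS
) -> float:
    """Weighted sum of per-interval predictions for one sample."""
    if len(per_interval) != len(weights):
        raise ValueError("need exactly one prediction per weight")
    return sum(w * p for w, p in zip(weights, per_interval))


# Made-up per-interval COD predictions for a single sample (illustration only).
example = combine_predictions((20.0, 21.0, 19.0))
print(round(example, 3))  # 20.145
```

The reported weights sum to 1, so the combined value always lies between the smallest and largest per-interval prediction.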
The verification results show that the combination weight model reaches a squared correlation coefficient of 0.999 7, clearly higher than the 0.968 0 of the model built on directly combined characteristic intervals. Its root mean square error of prediction is 0.532, 29.3% lower than that of the directly combined interval model, and its prediction recovery rate is 96.4%~103.1%, a significant improvement in prediction accuracy. The method is simple and feasible and produces no secondary pollution, and it can provide technical support for online monitoring of COD concentration in water.
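The three evaluation parameters can be computed as in the sketch below. The abstract does not give their formulas, so the definitions here are assumptions: R2 is taken as the squared Pearson correlation between true and predicted values, RMSEP as the root of the mean squared prediction error, and the recovery rate as predicted over true concentration in percent. The sample values are made up for illustration.

```python
import math
from typing import List, Sequence


def rmsep(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error of prediction."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared Pearson correlation between true and predicted values."""
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def recovery_percent(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Per-sample recovery rate: predicted / true concentration, in percent."""
    return [100.0 * p / t for t, p in zip(y_true, y_pred)]


# Made-up concentrations (not data from the paper).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [10.5, 19.5, 30.5, 39.5]

print(rmsep(y_true, y_pred))  # 0.5
print(r_squared(y_true, y_pred))
print(recovery_percent(y_true, y_pred))
```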

CHEN Ying, DI Yuan-jian, TANG Xin-liang, CUI Xing-ning, GAO Xin-bei, CAO Jing-gang, LI Shao-hua. Combination Weight COD Concentration Prediction Model Based on BiPLS and SiPLS[J]. Spectroscopy and Spectral Analysis, 2019, 39(7): 2176.
