Spectroscopy and Spectral Analysis, 2020, 40 (2): 403, published online: 2020-05-12

Terahertz Spectral Interval Combination Feature Extraction Algorithm in the Case of Aliasing Absorption Peak
Author affiliation
School of Automation, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
Abstract
Terahertz spectroscopy is a frontier method for substance identification. Because substances differ in molecular composition and structure, the terahertz absorption spectra of many substances show absorption peaks at characteristic frequencies, which can serve as important features for detecting the components of a mixture. Extracting the parameters of these peaks effectively and accurately is the key to improving the recognition rate. A multi-peak fitting algorithm fits the spectral curve as a sum of several standard peak functions and can therefore extract the frequency, height and width of the absorption peaks simultaneously. However, it relies on a peak-finding algorithm to determine the approximate positions and number of the peaks before fitting. The peak-finding result is not necessarily the one that gives the optimal fit, and overlapping (aliased) absorption peaks are difficult to identify and locate accurately.
To improve the identification and localization accuracy of absorption peaks in overlapping spectra, this paper proposes dividing the preprocessed spectrum into several sub-intervals, using the troughs of a heavily smoothed version of the curve as boundary points. The sub-intervals are then combined for multi-peak fitting, and a genetic algorithm finds the optimal combination of fitted sub-intervals together with approximate absorption peak frequencies. During fitting, the number of peaks in each sub-interval is determined by a peak-number-increment optimization method, and a final fine-tuning step yields the optimal peak frequencies and heights. For substance identification, a density clustering algorithm extracts the absorption peaks common to repeated measurements of the same pure substance. With these peaks as standard data, the proposed spectral matching algorithm based on absorption peak features enables rapid identification of pure substances and of mixtures with different contents.
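The interval-splitting step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the moving-average smoother, the window size `half_width`, the Lorentzian peak shape and the synthetic spectrum are all assumptions chosen for illustration, and the genetic-algorithm search and multi-peak fitting are omitted.

```python
def moving_average(y, half_width):
    """Smooth y with a centred moving average; the window shrinks at the edges."""
    n = len(y)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def trough_indices(y):
    """Indices of strict local minima of y (interior points only)."""
    return [i for i in range(1, len(y) - 1) if y[i] < y[i - 1] and y[i] < y[i + 1]]


def split_at_troughs(freq, absorb, half_width):
    """Split the spectrum into (start, end) frequency intervals.

    Boundaries are the troughs of the heavily smoothed curve, plus the
    first and last sample of the spectrum.
    """
    smooth = moving_average(absorb, half_width)
    cuts = [0] + trough_indices(smooth) + [len(freq) - 1]
    return [(freq[a], freq[b]) for a, b in zip(cuts[:-1], cuts[1:])]


def lorentzian(f, centre, height, width):
    """Assumed peak shape, used only to build the synthetic spectrum."""
    return height / (1.0 + ((f - centre) / width) ** 2)


# Synthetic spectrum (illustrative values only): a weak peak at 1.28 THz
# sits on the shoulder of a stronger neighbouring peak at 1.37 THz.
freq = [0.5 + 0.01 * i for i in range(151)]
absorb = [
    lorentzian(f, 0.90, 1.0, 0.05)
    + lorentzian(f, 1.28, 0.4, 0.04)
    + lorentzian(f, 1.37, 1.0, 0.04)
    + lorentzian(f, 1.80, 0.8, 0.05)
    for f in freq
]

intervals = split_at_troughs(freq, absorb, half_width=8)
print(intervals)
```

Heavy smoothing removes small ripples, so the remaining troughs separate groups of peaks rather than individual peaks. A masked peak and its stronger neighbour therefore tend to fall in the same sub-interval, where the number of fitted peaks can then be increased step by step.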
Measured spectra of ten pure substances are fitted and clustered to obtain their absorption peak parameters, which are largely consistent with the terahertz spectral database. The proposed recognition algorithm achieves a recognition rate of 100% on the pure-substance test set, which demonstrates the effectiveness of the feature extraction and substance recognition algorithms. For mixture spectra containing overlapping peaks, the second-derivative method identifies the masked absorption peak (1.280 THz) in the glucose-lactose mixture spectrum in only 70% of cases, with a mean extracted frequency of 1.316 THz. The proposed algorithm raises the recognition rate to 95% with a mean frequency of 1.281 THz; that is, it improves the resolution of overlapping peaks and locates them accurately. For six kinds of binary mixtures, formed from the 10 pure substances and showing different degrees of peak overlap, the Top-2 and Top-3 accuracies are 90.8% and 98.3%, respectively, so the extracted features can be applied effectively to mixture component detection. Because the algorithm uses only pure-substance data as standard data to detect the components of mixtures of varying composition, it is of considerable value for mixture component detection in terahertz spectroscopy.
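To make the Top-k figures concrete, the following is a minimal sketch of peak-based matching and Top-k scoring. It rests on assumptions that the abstract does not state: the match score (fraction of a reference's peaks found in the sample within a tolerance `tol`), the tolerance value, and the reading of a Top-k hit for a binary mixture as "both true components rank among the first k candidates". The substance names and peak frequencies are hypothetical placeholders, not measured data.

```python
def match_score(sample_peaks, reference_peaks, tol=0.03):
    """Fraction of reference peaks with a sample peak within tol (THz)."""
    if not reference_peaks:
        return 0.0
    hits = sum(
        1 for r in reference_peaks
        if any(abs(r - s) <= tol for s in sample_peaks)
    )
    return hits / len(reference_peaks)


def rank_candidates(sample_peaks, library, tol=0.03):
    """Candidate names sorted by descending score; ties broken by name."""
    scores = {name: match_score(sample_peaks, peaks, tol)
              for name, peaks in library.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))


def top_k_hit(ranked, true_components, k):
    """True when every true component is among the first k candidates."""
    return set(true_components) <= set(ranked[:k])


def top_k_accuracy(samples, library, k, tol=0.03):
    """Share of samples whose true components all rank within the top k."""
    hits = sum(
        top_k_hit(rank_candidates(peaks, library, tol), truth, k)
        for peaks, truth in samples
    )
    return hits / len(samples)


# Hypothetical reference library: peak frequencies in THz, illustrative only.
library = {
    "A": [0.90, 1.28],
    "B": [1.37, 1.80],
    "C": [0.60, 1.10, 1.55],
}

# Hypothetical peaks extracted from a mixture of A and B.
sample_peaks = [0.905, 1.281, 1.372, 1.79]
ranked = rank_candidates(sample_peaks, library)
print(ranked, top_k_hit(ranked, ["A", "B"], k=2))
```

Under this reading, Top-3 accuracy can only equal or exceed Top-2 accuracy, because allowing one extra candidate gives a component that ranks third a chance to count.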

HE Wei-jian, CHENG Liang-lun, DENG Guang-shui. Terahertz Spectral Interval Combination Feature Extraction Algorithm in the Case of Aliasing Absorption Peak[J]. Spectroscopy and Spectral Analysis, 2020, 40(2): 403.
