Stellar Spectra Classification Using a Fuzzy Twin Support Vector Machine with Spectral Distribution Properties

LIU Zhong-bao; QIN Zhen-tao; LUO Xue-gang; ZHOU Fang-xiao; ZHANG Jing

doi:10.3964/j.issn.1000-0593(2019)04-1307-05

Spectroscopy and Spectral Analysis, 2019, 39(4): 1307. Published online: 2019-04-11


Stellar Spectra Classification by Support Vector Machine with Spectral Distribution Properties

LIU Zhong-bao ¹,²,*, QIN Zhen-tao ¹, LUO Xue-gang ¹, ZHOU Fang-xiao ¹, ZHANG Jing ¹

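Author affiliations

¹ School of Mathematics and Computer Science, Panzhihua University, Panzhihua, Sichuan 617000, China

² School of Software, North University of China, Taiyuan, Shanxi 030051, China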

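Abstract (translated from the Chinese)

Stellar spectra classification is a hot topic in astronomical research. With the rapid growth in the number of observed spectra, traditional manual classification can no longer meet practical needs, and automated techniques, especially data mining algorithms, are urgently needed to classify stellar spectra automatically. Data mining algorithms such as association rules, neural networks, and self-organizing networks have been widely applied to stellar spectra classification. Among them, the support vector machine (SVM) stands out for its classification ability and is widely used for this task. SVM seeks an optimal separating surface between two classes of samples. It has high time complexity, so its computational efficiency is limited. The twin support vector machine (TWSVM) effectively addresses this efficiency problem. It constructs two non-parallel separating surfaces such that each class lies close to one surface and far from the other. TWSVM is nearly four times as computationally efficient as traditional SVM, and it has therefore attracted sustained attention from researchers since it was proposed. However, when making classification decisions, these methods do not consider the distribution characteristics of the data, and they are easily affected by noise points and singular points, so their classification efficiency is difficult to improve significantly. In view of this, a fuzzy twin support vector machine incorporating data distribution characteristics (TWSVM-SDP) is proposed on the basis of TWSVM. The method introduces the between-class scatter and within-class scatter from linear discriminant analysis (LDA) to characterize the distribution of the spectral data, and introduces a fuzzy membership function to reduce the influence of noise points and singular points on the classification results. Comparative experiments on the SDSS DR8 stellar spectra dataset show that, compared with traditional classification methods such as SVM and TWSVM, TWSVM-SDP has better classification ability. The method also has limitations, a major one being that it cannot handle massive spectral data. Future work will use big data processing technologies to further study the adaptability of the proposed method in a big data environment.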
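The "nearly four times" figure can be motivated by a common back-of-the-envelope estimate. It rests on two assumptions that the abstract does not state: that solving a quadratic program (QP) with m constraints costs on the order of m³, and that the two classes are of roughly equal size. Under those assumptions, SVM solves one QP over all m samples, while TWSVM solves two QPs over about m/2 samples each:

```latex
\underbrace{m^{3}}_{\text{SVM: one QP, } m \text{ constraints}}
\quad\text{vs.}\quad
\underbrace{2\left(\frac{m}{2}\right)^{3} = \frac{m^{3}}{4}}_{\text{TWSVM: two QPs, } m/2 \text{ constraints each}}
\qquad\Longrightarrow\qquad
\frac{m^{3}}{m^{3}/4} = 4 .
```

With unbalanced classes or a solver whose cost does not scale cubically, the ratio would differ.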

Abstract

Stellar spectra classification is one of the hot topics in astronomy. With hundreds of thousands of spectra being obtained by researchers, processing them manually is a major challenge. Automatic technologies, especially data mining algorithms, are urgently needed to classify stellar spectra. Neural networks, self-organizing maps, association rules, and other data mining algorithms have been used to classify stellar spectra in recent years. Among these methods, the Support Vector Machine (SVM), a typical classification method, is widely used in stellar spectra classification because of its good learning capability and excellent classification performance. The basic idea of standard SVM is to find an optimal separating hyperplane between the positive and negative samples. Its high time complexity limits its computational efficiency. The Twin Support Vector Machine (TWSVM) was proposed to deal with this problem. It generates two non-parallel hyperplanes such that each plane is close to one class and as far as possible from the other. TWSVM learns approximately four times faster than classical SVM. The limitation of TWSVM is that it does not take spectral distribution properties into consideration, and its performance is easily affected by noise and singular points (outliers). In view of this, the Fuzzy Twin Support Vector Machine with Spectral Distribution Properties (TWSVM-SDP) is proposed, in which the between-class scatter and within-class scatter from Linear Discriminant Analysis (LDA) are introduced to describe the spectral distribution properties, and a fuzzy membership function is introduced to reduce the influence of noise and singular points. Comparative experiments on the SDSS DR8 stellar spectra datasets verify that TWSVM-SDP performs better than SVM and TWSVM. However, TWSVM-SDP has some limitations; in particular, it cannot handle massive volumes of spectra. Future work will use big data technologies to study the adaptability of the proposed method in a big data environment.
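The two ingredients named in the abstract, LDA scatter and fuzzy membership, can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulas, so everything here is an assumption: the function names `scatter_matrices` and `fuzzy_membership` are hypothetical, the scatter matrices follow the textbook LDA definitions, and the membership is a simple distance-to-class-centre rule (with a hypothetical smoothing parameter `delta`) that may differ from the function used in TWSVM-SDP.

```python
import numpy as np

def scatter_matrices(X, y):
    """Textbook LDA within-class (S_w) and between-class (S_b) scatter matrices."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        # Spread of each class around its own mean.
        S_w += centered.T @ centered
        # Spread of the class means around the overall mean, weighted by class size.
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)
    return S_w, S_b

def fuzzy_membership(X, y, delta=1e-3):
    """Assumed rule: membership in (0, 1] that shrinks with distance from the class centre."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    s = np.empty(X.shape[0])
    for label in np.unique(y):
        mask = y == label
        dist = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
        # The farthest point of a class gets a membership close to 0, the centre gets 1.
        s[mask] = 1.0 - dist / (dist.max() + delta)
    return s

# Toy data (not spectra): the last point is a deliberate outlier in class 1.
X_demo = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6], [20, 20]], dtype=float)
y_demo = np.array([0, 0, 0, 1, 1, 1, 1])

S_w_demo, S_b_demo = scatter_matrices(X_demo, y_demo)
memberships_demo = fuzzy_membership(X_demo, y_demo)
print(memberships_demo)
```

In a fuzzy SVM-style formulation, such memberships typically scale each sample's slack penalty so that likely outliers contribute less to the objective; how TWSVM-SDP combines them with the scatter terms is not specified in the abstract.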


LIU Zhong-bao, QIN Zhen-tao, LUO Xue-gang, ZHOU Fang-xiao, ZHANG Jing. Stellar Spectra Classification by Support Vector Machine with Spectral Distribution Properties[J]. Spectroscopy and Spectral Analysis, 2019, 39(4): 1307.
