Spectroscopy and Spectral Analysis, 2020, 40(9): 2913, published online: 2020-11-29

Decomposition and Classification of Stellar Spectra Based on t-SNE
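Author affiliation
School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, Shandong 264209, China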
Abstract
With the development of astronomy and the improving observational capability of telescopes, many large survey telescopes in China and abroad will produce petabytes of stellar spectra. A stellar spectrum is the electromagnetic radiation from a star, usually composed of a continuum with superimposed absorption lines; differences between spectra arise mainly from the star's effective temperature, surface gravity, and elemental abundances. Automatic classification of stellar spectra is an important part of astronomical data processing and the basis for studying stellar evolution and measuring stellar parameters. The sheer volume of spectra demands classification methods that are both efficient and accurate. Traditional manual classification is slow and imprecise, and can no longer meet the practical need to classify massive numbers of stellar spectra automatically, especially spectra with a low signal-to-noise ratio, so machine learning algorithms are now widely used for stellar spectral classification. A distinctive feature of stellar spectra is their high dimensionality. Dimensionality reduction both extracts features and lowers the computational cost, which makes it the first task in spectral classification. Traditional linear methods such as principal component analysis (PCA) reduce the spectra according to variance alone, so spectra of different types overlap after projection into the low-dimensional feature space, whereas manifold learning can produce good classification boundaries that avoid this overlap and benefit the subsequent classification. Given the high dimensionality of spectral data, this paper studies the distribution of spectra in high-dimensional space and the principle by which manifold learning reduces the dimensionality of high-dimensional linear data. The dimensionality-reduction performance of t-SNE and PCA on spectral data is compared, and an improved k-nearest neighbor algorithm based on the correlation distance of attribute values is used to classify the spectra. The experimental results are then analyzed, and several machine learning classifiers are used for comparison and validation. The algorithms were implemented in Python with the Scikit-learn library and tested on 12 000 low signal-to-noise stellar spectra from SDSS, achieving high-precision automatic processing and classification of the spectral data. The experimental results show that the manifold-learning-based t-SNE method can recover the low-dimensional manifold structure in high-dimensional spectral data: it finds the low-dimensional manifold in the high-dimensional space and solves for the corresponding embedding map, preserving the differences between spectral samples of different classes as far as possible during the reduction. Three-dimensional visualization of the results shows that PCA leads to overlapping distributions of different spectral classes, while t-SNE produces more distinct class boundaries. After feature extraction, the k-nearest neighbor algorithm based on attribute-value correlation distance achieves satisfactory classification accuracy on the test data set. The method can also be applied to the automatic classification of massive spectra produced by other survey telescopes and to data mining for rare objects.
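The pipeline described in the abstract (reduce with PCA or t-SNE, then classify with k-nearest neighbors) can be illustrated with a minimal Scikit-learn sketch. It is not the authors' implementation. The data are synthetic vectors from `make_classification` standing in for SDSS spectra, the helper `embed_and_classify` and all parameter values are hypothetical, and SciPy's `"correlation"` metric is only a stand-in for the paper's attribute-value correlation distance, which the abstract does not define.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def embed_and_classify(X, y, reducer, n_neighbors=5, seed=0):
    """Reduce X to a low-dimensional embedding, then score k-NN on a held-out split."""
    # t-SNE has no transform() for unseen samples, so all samples are embedded
    # together before splitting. This is a simplification for illustration only.
    Z = reducer.fit_transform(X)
    Z_train, Z_test, y_train, y_test = train_test_split(
        Z, y, test_size=0.25, random_state=seed, stratify=y
    )
    # Assumption: SciPy's "correlation" distance stands in for the paper's
    # attribute-value correlation distance.
    knn = KNeighborsClassifier(
        n_neighbors=n_neighbors, metric="correlation", algorithm="brute"
    )
    knn.fit(Z_train, y_train)
    return Z, knn.score(Z_test, y_test)


# Synthetic stand-in for flux vectors: 300 samples, 50 "wavelength bins", 3 classes.
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=10, n_classes=3, random_state=0
)

# Three components, matching the three-dimensional visualization in the abstract.
Z_pca, acc_pca = embed_and_classify(X, y, PCA(n_components=3))
Z_tsne, acc_tsne = embed_and_classify(X, y, TSNE(n_components=3, random_state=0))

print(f"PCA   + k-NN accuracy: {acc_pca:.3f}")
print(f"t-SNE + k-NN accuracy: {acc_tsne:.3f}")
```

The accuracies printed on synthetic data say nothing about the results reported for real spectra; the sketch only shows how the two reducers plug into the same classification step.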

JIANG Bin, ZHAO Zi-liang, WANG Shu-ting, WEI Ji-yu, QU Mei-xia. Decomposition and Classification of Stellar Spectra Based on t-SNE[J]. Spectroscopy and Spectral Analysis, 2020, 40(9): 2913.
