Spectroscopy and Spectral Analysis, Vol. 39, Issue 7 (pp. 2288-2292)

Spectral Classification of M-Type Stars Based on Ensemble Tree Models

Abstract

In the Hertzsprung-Russell diagram, M giants occupy the top of the red giant region; they are the brightest stars that evolve from Sun-like main-sequence stars. Studying M giants is crucial for understanding the Milky Way, especially the properties of the Galactic halo. In low- and medium-resolution data, M giant spectra are often confused with M dwarf spectra because their distinguishing features are weak and affected by noise. Previous studies typically selected M giant candidates with the CaH2+CaH3 vs. TiO5 molecular band indices and then confirmed them by visual inspection. This approach uses only three giant-sensitive molecular band indices and ignores the other spectral features that identify M giants, so noise contamination of the indices can cause misclassification. Visual inspection of large numbers of spectra is also time-consuming, and its quality depends on the inspector's experience, so its reliability cannot be guaranteed. From the start of its pilot survey in 2011 to June 2017, LAMOST released spectra of more than 9 million celestial objects. The latest data release, DR5, contains 520,000 M-type stellar spectra, which calls for an automatic, accurate, and efficient method to separate M-type subsamples of different luminosity classes. This study uses four ensemble tree models (Random Forest, GBDT, XGBoost, and LightGBM) to build luminosity classifiers that distinguish M giants from M dwarfs. The test accuracies of the four classifiers are 97.23%, 98%, 98.05%, and 98.32%, respectively. The experiments show that LightGBM achieves higher accuracy with less training time than the other three models. Analysis of the important features obtained by the classifiers shows that the ensemble tree models effectively extract and express the structural features that distinguish M giants from M dwarfs.
These features include not only the wavelength positions of atomic lines and molecular absorption bands but also their adjacent pseudo-continuum, which is consistent with the feature wavelengths and pseudo-continua traditionally needed to compute spectral indices. Compared with the traditional classification method, ensemble tree models classify using a combination of tens or hundreds of important spectral features rather than only a few, which avoids misclassification caused by noise in any single feature. The results show that ensemble tree algorithms have significant advantages for identifying M giants and can fully replace the traditional discrimination method that uses only CaH and TiO indices. This study provides LAMOST with an effective method for processing massive numbers of celestial spectra efficiently and accurately. As the LAMOST survey continues, the accumulating samples of M giants and M dwarfs will provide an important data basis for studying the structure and evolution of the Milky Way.
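
To make the traditional index-based selection mentioned in the abstract concrete, the following is a minimal sketch of how CaH2, CaH3, and TiO5 band indices can be computed as the ratio of the mean flux in a feature band to the mean flux in a nearby pseudo-continuum band. The band limits, the function names (`mean_flux`, `band_index`, `index_features`), and the toy spectrum are illustrative assumptions and do not come from the paper; the wavelength windows in particular should be checked against the index definitions actually used.

```python
from statistics import fmean

# Band limits in angstroms as (feature band, pseudo-continuum band).
# ASSUMPTION: these numbers are not given in the abstract; they follow
# commonly quoted CaH2, CaH3 and TiO5 definitions and should be verified.
BANDS = {
    "CaH2": ((6814.0, 6846.0), (7042.0, 7046.0)),
    "CaH3": ((6960.0, 6990.0), (7042.0, 7046.0)),
    "TiO5": ((7126.0, 7135.0), (7042.0, 7046.0)),
}


def mean_flux(wave, flux, lo, hi):
    """Mean flux of all pixels with lo <= wavelength <= hi."""
    selected = [f for w, f in zip(wave, flux) if lo <= w <= hi]
    if not selected:
        raise ValueError(f"no pixels between {lo} and {hi}")
    return fmean(selected)


def band_index(wave, flux, name):
    """Mean flux in the feature band divided by mean flux in the pseudo-continuum."""
    (f_lo, f_hi), (c_lo, c_hi) = BANDS[name]
    return mean_flux(wave, flux, f_lo, f_hi) / mean_flux(wave, flux, c_lo, c_hi)


def index_features(wave, flux):
    """Return the three indices plus the CaH2+CaH3 sum used in the selection plane."""
    idx = {name: band_index(wave, flux, name) for name in BANDS}
    idx["CaH2+CaH3"] = idx["CaH2"] + idx["CaH3"]
    return idx


def toy_flux(w):
    """Hypothetical noiseless spectrum: flat continuum with three flat absorption bands."""
    if 6814.0 <= w <= 6846.0:
        return 0.5
    if 6960.0 <= w <= 6990.0:
        return 0.8
    if 7126.0 <= w <= 7135.0:
        return 0.4
    return 1.0


wave = [6800.0 + i for i in range(401)]  # 6800 to 7200 angstroms, 1 angstrom step
flux = [toy_flux(w) for w in wave]
features = index_features(wave, flux)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Each index here depends on a handful of pixels in two narrow windows, which illustrates why noise in a single band can shift a spectrum in the CaH2+CaH3 vs. TiO5 plane. The ensemble tree classifiers described in the abstract instead draw on many spectral features at once.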

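Supplementary information

Chinese Library Classification number: P144.1

DOI: 10.3964/j.issn.1000-0593(2019)07-2288-05

Funding: Supported by the National Natural Science Foundation of China (11603014, 11603012) and the Young Scholars Future Plan of Shandong University (2016WHWLJH09)

Received: 2018-06-06

Revised: 2018-10-28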

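Author affiliations

WANG Jing: School of Mechanical, Electrical and Information Engineering, Shandong University (Weihai), Weihai 264209, Shandong, China
YI Zhen-ping: School of Mechanical, Electrical and Information Engineering, Shandong University (Weihai), Weihai 264209, Shandong, China
YUE Li-li: School of Mechanical, Electrical and Information Engineering, Shandong University (Weihai), Weihai 264209, Shandong, China
DONG Hui-fen: School of Mechanical, Electrical and Information Engineering, Shandong University (Weihai), Weihai 264209, Shandong, China
PAN Jing-chang: School of Mechanical, Electrical and Information Engineering, Shandong University (Weihai), Weihai 264209, Shandong, China
BU Yu-de: School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, Shandong, China

Corresponding author: WANG Jing (wangjing7dhr@163.com)

Note: WANG Jing, female, born in 1997, undergraduate student at the School of Mechanical, Electrical and Information Engineering, Shandong University (Weihai)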

Cite this paper

WANG Jing, YI Zhen-ping, YUE Li-li, DONG Hui-fen, PAN Jing-chang, BU Yu-de. Spectral Classification of M-Type Stars Based on Ensemble Tree Models[J]. Spectroscopy and Spectral Analysis, 2019, 39(7): 2288-2292.
