Data analysis methods in astronomic objects classification (Sloan Digital Sky Survey DR14)
https://doi.org/10.32362/2500-316X-2021-9-3-66-77
Abstract
This paper investigates the Sloan Digital Sky Survey DR14 dataset, which contains statistical information about many astronomical objects obtained within the framework of the Sloan Digital Sky Survey project. Telescopes are located on the Earth's surface, in Earth orbit and at the Lagrange points of some systems (Earth–Moon, Sun–Earth), and they collect information in different frequency ranges. The large quantity of statistical information creates a demand for analytical algorithms and systems capable of performing classification. Such information is labeled well enough to build machine learning classification systems. The paper presents the results of a number of classifiers. The data handled contain measurements of three types of astronomical objects from the Sloan Digital Sky Survey DR14 dataset (star, quasar, galaxy). A CART decision tree, logistic regression, a naïve Bayes classifier and ensembles of classifiers (random forest, gradient boosting) were implemented. Conclusions about the special features of each machine learning classifier trained to solve this task are drawn at the end of the paper. In some cases, the structure of a classifier can be explained physically. The accuracy of the classifiers built in this research exceeds 90% (the F1, precision and recall metrics are used because the classes are unbalanced). Given these values, the classification task can be considered successfully solved. At the same time, the structure of the classifiers and the importance of features can be used as a physical explanation of the solution.
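The reason for reporting precision, recall and F1 alongside accuracy on unbalanced classes can be illustrated with a minimal sketch. It is not the authors' code: the function names, the class proportions and the toy predictions below are assumptions chosen for illustration, and no SDSS data are used. The example shows a hypothetical classifier that never predicts the rare class yet still reaches an accuracy above 90%.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class receives a false positive
            fn[truth] += 1  # true class receives a false negative
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Share of objects whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(metrics):
    """Unweighted mean of per-class F1, so every class counts equally."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)


# Hypothetical toy labels (not SDSS data): quasars are the rare class.
y_true = ["GALAXY"] * 50 + ["STAR"] * 42 + ["QSO"] * 8
# Hypothetical classifier that labels every quasar as a galaxy.
y_pred = ["GALAXY"] * 50 + ["STAR"] * 42 + ["GALAXY"] * 8

metrics = per_class_metrics(y_true, y_pred)
print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
for label, (p, r, f1) in metrics.items():
    print(f"{label:7s} precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print(f"macro F1 = {macro_f1(metrics):.2f}")
```

In this toy case the accuracy is 0.92, while the recall and F1 for the quasar class are both zero and the macro-averaged F1 drops to about 0.64, which is why per-class metrics are informative when the classes are unbalanced.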
About the Authors
V. A. Golov
Russian Federation
Vladislav A. Golov, Student, Higher Mathematics Department, Institute of Cybernetics
78, Vernadskogo pr., Moscow, 119454
D. A. Petrusevich
Russian Federation
Denis A. Petrusevich, Cand. Sci. (Phys.–Math.), Associate Professor, Higher Mathematics Department, Institute of Cybernetics
ResearcherID: AAA-6661-2020, Scopus Author ID: 55900513600
78, Vernadskogo pr., Moscow, 119454
Supplementary files
1. Dependence of the first and second principal components in the extended dataset. Type: Research instruments (44 KB)
The Sloan Digital Sky Survey DR14 dataset, which contains statistical information about many astronomical objects, was investigated. The data handled contain measurements of three types of astronomical objects from the Sloan Digital Sky Survey DR14 dataset (star, quasar, galaxy). A CART decision tree, logistic regression, a naïve Bayes classifier and ensembles of classifiers (random forest, gradient boosting) were implemented. Conclusions are drawn about the special features of each machine learning classifier trained to solve this task. The accuracy of the classifiers built in this research exceeds 90%.
Review
For citations:
Golov V.A., Petrusevich D.A. Data analysis methods in astronomic objects classification (Sloan Digital Sky Survey DR14). Russian Technological Journal. 2021;9(3):66-77. (In Russ.) https://doi.org/10.32362/2500-316X-2021-9-3-66-77