
Russian Technological Journal


Data analysis methods in astronomic objects classification (Sloan Digital Sky Survey DR14)

https://doi.org/10.32362/2500-316X-2021-9-3-66-77

Abstract

This paper investigates the Sloan Digital Sky Survey DR14 dataset, which contains statistical information about many astronomical objects obtained within the framework of the Sloan Digital Sky Survey project. Telescopes operate on the Earth's surface, in Earth orbit, and at the Lagrange points of some systems (Earth–Moon, Sun–Earth), and they collect information in different frequency ranges. The large volume of statistical information creates a demand for analytical algorithms and systems capable of classification. The data are labeled well enough to build machine learning classification systems. The paper presents the results of several classifiers. The data contain measurements of three types of astronomical objects from the Sloan Digital Sky Survey DR14 dataset (star, quasar, galaxy). A CART decision tree, logistic regression, a naïve Bayes classifier, and ensembles of classifiers (random forest, gradient boosting) were implemented. Conclusions about the special features of each machine learning classifier trained to solve this task are given at the end of the paper. The accuracy of the classifiers built in this research exceeds 90% (the F1, precision, and recall metrics are also used, because the classes are unbalanced). Given these values, the classification task is considered successfully solved. In some cases the structure of the classifiers and the importance of the features can also be used as a physical explanation of the solution.
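The abstract notes that F1, precision, and recall are used because the classes are unbalanced. The following minimal sketch illustrates why accuracy alone can mislead in that setting. It is not the authors' code: the class counts, the label strings, and the degenerate "never predict quasar" classifier are hypothetical values chosen only for illustration.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of objects whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)


# Hypothetical unbalanced toy sample (not the real DR14 counts).
y_true = ["GALAXY"] * 50 + ["STAR"] * 42 + ["QSO"] * 8
# Hypothetical degenerate classifier: every quasar is labeled as a galaxy.
y_pred = ["GALAXY"] * 50 + ["STAR"] * 42 + ["GALAXY"] * 8

metrics = per_class_metrics(y_true, y_pred)
print("accuracy:", accuracy(y_true, y_pred))
for label, (precision, recall, f1) in metrics.items():
    print(label, round(precision, 3), round(recall, 3), round(f1, 3))
print("macro F1:", round(macro_f1(metrics), 3))
```

In this toy case the accuracy is 0.92 even though no quasar is ever recognized. The per-class recall and the macro-averaged F1 expose the failure, which is why these metrics are a reasonable complement to accuracy for unbalanced classes.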

About the Authors

V. A. Golov
MIREA – Russian Technological University
Russian Federation

Vladislav A. Golov, Student, Higher Mathematics Department, Institute of Cybernetics

78, Vernadskogo pr., Moscow, 119454 



D. A. Petrusevich
MIREA – Russian Technological University
Russian Federation

Denis A. Petrusevich, Cand. Sci. (Phys.–Math.), Associate Professor, Higher Mathematics Department, Institute of Cybernetics

ResearcherID: AAA-6661-2020, Scopus Author ID: 55900513600 

78, Vernadskogo pr., Moscow, 119454





Supplementary files

1. Dependence of the first and second principal components in the extended dataset
Type: Research instruments



For citations:


Golov V.A., Petrusevich D.A. Data analysis methods in astronomic objects classification (Sloan Digital Sky Survey DR14). Russian Technological Journal. 2021;9(3):66-77. (In Russ.) https://doi.org/10.32362/2500-316X-2021-9-3-66-77



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2782-3210 (Print)
ISSN 2500-316X (Online)