Russian Technological Journal

Genetic clustering algorithm

https://doi.org/10.32362/2500-316X-2019-7-6-134-150

Abstract

A genetic algorithm for clustering objects of analysis in different data domains is proposed within a hybrid concept of developing intelligent information technologies for decision support. The algorithm can account for different analyst preferences in clustering, which are reflected in the formula used to calculate the fitness function. The place of the algorithm among existing cluster-analysis methods is shown. Its software implementation is simple, which increases its reliability in use. The algorithm extends the usual evolutionary-modeling technique in two ways. First, chromosomes use decimal coding instead of the traditional binary coding, because a gene can take multiple states rather than two and because the algorithm has no inversion operator. Second, a new genetic operator for filtering is introduced: it eliminates chromosomes that do not satisfy the number of clusters required by the task, which can appear during the stochastic process of evolution. The algorithm was studied in a series of simulation experiments. They showed that the partition into clusters stabilizes after 200 or more generations of evolution with a fairly small population of 150 or more chromosomes, so no considerable amount of random-access memory is required. Calculations on real data showed high clustering quality and an acceptable computing speed, of the same order as that of the SOM and k-means algorithms.
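
The two extensions described in the abstract, decimal chromosome coding and a filtering operator, can be illustrated with a minimal sketch. This is not the author's implementation. The fitness function (negative within-cluster sum of squared distances), the truncation selection, the one-point crossover, the mutation rate, and all function names are assumptions made for illustration; only the integer gene coding, the filtering by required cluster count, and the default population size (150) and generation count (200) follow the abstract.

```python
import random

def fitness(chrom, points, k):
    """Assumed fitness: negative within-cluster sum of squared distances."""
    total = 0.0
    for c in range(k):
        members = [points[i] for i, g in enumerate(chrom) if g == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return -total

def has_k_clusters(chrom, k):
    """True if the chromosome uses exactly k distinct cluster labels."""
    return len(set(chrom)) == k

def filter_population(population, k):
    """Filtering operator: drop chromosomes without the required cluster count."""
    return [ch for ch in population if has_k_clusters(ch, k)]

def random_chromosome(n, k, rng):
    # Decimal (integer) coding: gene i holds the cluster label of object i.
    return [rng.randrange(k) for _ in range(n)]

def crossover(a, b, rng):
    # One-point crossover (an assumption; the abstract does not name the operator).
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(chrom, k, rate, rng):
    # Each gene is redrawn with probability `rate` (assumed value).
    return [rng.randrange(k) if rng.random() < rate else g for g in chrom]

def genetic_clustering(points, k, pop_size=150, generations=200, rate=0.05, seed=0):
    """Hypothetical driver loop; requires 2 <= k <= len(points) and pop_size >= 4."""
    rng = random.Random(seed)
    n = len(points)
    population = []
    while len(population) < pop_size:
        population.extend(filter_population([random_chromosome(n, k, rng)], k))
    for _ in range(generations):
        ranked = sorted(population, key=lambda ch: fitness(ch, points, k), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection (assumption)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = mutate(crossover(a, b, rng), k, rate, rng)
            children.extend(filter_population([child], k))  # discard invalid offspring
        population = parents + children
    return max(population, key=lambda ch: fitness(ch, points, k))
```

A call such as `genetic_clustering([(0.0,), (0.2,), (5.0,), (5.1,)], k=2)` returns a list of cluster labels, one per object. Because invalid offspring are discarded by the filter before they enter the population, every chromosome that survives uses exactly `k` labels.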

About the Author

M. A. Anfyorov
MIREA – Russian Technological University
Russian Federation

Mikhail A. Anfyorov, Dr. of Sci. (Engineering), Professor of the Chair “Applied and Business Informatics,” 78, Vernadskogo pr., Moscow 119454, Russia





Supplementary files

1. Fig. 4. Operation of the algorithm on transient modes (strong clustering)

For citations:


Anfyorov M.A. Genetic clustering algorithm. Russian Technological Journal. 2019;7(6):134-150. (In Russ.) https://doi.org/10.32362/2500-316X-2019-7-6-134-150


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2782-3210 (Print)
ISSN 2500-316X (Online)