
Russian Technological Journal


Topic modeling in the stream of short messages in Russian

https://doi.org/10.32362/2500-316X-2025-13-1-38-48

EDN: HJHQTR

Abstract

Objectives. This work addresses topic modeling of short messages that arrive as a series, whether through social networks or other channels. The need for such modeling arises in public relations systems of state and municipal bodies, in public opinion research centers, and in customer service systems and marketing departments. The aim of the work is to develop and experimentally test a set of topic modeling algorithms that automatically determine the main topics of information exchange and select typical messages illustrating these topics.

Methods. The work applies methods for variable statistical distributions to collocation statistics and combines them with approaches typical of topic modeling of short texts, here applied to a sequence of successive messages. In this way, online machine learning and topic modeling are treated jointly.
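The abstract does not detail how the collocation statistics are maintained. One plausible reading of "variable statistical distributions" is that word-pair statistics must follow a distribution that drifts as new messages arrive. The Python sketch below assumes exactly that: it keeps unigram and adjacent-bigram counts with exponential forgetting (the `decay` parameter is an assumption) and ranks collocations by pointwise mutual information (PMI). The class `DecayingCollocationStats` and its methods are hypothetical illustrations, not the author's algorithm; tokens are assumed to be already lowercased and lemmatized, which matters for inflected Russian text.

```python
import math
from collections import Counter


class DecayingCollocationStats:
    """Unigram and bigram counts over a message stream with exponential forgetting.

    Hypothetical sketch: exponential decay is assumed here as one simple way to
    follow a drifting word distribution; it is not the article's exact method.
    """

    def __init__(self, decay: float = 0.95) -> None:
        self.decay = decay
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        self.n_uni = 0.0  # decayed total unigram weight
        self.n_bi = 0.0   # decayed total bigram weight

    def update(self, tokens: list[str]) -> None:
        # Age existing statistics so that older messages count less.
        for counts in (self.unigrams, self.bigrams):
            for key in counts:
                counts[key] *= self.decay
        self.n_uni *= self.decay
        self.n_bi *= self.decay
        # Add the new message's words and adjacent word pairs.
        pairs = list(zip(tokens, tokens[1:]))
        self.unigrams.update(tokens)
        self.bigrams.update(pairs)
        self.n_uni += len(tokens)
        self.n_bi += len(pairs)

    def pmi(self, a: str, b: str) -> float:
        # Pointwise mutual information of the adjacent pair (a, b).
        p_ab = self.bigrams[(a, b)] / self.n_bi
        p_a = self.unigrams[a] / self.n_uni
        p_b = self.unigrams[b] / self.n_uni
        return math.log(p_ab / (p_a * p_b))

    def top_collocations(self, k: int, min_weight: float = 1.0) -> list[tuple[str, str]]:
        # Rank sufficiently frequent pairs by PMI; min_weight filters out rare pairs.
        candidates = [pair for pair, w in self.bigrams.items() if w >= min_weight]
        return sorted(candidates, key=lambda p: self.pmi(*p), reverse=True)[:k]
```

Rescaling every count on each message costs time proportional to the vocabulary size; a production variant could store counts against a single global scale factor instead. The `min_weight` filter matters because PMI strongly favors pairs seen only once.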

Results. A topic model was constructed in which the discovered clusters, presented together with their typical representatives and current weights, can support decision-making on the subjects of the most important messages. The proposed method was tested experimentally on a corpus of real messages. The resulting topic models are consistent with the results obtained manually, and the selected messages show that the topics with the highest weight are also regarded as the most important by human experts.
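The abstract does not specify the clustering procedure. As a purely illustrative sketch, the Python code below swaps in a simple single-pass threshold clustering over whitespace-tokenized bag-of-words vectors (an assumed technique, not the author's method) to show how each cluster could carry a decaying "current weight" and expose a typical representative message. The names `Topic`, `StreamingTopics`, `threshold` and `decay` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Topic:
    """One cluster of messages (hypothetical structure)."""

    def __init__(self) -> None:
        # Summed term counts of members; cosine is scale-invariant, so no averaging.
        self.centroid: Counter = Counter()
        self.weight = 0.0  # exponentially decayed number of assigned messages
        self.members: list[str] = []

    def representative(self) -> str:
        # The member closest to the centroid is shown as the "typical" message.
        return max(self.members, key=lambda m: cosine(Counter(m.split()), self.centroid))


class StreamingTopics:
    """Single-pass threshold clustering with decaying topic weights (assumed scheme)."""

    def __init__(self, threshold: float = 0.3, decay: float = 0.9) -> None:
        self.threshold = threshold  # minimum similarity to join an existing topic
        self.decay = decay          # per-message forgetting factor for weights
        self.topics: list[Topic] = []

    def add(self, message: str) -> Topic:
        vec = Counter(message.split())
        # Older activity fades: every topic loses weight on each new message.
        for topic in self.topics:
            topic.weight *= self.decay
        best = max(self.topics, key=lambda t: cosine(vec, t.centroid), default=None)
        if best is None or cosine(vec, best.centroid) < self.threshold:
            best = Topic()  # no sufficiently similar topic: open a new one
            self.topics.append(best)
        best.centroid.update(vec)
        best.weight += 1.0
        best.members.append(message)
        return best

    def ranked(self) -> list[Topic]:
        # Topics ordered by current weight, highest first.
        return sorted(self.topics, key=lambda t: t.weight, reverse=True)
```

With `decay` below 1, a topic that stops receiving messages loses weight geometrically, so the ranking reflects current rather than cumulative interest. Whitespace tokenization only keeps the sketch self-contained; real Russian messages would need normalization and lemmatization first.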

Conclusions. The proposed topic modeling algorithm automatically identifies the most important topics in current communication and displays posts that serve as indicators of these topics, thereby substantially simplifying the task.

About the Author

Elena S. Mozaidze
V.G. Shukhov Belgorod State Technological University
Russian Federation

Elena S. Mozaidze, Postgraduate Student, Department of Computer Software and Automated Systems, 46, Kostyukova ul., Belgorod, 308012.


Competing Interests:

The author declares no conflicts of interest.





Supplementary files

1. The most popular topics in appeals to the Mayor's Office of the city of Belgorod (type: research instruments, 24 KB)


For citations:


Mozaidze E.S. Topic modeling in the stream of short messages in Russian. Russian Technological Journal. 2025;13(1):38-48. https://doi.org/10.32362/2500-316X-2025-13-1-38-48. EDN: HJHQTR



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2782-3210 (Print)
ISSN 2500-316X (Online)