Heterogeneous computing systems with hardware acceleration of massively parallel stream processing design

A. S. Zuev; P. N. Sovietov; I. E. Tarasov

doi:10.32362/2500-316X-2026-14-2-29-41

EDN: XHLRAX

Abstract

Objectives. The growing demand for higher computational performance and energy efficiency has motivated the increasing adoption of specialized heterogeneous computing systems incorporating hardware accelerators with massive parallelism. This paper aims to develop a methodology for the analysis and evaluation of hardware accelerator implementation strategies for large-scale parallel stream data processing which systematically captures all major directions of performance improvement.

Methods. The study employs established techniques of digital system design and modeling.

Results. A comparative evaluation method is introduced to assess the efficiency of heterogeneous computing architectures based on massively parallel hardware accelerators composed of independently programmable nodes. A computational acceleration ratio is defined that consolidates three key dimensions of accelerator improvement: algorithmic support and microarchitecture; design automation tools; and fabrication technologies (lithography). Furthermore, the study proposes an optimization-based methodology for the systematic analysis and evaluation of hardware accelerator implementation alternatives.

Conclusions. The expressions derived herein for calculating the computational acceleration ratio and the aggregate throughput of hardware accelerators account for both multichannel and block-based massively parallel data stream processing. In contrast to conventional architectural exploration approaches, the proposed evaluation method enables hardware accelerator design alternatives to be assessed at the earliest stages of the design cycle, taking into account variations in algorithm versions and implementation strategies that influence hardware architecture optimization. The proposed methodology for analyzing and evaluating hardware accelerator implementation options can be used to develop technical specifications for accelerator manufacture, to design accelerators to specified requirements, and to justify configuration decisions. It can also support the formulation of research and development assignments aimed at achieving target characteristics, both for domain-specific tasks of massively parallel stream data processing and for CAD capabilities.

Keywords

processor, hardware accelerator, coprocessor, special-purpose processor, architecture, compiler

About the Authors

A. S. Zuev

MIREA – Russian Technological University
Russian Federation

Andrey S. Zuev, Cand. Sci. (Eng.), Associate Professor, Head of the Department of Quantum Information Technologies, Practical and Applied Informatics, Institute of Information Technologies

Competing Interests:

The authors declare no conflicts of interest.

P. N. Sovietov

MIREA – Russian Technological University
Russian Federation

Peter N. Sovietov, Cand. Sci. (Eng.), Associate Professor, Department of Corporate Information Systems, Institute of Information Technologies

I. E. Tarasov

MIREA – Russian Technological University
Russian Federation

Ilya E. Tarasov, Dr. Sci. (Eng.), Associate Professor, Professor, Department of Corporate Information Systems, Institute of Information Technologies

Supplementary files

	1. Diagram showing the ratio of the complexity of iterations and their number, with the preferred computing device architectures
	Type	Research instruments
	Date	2026-04-13

For citations:

Zuev A.S., Sovietov P.N., Tarasov I.E. Heterogeneous computing systems with hardware acceleration of massively parallel stream processing design. Russian Technological Journal. 2026;14(2):29-41. https://doi.org/10.32362/2500-316X-2026-14-2-29-41. EDN: XHLRAX

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2782-3210 (Print)
ISSN 2500-316X (Online)
