Heterogeneous computing systems with hardware acceleration of massively parallel stream processing design
https://doi.org/10.32362/2500-316X-2026-14-2-29-41
EDN: XHLRAX
Abstract
Objectives. The growing demand for higher computational performance and energy efficiency has motivated the increasing adoption of specialized heterogeneous computing systems incorporating hardware accelerators with massive parallelism. This paper aims to develop a methodology for analyzing and evaluating hardware accelerator implementation strategies for large-scale parallel stream data processing that systematically accounts for all major directions of performance improvement.
Methods. The study employs established techniques of digital system design and modeling.
Results. A comparative evaluation method is introduced to assess the efficiency of heterogeneous computing architectures based on massively parallel hardware accelerators composed of independently programmable nodes. A computational acceleration ratio is defined that consolidates three key dimensions of accelerator improvement: algorithmic support and microarchitecture, design automation tools, and fabrication technology (lithography). Furthermore, the study proposes an optimization-based methodology for systematically analyzing and evaluating hardware accelerator implementation alternatives.
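The abstract does not reproduce the derived expressions. As a rough illustration only, the following sketch assumes that the three improvement dimensions act as independent multiplicative factors and that design alternatives are compared by exhaustive enumeration under a relative area budget. The option names, numeric factors, and the area constraint are hypothetical placeholders, not formulas or data from the article.

```python
from itertools import product
from math import prod

# Hypothetical options for each improvement dimension:
# option name -> (speed-up factor, relative area factor).
# All numbers are illustrative placeholders, not data from the article.
ALGORITHM = {"baseline": (1.0, 1.0), "specialized_datapath": (4.0, 1.5)}
CAD_FLOW = {"baseline_flow": (1.0, 1.0), "advanced_flow": (1.25, 0.9)}
PROCESS = {"28nm": (1.0, 1.0), "16nm": (1.8, 0.5)}


def evaluate(alg, cad, tech):
    """Return (acceleration ratio, relative area) for one design alternative.

    Assumption: the three dimensions contribute independent multiplicative
    factors, so both the speed-up and the area scale as products.
    """
    factors = (ALGORITHM[alg], CAD_FLOW[cad], PROCESS[tech])
    ratio = prod(speedup for speedup, _ in factors)
    area = prod(area_factor for _, area_factor in factors)
    return ratio, area


def best_design(area_budget):
    """Enumerate every (alg, cad, tech) combination and return the one with
    the highest acceleration ratio whose relative area fits the budget,
    as (choice, ratio, area), or None if no combination fits."""
    best = None
    for choice in product(ALGORITHM, CAD_FLOW, PROCESS):
        ratio, area = evaluate(*choice)
        if area <= area_budget and (best is None or ratio > best[1]):
            best = (choice, ratio, area)
    return best
```

For example, `best_design(area_budget=0.6)` returns the fastest combination whose relative area does not exceed 0.6. Exhaustive enumeration is practical only for a handful of options per dimension; a larger design space would call for integer programming or a heuristic search instead.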
Conclusions. The expressions derived herein for calculating the computational acceleration ratio and the aggregate throughput of hardware accelerators account for both multichannel and block-based massively parallel data stream processing. In contrast to conventional architectural exploration approaches, the proposed evaluation method enables hardware accelerator design alternatives to be assessed at the earliest stages of the design cycle, taking into account variations in algorithm versions and implementation strategies that influence hardware architecture optimization. The proposed methodology for analyzing and evaluating hardware accelerator implementation options can be used to draw up technical specifications for accelerator manufacture, to design accelerators to specified requirements, and to justify configuration decisions. It can also support the formulation of research and development assignments aimed at achieving target characteristics in domain-specific massively parallel stream data processing tasks and in CAD capabilities.
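The abstract states only that the throughput expressions cover multichannel and block-based processing. A minimal sketch of how the two modes might differ, assuming a simple cycle-count model that is not taken from the article: in multichannel mode each node serves its own independent stream, while in block mode a single stream is cut into blocks of hypothetical size `block_size`, each incurring a fixed, hypothetical `overhead_cycles` for splitting, dispatching, and merging.

```python
def multichannel_throughput(nodes, clock_hz, cycles_per_item):
    """Aggregate items/s when each node processes its own independent stream.

    Assumed model: no inter-node coordination, so throughput scales
    linearly with the number of nodes.
    """
    if nodes <= 0 or clock_hz <= 0 or cycles_per_item <= 0:
        raise ValueError("nodes, clock_hz and cycles_per_item must be positive")
    return nodes * clock_hz / cycles_per_item


def block_throughput(nodes, clock_hz, cycles_per_item, block_size, overhead_cycles):
    """Aggregate items/s when one stream is split into blocks spread across nodes.

    Assumed model: each block of `block_size` items costs
    block_size * cycles_per_item cycles of useful work plus a fixed
    `overhead_cycles` for splitting, dispatching and merging.
    """
    if nodes <= 0 or clock_hz <= 0 or cycles_per_item <= 0:
        raise ValueError("nodes, clock_hz and cycles_per_item must be positive")
    if block_size <= 0 or overhead_cycles < 0:
        raise ValueError("block_size must be positive and overhead_cycles non-negative")
    cycles_per_block = block_size * cycles_per_item + overhead_cycles
    return nodes * clock_hz * block_size / cycles_per_block
```

Under this model, block-based throughput approaches the multichannel figure as `block_size` grows relative to `overhead_cycles`, which is one way to see why block size becomes a design parameter in the block-based mode.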
About the Authors
A. S. Zuev
Russian Federation
Andrey S. Zuev, Cand. Sci. (Eng.), Associate Professor, Head of the Department of Quantum Information Technologies, Practical and Applied Informatics, Institute of Information Technologies
Competing Interests:
The authors declare no conflicts of interest.
P. N. Sovietov
Russian Federation
Peter N. Sovietov, Cand. Sci. (Eng.), Associate Professor, Department of Corporate Information Systems, Institute of Information Technologies
I. E. Tarasov
Russian Federation
Ilya E. Tarasov, Dr. Sci. (Eng.), Associate Professor, Professor, Department of Corporate Information Systems, Institute of Information Technologies
References
1. Dennard R.H., Gaensslen F.H., Yu H.-N., Rideout V.L., Bassous E., LeBlanc A.R. Design of ion-implanted MOSFET’s with very small physical dimensions. IEEE Journal of Solid-State Circuits. 1974;9(5):256–268. https://doi.org/10.1109/JSSC.1974.1050511
2. Jain P.U., Tomar V.K. FinFET Technology: As A Promising Alternatives for Conventional MOSFET Technology. In: 2020 International Conference on Emerging Smart Computing and Informatics (ESCI). 2020. P. 43–47. https://doi.org/10.1109/ESCI48226.2020.9167646
3. Yakimets D., Eneman G., Schuddinck P., et al. Vertical GAAFETs for the Ultimate CMOS Scaling. IEEE Transactions on Electron Devices. 2015;62(5):1433–1439. https://doi.org/10.1109/TED.2015.2414924
4. Lee S.-Y., Kim S.-M., Yoon E.-J., et al. A Novel Multibridge-Channel MOSFET (MBCFET): Fabrication Technologies and Characteristics. IEEE Transactions on Nanotechnology. 2003;2(4):253–257. https://doi.org/10.1109/TNANO.2003.820777
5. Hennessy J.L., Patterson D.A. Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design). 6th ed. 2017, 936 p.
6. Annaratone M. MPPs, Amdahl’s Law, and Comparing Computers. In: Proceedings of The Fourth Symposium on the Frontiers of Massively Parallel Computation. 1992. P. 465–470. https://doi.org/10.1109/FMPC.1992.234879
7. Verhelst M., Benini L., Verma N. How to keep pushing ML accelerator performance? Know your rooflines! IEEE Journal of Solid-State Circuits. 2025;60(6):1888–1905. https://doi.org/10.1109/JSSC.2025.3553765
8. Altaf M.S.B., Wood D.A. LogCA: A high-level performance model for hardware accelerators. ACM SIGARCH Computer Architecture News. 2017;45(2):375–388. https://doi.org/10.1145/3079856.3080216
9. Molina R.S., Gil-Costa V., Crespo M.L., et al. High-level synthesis hardware design for FPGA-based accelerators: Models, methodologies, and frameworks. IEEE Access. 2022;10:90429–90455. https://doi.org/10.1109/ACCESS.2022.3201107
10. A New Golden Age for Computer Architecture: Domain-Specific Hardware/Software Co-Design, Enhanced Security, Open Instruction Sets, and Agile Chip Development. In: 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 2018. P. 27–29. https://doi.org/10.1109/ISCA.2018.00011
11. Liu K., Lu A., Fang Z. BitBlender: Scalable Bloom Filter Acceleration on FPGAs with Dynamic Scheduling. In: 34th International Conference on Field-Programmable Logic and Applications (FPL). 2024. P. 325–331. https://doi.org/10.1109/FPL64840.2024.00052
12. Kulkarni A., Chiosa M., Preußer T.B., et al. HyperLogLog Sketch Acceleration on FPGA. In: 30th International Conference on Field-Programmable Logic and Applications (FPL). 2020. P. 47–56. https://doi.org/10.1109/FPL50879.2020.00019
13. Marchisio A., Teodonio F., Rizzi A., et al. ISMatch: A Real-Time Hardware Accelerator for Inexact String Matching of DNA Sequences on FPGA. Microprocess. Microsyst. 2023;97:104763. https://doi.org/10.1016/j.micpro.2023.104763
14. Zhang C., Tang X., Peng Y. Enhancing Regular Expression Processing through Field-Programmable Gate Array-Based Multi Character Non-Deterministic Finite Automata. Electronics. 2024;13(9):1635. https://doi.org/10.3390/electronics13091635
15. Dann J., Wagner R., Ritter D., et al. PipeJSON: Parsing JSON at Line Speed on FPGAs. In: Proceedings of the 18th International Workshop on Data Management on New Hardware. 2022;Article 3:1–7. https://doi.org/10.1145/3533737.3535094
16. Karandikar S., Udipi A.N., Choi J., et al. CDPU: Co-Designing Compression and Decompression Processing Units for Hyperscale Systems. In: Proceedings of the 50th Annual International Symposium on Computer Architecture. 2023;Article 39:1–17. https://doi.org/10.1145/3579371.3589074
17. Hahn T., Wildermann S., Teich J. JSON-CooP: A JSON Decompression/Parsing Co-Design for FPGAs. In: 34th International Conference on Field-Programmable Logic and Applications (FPL). 2024. P. 11–18. https://doi.org/10.1109/FPL64840.2024.00012
18. Fang J., Mulder Y., Hidders J., et al. In-Memory Database Acceleration on FPGAs: A Survey. The VLDB Journal. 2020;29(1):33–59. https://doi.org/10.1007/s00778-019-00581-w
19. Dann J., Götz T., Ritter D., et al. GraphMatch: Subgraph Query Processing on FPGAs. arXiv. arXiv:2402.17559. 2024.
20. Kejariwal A., Kulkarni S., Ramasamy K. Real Time Analytics: Algorithms and Systems. arXiv. arXiv:1708.02621. 2017. https://doi.org/10.48550/arXiv.1708.02621
21. Alcolea A., Resano J. FPGA Accelerator for Gradient Boosting Decision Trees. Electronics. 2021;10(3):314. https://doi.org/10.3390/electronics10030314
22. Graf J.R., Perera D.G. Optimizing Density-Based Ant Colony Stream Clustering Using FPGA-Based Hardware Accelerator. In: 2023 IEEE International Symposium on Circuits and Systems (ISCAS). 2023. https://doi.org/10.1109/ISCAS46773.2023.10181665
23. Shen J.P., Lipasti M.H. Modern Processor Design: Fundamentals of Superscalar Processors. Waveland Press; 2013, 658 p.
24. Tarasov I.E., Sovietov P.N., Lulyava D.V., Mirzoyan D.I. Method for designing specialized computing systems based on hardware and software co-optimization. Russian Technological Journal. 2024;12(3):37–45. https://doi.org/10.32362/2500-316X-2024-12-3-37-45
Supplementary files
1. Diagram showing the relationship between the complexity of iterations and their number, with the preferred computing device architectures (type: research instruments)
For citation:
Zuev A.S., Sovietov P.N., Tarasov I.E. Heterogeneous computing systems with hardware acceleration of massively parallel stream processing design. Russian Technological Journal. 2026;14(2):29-41. https://doi.org/10.32362/2500-316X-2026-14-2-29-41. EDN: XHLRAX