Method for designing specialized computing systems based on hardware and software co-optimization
https://doi.org/10.32362/2500-316X-2024-12-3-37-45
EDN: PXKDKR
Abstract
Objectives. Now that the development stages driven by transistor scaling (Dennard’s law) and by growing numbers of general-purpose processor cores (limited by Amdahl’s law) are complete, further improvement in the performance of computing systems naturally proceeds to the development of specialized computing subsystems that perform specific tasks within a limited computational subclass. Developing such systems requires both selecting the relevant high-demand tasks and applying design techniques that achieve the desired indicators within the chosen specializations at very large scales of integration. The purpose of the present work is to develop a methodology for designing specialized computing systems based on the joint optimization of hardware and software in relation to a selected subclass of problems.
Methods. The research is based on various methods for designing digital systems.
Results. Approaches to the analysis of computational problems are considered that construct a computational graph abstracted from the computing platform but limited by a set of architectural solutions. The proposed design methodology is based on a synthesizer that produces a register transfer level (RTL) representation of a computing device. It is limited to individual computing architectures, for which the relevant circuit is synthesized and optimized from a high-level input description of the algorithm. Among computing node architectures, a synchronous pipeline and a processor core with a tree-like arithmetic logic unit are considered. The efficiency of a computing system can be increased, for the pipeline, by balancing it based on estimates of the technological basis and, for the processor, by optimizing the set of operations. This optimization analyzes the abstract syntax tree graph and covers it optimally with subgraphs corresponding to the structure of the arithmetic logic unit.
Conclusions. The considered development approaches are suitable for accelerating the process of designing specialized computing systems with a massively parallel architecture based on pipeline or processor computing nodes.
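The covering of an abstract syntax tree by subgraphs matching a tree-like arithmetic logic unit can be illustrated with a minimal sketch. It is not the authors' synthesizer. It assumes, purely for illustration, that the ALU is a complete binary tree of operators with `alu_depth` levels, that unused ALU nodes can pass an operand through unchanged, and that only binary operators over names and constants appear in the expression. The function name `min_tiles` and the parameter `alu_depth` are hypothetical.

```python
import ast
from functools import lru_cache


def min_tiles(expr: str, alu_depth: int = 2) -> int:
    """Return the minimal number of ALU instructions (tiles) covering expr.

    Assumption: one instruction evaluates any connected group of binary
    operators that fits into a binary tree with alu_depth operator levels.
    Anything that is not a binary operator is treated as a ready operand.
    """
    root = ast.parse(expr, mode="eval").body

    @lru_cache(maxsize=None)
    def tiles(node: ast.AST) -> int:
        # node becomes the root of a new tile; its children may be absorbed
        # into the remaining alu_depth - 1 operator levels of that tile.
        return (1
                + inside(node.left, alu_depth - 1)
                + inside(node.right, alu_depth - 1))

    @lru_cache(maxsize=None)
    def inside(node: ast.AST, depth_left: int) -> int:
        # Number of tiles needed at and below node (the enclosing tile is
        # counted by its root), given that depth_left operator levels of
        # the enclosing tile are still free.
        if not isinstance(node, ast.BinOp):
            return 0  # plain operand, read from a register
        start_new = tiles(node)
        if depth_left == 0:
            return start_new  # the enclosing tile is full
        absorbed = (inside(node.left, depth_left - 1)
                    + inside(node.right, depth_left - 1))
        return min(start_new, absorbed)

    return tiles(root) if isinstance(root, ast.BinOp) else 0


if __name__ == "__main__":
    for text in ["a + b", "(a + b) * (c - d)", "((a + b) * (c - d)) + e"]:
        print(text, "->", min_tiles(text))
```

Under these assumptions, `(a + b) * (c - d)` fits into a single two-level instruction, whereas a single-level ALU (`alu_depth=1`) needs three. Comparing such counts across candidate ALU shapes is one simple way to reason about the choice of operation set.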
About the Authors
I. E. Tarasov
Russian Federation
Ilya E. Tarasov, Dr. Sci. (Eng.), Associate Professor, Head of the Laboratory of Specialized Computing Systems
78, Vernadskogo pr., Moscow, 119454
Scopus Author ID 57213354150, RSCI SPIN-code 4628-7514
Competing Interests:
The authors declare no conflicts of interest.
P. N. Sovietov
Russian Federation
Peter N. Sovietov, Cand. Sci. (Eng.), Senior Researcher, Laboratory of Specialized Computing Systems
78, Vernadskogo pr., Moscow, 119454
Scopus Author ID 57221375427
D. V. Lulyava
Russian Federation
Daniil V. Lulyava, Junior Researcher, Laboratory of Specialized Computing Systems
78, Vernadskogo pr., Moscow, 119454
Scopus Author ID 58811698000
D. I. Mirzoyan
Russian Federation
Dmitry I. Mirzoyan, Senior Researcher, Laboratory of Specialized Computing Systems
78, Vernadskogo pr., Moscow, 119454
Scopus Author ID 57432027000, ResearcherID JJE-7844-2023
Supplementary files
1. Diagrams of the main computing nodes (architectural templates). Type: research tools (161KB)
For citations:
Tarasov I.E., Sovietov P.N., Lulyava D.V., Mirzoyan D.I. Method for designing specialized computing systems based on hardware and software co-optimization. Russian Technological Journal. 2024;12(3):37–45. https://doi.org/10.32362/2500-316X-2024-12-3-37-45. EDN: PXKDKR