Method for designing specialized computing systems based on hardware and software co-optimization

I. E. Tarasov; P. N. Sovietov; D. V. Lulyava; D. I. Mirzoyan

doi:10.32362/2500-316X-2024-12-3-37-45


EDN: PXKDKR


Abstract

Objectives. Performance gains from transistor scaling (Dennard’s law) have run their course, and gains from adding general-purpose processor cores are limited by Amdahl’s law. Further improvements in the performance of computing systems therefore depend on specialized computing subsystems that perform specific tasks within a limited computational subclass. Developing such systems requires both selecting relevant high-demand tasks and applying design techniques that achieve the target indicators for the chosen specialization at very large scales of integration. The purpose of the present work is to develop a methodology for designing specialized computing systems based on the joint optimization of hardware and software for a selected subclass of problems.
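For reference, the core-count limit mentioned above can be written out. This is the standard statement of Amdahl’s law, not a formula taken from the article; the symbols p (the parallelizable fraction of the workload) and N (the number of cores) are notation introduced here.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

For example, with p = 0.95 the speedup cannot exceed 20 however many cores are added.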

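Methods. The research is based on methods for designing digital systems, including synthesis of RTL representations from high-level algorithm descriptions and analysis of abstract syntax trees.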

Results. Approaches to the analysis of computational problems are considered in which a computational graph is constructed that is abstracted from the computing platform but constrained by a set of architectural solutions. The proposed design methodology is built around a synthesizer of the register transfer level (RTL) representation of a computing device. The synthesizer is limited to individual computing architectures, for each of which the relevant circuit is synthesized and optimized from a high-level input description of the algorithm. Two computing node architectures are considered: a synchronous pipeline and a processor core with a tree-like arithmetic logic unit (ALU). For the pipeline, efficiency can be increased by balancing the stages using estimates of the technological basis. For the processor, efficiency can be increased by optimizing the set of operations: the abstract syntax tree is analyzed and optimally covered by subgraphs that correspond to the structure of the ALU.
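The idea of covering an abstract syntax tree with ALU-shaped subgraphs can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the ALU is a complete binary tree of `ALU_DEPTH` operation levels, that any connected group of binary operations fitting inside that tree executes as one instruction, and that expressions contain only binary operators, names and constants. The function name `min_instructions` and the constant `ALU_DEPTH` are hypothetical.

```python
import ast

ALU_DEPTH = 2  # assumed number of operation levels in the tree-like ALU


def min_instructions(expr: str, depth: int = ALU_DEPTH) -> int:
    """Minimum number of ALU instructions that cover the expression tree.

    Dynamic programming over the AST: each binary operation either starts
    a new instruction or is absorbed into the instruction of its parent,
    provided the ALU tree still has a free level below the parent.
    """
    root = ast.parse(expr, mode="eval").body

    def cost(node: ast.AST) -> int:
        # Instructions needed to produce the value of `node`.
        if not isinstance(node, ast.BinOp):
            return 0  # names and constants are read directly as operands
        return 1 + operands_cost(node, depth)

    def operands_cost(node: ast.BinOp, levels: int) -> int:
        # `node` occupies one ALU level; `levels` counts that level plus
        # the free levels beneath it inside the current instruction.
        total = 0
        for child in (node.left, node.right):
            if isinstance(child, ast.BinOp) and levels > 1:
                # Cheaper of: separate instruction vs. absorbing the child.
                total += min(cost(child), operands_cost(child, levels - 1))
            else:
                total += cost(child)
        return total

    return cost(root)


if __name__ == "__main__":
    print(min_instructions("(a + b) * (c + d)"))  # balanced tree fits one ALU pass
    print(min_instructions("a + b + c + d"))      # left-leaning chain needs more
```

Under these assumptions a balanced expression such as `(a + b) * (c + d)` fits into a single two-level instruction, while the chain `a + b + c + d` needs two. The sketch treats the expression as a tree, so shared subexpressions are not reused.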

Conclusions. The approaches considered can accelerate the design of specialized computing systems with a massively parallel architecture based on pipeline or processor computing nodes.
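Pipeline balancing for a pipeline computing node can be sketched in the same spirit. The abstract does not describe the balancing procedure, so the code below is only an illustration under stated assumptions: the datapath is a linear chain of operations, each with an integer delay estimate in picoseconds (standing in for the estimates of the technological basis), and registers may be placed between any two operations. The function name `balance_pipeline` and the sample delays are hypothetical.

```python
from typing import List, Tuple


def balance_pipeline(delays: List[int], stages: int) -> Tuple[int, List[List[int]]]:
    """Split a chain of operation delays into at most `stages` contiguous
    pipeline stages so that the slowest stage is as fast as possible.

    Returns the resulting clock period and the delays grouped by stage.
    """
    if stages < 1 or not delays:
        raise ValueError("need at least one stage and one operation")

    def split(limit: int) -> List[List[int]]:
        # Greedy packing: start a new stage when the next operation
        # would push the current stage past `limit`.
        groups: List[List[int]] = []
        current: List[int] = []
        for d in delays:
            if current and sum(current) + d > limit:
                groups.append(current)
                current = []
            current.append(d)
        groups.append(current)
        return groups

    # Binary search for the smallest period that fits in `stages` stages.
    # The lower bound is the slowest single operation, so every operation
    # always fits into a stage on its own.
    lo, hi = max(delays), sum(delays)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(split(mid)) <= stages:
            hi = mid
        else:
            lo = mid + 1
    return lo, split(lo)


if __name__ == "__main__":
    period, grouping = balance_pipeline([300, 200, 400, 100, 500], stages=3)
    print(period, grouping)
```

With the sample delays, the chain is cut into three stages of 500 ps each, so the clock period is set by the slowest single operation rather than by an unbalanced stage.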

Keywords

processor, RTL, synthesis, translator

About the Authors

I. E. Tarasov

MIREA – Russian Technological University
Russian Federation

Ilya E. Tarasov, Dr. Sci. (Eng.), Associate Professor, Head of the Laboratory of Specialized Computing Systems

78, Vernadskogo pr., Moscow, 119454

Scopus Author ID 57213354150, RSCI SPIN-code 4628-7514

Competing Interests:

The authors declare no conflicts of interest.

P. N. Sovietov

MIREA – Russian Technological University
Russian Federation

Peter N. Sovietov, Cand. Sci. (Eng.), Senior Researcher, Laboratory of Specialized Computing Systems

78, Vernadskogo pr., Moscow, 119454

Scopus Author ID 57221375427

Competing Interests:

The authors declare no conflicts of interest.

D. V. Lulyava

MIREA – Russian Technological University
Russian Federation

Daniil V. Lulyava, Junior Researcher, Laboratory of Specialized Computing Systems

78, Vernadskogo pr., Moscow, 119454

Scopus Author ID 58811698000

Competing Interests:

The authors declare no conflicts of interest.

D. I. Mirzoyan

MIREA – Russian Technological University
Russian Federation

Dmitry I. Mirzoyan, Senior Researcher, Laboratory of Specialized Computing Systems

78, Vernadskogo pr., Moscow, 119454

Scopus Author ID 57432027000, ResearcherID JJE-7844-2023

Competing Interests:

The authors declare no conflicts of interest.


Supplementary files

1. Diagrams of the main computing nodes (architectural templates). Type: Research tools. Date: 2024-06-20.


For citations:

Tarasov I.E., Sovietov P.N., Lulyava D.V., Mirzoyan D.I. Method for designing specialized computing systems based on hardware and software co-optimization. Russian Technological Journal. 2024;12(3):37–45. https://doi.org/10.32362/2500-316X-2024-12-3-37-45. EDN: PXKDKR

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2782-3210 (Print)
ISSN 2500-316X (Online)


