
Russian Technological Journal


Comparative analysis of software optimization methods in context of branch predication on GPUs

https://doi.org/10.32362/2500-316X-2021-9-6-7-15

Abstract

General-Purpose computing on Graphics Processing Units (GPGPU) is a powerful tool for offloading parallel data processing tasks to Graphics Processing Units (GPUs). The technology is used in a variety of domains, from science and commerce to hobbyist projects. General-purpose programs running on GPUs inevitably run into performance issues stemming from branch predication. Branch predication is a GPU feature whereby both branches of a conditional are executed and the results of the branch not taken are masked out. This leads to considerable performance losses for GPU programs that hide large amounts of code behind conditional operators. This paper analyzes existing approaches to improving software performance with a view to relieving this loss. Each approach is described along with its advantages, drawbacks and extent of applicability, and whether it addresses the outlined problem. The approaches covered are optimizing compilers, JIT compilation, branch predictors, speculative execution, adaptive optimization, run-time algorithm specialization, and profile-guided optimization. It is shown that these methods mostly cater to CPU-specific issues and are generally not applicable as far as branch-predication performance loss is concerned. Lastly, we outline the need for a separate performance-improving approach that addresses the specifics of branch predication and the GPGPU workflow.
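The performance cost of branch predication can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the authors' model or any GPU's exact behaviour. The function name `run_predicated` and the per-path "cycle" costs are hypothetical. The model treats each path as issued for the whole warp whenever at least one lane needs it, with the results of the disagreeing lanes masked out.

```python
from typing import Callable, List, Tuple

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs; this toy model handles one warp


def run_predicated(
    data: List[int],
    cond: Callable[[int], bool],
    then_fn: Callable[[int], int],
    else_fn: Callable[[int], int],
    then_cost: int,
    else_cost: int,
) -> Tuple[List[int], int]:
    """Evaluate `cond ? then_fn(x) : else_fn(x)` for one warp, predicated.

    A path is issued for the whole warp if at least one lane needs it; lanes
    whose predicate disagrees compute the value but have it masked out.
    Returns the per-lane results and the (hypothetical) cycle count spent.
    """
    if len(data) > WARP_SIZE:
        raise ValueError("this toy model handles a single warp")
    mask = [cond(x) for x in data]
    results = [0] * len(data)
    cycles = 0
    if any(mask):  # at least one lane takes the 'then' path
        cycles += then_cost
        for lane, x in enumerate(data):
            value = then_fn(x)  # issued on every lane...
            if mask[lane]:
                results[lane] = value  # ...but kept only where cond holds
    if not all(mask):  # at least one lane takes the 'else' path
        cycles += else_cost
        for lane, x in enumerate(data):
            value = else_fn(x)
            if not mask[lane]:
                results[lane] = value
    return results, cycles


if __name__ == "__main__":
    is_even = lambda x: x % 2 == 0
    # Uniform warp: every lane agrees, so only the 'then' path is paid for.
    print(run_predicated([0, 2, 4, 6], is_even, lambda x: x * 10, lambda x: -x, 100, 100))
    # -> ([0, 20, 40, 60], 100)
    # Divergent warp: both paths are paid for, although each lane keeps one result.
    print(run_predicated(list(range(8)), is_even, lambda x: x * 10, lambda x: -x, 100, 100))
    # -> ([0, -1, 20, -3, 40, -5, 60, -7], 200)
```

In the divergent case the warp pays for both paths, so the cycle count doubles. This is why large bodies of code behind conditional operators are costly on GPUs.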

About the Authors

I. Yu. Sesin
MIREA – Russian Technological University
Russian Federation

Igor Yu. Sesin, Postgraduate Student, Department of the Tool and Applied Software, Institute of Information Technologies

78, Vernadskogo pr., Moscow, 119454 Russia



R. G. Bolbakov
MIREA – Russian Technological University
Russian Federation

Roman G. Bolbakov, Cand. Sci. (Eng.), Associate Professor, Head of the Department of the Tool and Applied Software, Institute of Information Technologies

78, Vernadskogo pr., Moscow, 119454 Russia




General-Purpose computing on Graphics Processing Units (GPGPU) is a powerful tool for offloading parallel data processing tasks to Graphics Processing Units (GPUs). This paper analyzes existing approaches to improving software performance: optimizing compilers, JIT compilation, branch predictors, speculative execution, adaptive optimization, run-time algorithm specialization, and profile-guided optimization. It is shown that these methods mostly cater to CPU-specific issues and are generally not applicable as far as branch-predication performance loss is concerned.
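The claim that methods such as branch prediction mainly address CPU-specific issues can be illustrated with the classic 2-bit saturating-counter predictor. This is a sketch for illustration only. The class and function names, the initial state and the branch traces are assumptions, not taken from the paper. A CPU predictor guesses one direction per branch to avoid pipeline stalls. A divergent warp under predication has no single direction to guess, because both paths are issued anyway.

```python
from typing import Iterable


class TwoBitPredictor:
    """Hypothetical toy predictor: states 0-1 predict 'not taken', 2-3 'taken'."""

    def __init__(self, state: int = 1) -> None:
        self.state = state  # assumed initial state: weakly 'not taken'

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def count_mispredictions(outcomes: Iterable[bool]) -> int:
    """Replay a trace of branch outcomes and count wrong guesses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop branch taken 9 times and then exiting: only 2 wrong guesses.
    print(count_mispredictions([True] * 9 + [False]))  # -> 2
    # A strictly alternating branch defeats this simple predictor entirely.
    print(count_mispredictions([True, False] * 4))  # -> 8
```

Even perfect prediction would only remove the CPU's stall on a wrong guess. It would not remove the work that a predicated, divergent warp spends executing both paths.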


For citations:


Sesin I.Yu., Bolbakov R.G. Comparative analysis of software optimization methods in context of branch predication on GPUs. Russian Technological Journal. 2021;9(6):7-15. https://doi.org/10.32362/2500-316X-2021-9-6-7-15



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2782-3210 (Print)
ISSN 2500-316X (Online)