Comparative analysis of software optimization methods in context of branch predication on GPUs
https://doi.org/10.32362/2500-316X-2021-9-6-7-15
Abstract
General-purpose computing on graphics processing units (GPGPU) is a powerful technology for offloading parallel data processing tasks to graphics processing units (GPUs). It is used in a variety of domains, from science and commerce to hobbyist projects. General-purpose programs run on a GPU inevitably encounter performance issues stemming from branch predication. Branch predication is a GPU feature whereby both branches of a conditional are executed and the results of the branch whose condition does not hold are masked out. This leads to considerable performance losses for GPU programs that keep large amounts of code behind conditional operators. This paper analyzes existing approaches to improving software performance with respect to how well they relieve this performance loss. Each approach is described along with its advantages, disadvantages, extent of applicability, and whether it addresses the outlined problem. The approaches covered are: optimizing compilers, JIT compilation, branch prediction, speculative execution, adaptive optimization, run-time algorithm specialization, and profile-guided optimization. It is shown that these methods mostly target CPU-specific issues and are generally not applicable to the performance loss caused by branch predication. Lastly, we outline the need for a separate performance-improving approach that addresses the specifics of branch predication and the GPGPU workflow.
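The cost of predication described in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper and does not use any GPU API; the function names (`predicated_select`, `branching_select`) and the unit of cost (one branch-body evaluation) are assumptions made purely for illustration. It contrasts a CPU-style conditional, which evaluates only the branch that is taken, with a predicated conditional, which evaluates both branch bodies for every element and then keeps one result according to a mask.

```python
# Toy model of branch predication (hypothetical helper names, no real GPU API).
# Cost is counted as the number of branch-body evaluations performed.

def branching_select(values, cond, then_fn, else_fn):
    """CPU-style conditional: only the taken branch body is evaluated."""
    results = []
    evaluations = 0
    for v in values:
        results.append(then_fn(v) if cond(v) else else_fn(v))
        evaluations += 1  # exactly one body runs per element
    return results, evaluations


def predicated_select(values, cond, then_fn, else_fn):
    """Predicated conditional: both bodies run, the mask picks the result."""
    results = []
    evaluations = 0
    for v in values:
        keep_then = cond(v)      # predicate (mask bit) for this element
        then_out = then_fn(v)    # executed regardless of the predicate
        else_out = else_fn(v)    # executed regardless of the predicate
        evaluations += 2
        results.append(then_out if keep_then else else_out)
    return results, evaluations


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    is_even = lambda x: x % 2 == 0
    scale = lambda x: x * 10     # stands in for an expensive "then" body
    negate = lambda x: -x        # stands in for an expensive "else" body

    print(branching_select(data, is_even, scale, negate))
    print(predicated_select(data, is_even, scale, negate))
```

Both functions return the same results, but in this model the predicated version performs twice as many branch-body evaluations. Real hardware behavior is more nuanced than this model, so the factor of two should be read only as an illustration of why large conditional bodies are costly under predication.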
About the Authors
I. Yu. Sesin
Russian Federation
Igor Yu. Sesin, Postgraduate Student, Department of the Tool and Applied Software, Institute of Information Technologies
78, Vernadskogo pr., Moscow, 119454 Russia
R. G. Bolbakov
Russian Federation
Roman G. Bolbakov, Cand. Sci. (Eng.), Associate Professor, Head of the Department of the Tool and Applied Software, Institute of Information Technologies
78, Vernadskogo pr., Moscow, 119454 Russia
Review
For citations:
Sesin I.Yu., Bolbakov R.G. Comparative analysis of software optimization methods in context of branch predication on GPUs. Russian Technological Journal. 2021;9(6):7-15. https://doi.org/10.32362/2500-316X-2021-9-6-7-15