
Russian Technological Journal


A tool for automatic parallelization of affine programs for systems with shared and distributed memory

https://doi.org/10.32362/2500-316X-2019-7-5-7-19

Abstract

Programming parallel architectures effectively has always been a challenge, and the diversity of modern architectures makes it harder still. The problem of automatically parallelizing program code dates back to the appearance of the first parallel computers built in Russia (for example, PS2000). Programming languages and technologies developed since then (T-System, MC#, Erlang, Go, OpenCL) simplify the programmer's work but do not make parallelization automatic. Effective programming tools for parallel computing systems are therefore still needed, and such tools should support the development of parallel programs for systems with both shared and distributed memory. This paper addresses the automatic parallelization of affine programs for such systems. It discusses methods for computing space-time mappings that optimize program locality. The methods are implemented in Haskell within a source-to-source translator that performs automatic parallelization. The performance of parallel versions of the lu, atax, and syr2k programs produced by the developed tool is compared with that of versions produced by the modern Pluto tool. The experiments were run on two x86_64 machines connected by an InfiniBand network, with OpenMP and MPI as the parallelization technologies. The performance of the resulting parallel programs indicates that the developed tool is practically applicable to the parallelization of affine programs.
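The idea of a space-time mapping mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from the tool's output: the recurrence, the function names `sequential` and `wavefront`, and the mapping (time t = i + j, processor p = i) are assumptions chosen for illustration, and the sketch shows only a legal reordering, not locality optimization. Each iteration (i, j) of an affine loop nest is assigned a time step and a processor by affine functions of the loop indices; iterations that share a time step do not depend on each other and could run in parallel.

```python
def sequential(n):
    """Reference order: the loop nest as written, i outer, j inner."""
    a = [[1] * n for _ in range(n)]
    for i in range(1, n):
        for j in range(1, n):
            # Each element depends on its upper and left neighbours.
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def wavefront(n):
    """Same iterations, reordered by the assumed mapping t = i + j, p = i."""
    a = [[1] * n for _ in range(n)]
    for t in range(2, 2 * n - 1):          # time steps must run in order
        lo = max(1, t - (n - 1))           # keeps j = t - p <= n - 1
        hi = min(n - 1, t - 1)             # keeps j = t - p >= 1
        for p in range(lo, hi + 1):        # iterations here are independent
            i, j = p, t - p
            # Both operands were written at time step t - 1.
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a
```

Both functions execute the same statement instances. The second one only changes the order, so comparing `sequential(n)` with `wavefront(n)` gives a quick check that the mapping respects the dependences.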

About the Authors

Sh. G. Magomedov
MIREA – Russian Technological University
Russian Federation

Cand. Sci. (Engineering), Associate Professor, Chair CS-4 “Automated Control Systems”, Institute of Integrated Security and Special Instrumentation

78, Vernadskogo pr., Moscow 119454



A. S. Lebedev
MIREA – Russian Technological University
Russian Federation

Lecturer, Chair CS-4 “Automated Control Systems”, Institute of Integrated Security and Special Instrumentation

78, Vernadskogo pr., Moscow 119454




Supplementary files

1. Table 2. The results of parallelization of programs
Type: Research instruments

For citations:


Magomedov Sh.G., Lebedev A.S. A tool for automatic parallelization of affine programs for systems with shared and distributed memory. Russian Technological Journal. 2019;7(5):7-19. (In Russ.) https://doi.org/10.32362/2500-316X-2019-7-5-7-19



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2782-3210 (Print)
ISSN 2500-316X (Online)