Conditional Jumps Optimization Taking into Account the Vector Capabilities of the Intel GPU Control Flow

Konstantin Igorevich Vladimirov; Yuly Valeryevich Tarasov

doi:10.25559/SITITO.18.202202.256-262

Konstantin Igorevich Vladimirov, Moscow Institute of Physics and Technology (National Research University), http://orcid.org/0000-0003-0925-1300
Yuly Valeryevich Tarasov, Moscow Institute of Physics and Technology (National Research University), http://orcid.org/0000-0003-0416-9703

Abstract

Control flow optimizations are especially important when optimizing programs for graphics accelerators and video cards. Scalar control flow optimizations are widely known and well supported in modern compilers, but vector control flow optimizations are also a pressing topic. On the one hand, vector control flow is natural in high-level languages such as ISPC and CM, where vector control constructs are part of the semantics of ordinary programs. On the other hand, modern graphics accelerators such as Intel Xe provide vector primitives, including primitives for vector control flow, and this hardware support can significantly improve program performance. The main problem is that a stable intermediate representation lacks vector control structures. This paper proposes a scalar intermediate representation of vector control structures based on explicit predicates, together with an algorithm that restores the vector control flow from this representation in a graphics optimizer.
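
The abstract does not spell out the proposed representation, so the following is only an illustrative sketch of the general idea of a vector conditional driven by an explicit per-lane predicate. It is written in Python for readability and is not the authors' algorithm; the function name `simd_if` and its parameters are hypothetical. Each side of the conditional is computed for the whole vector and merged under the predicate, and a side is skipped entirely when no lane needs it, which is roughly the behavior that hardware vector control flow makes cheap.

```python
from typing import Callable, List


def simd_if(pred: List[bool],
            then_fn: Callable[[float], float],
            else_fn: Callable[[float], float],
            values: List[float]) -> List[float]:
    """Emulate a vector `if` using an explicit per-lane predicate.

    Hypothetical helper for illustration only; not the paper's IR.
    """
    result = list(values)

    # "Then" side: skipped when no lane is active. When it runs, it is
    # computed for every lane, and only active lanes keep the new value.
    if any(pred):
        then_vals = [then_fn(v) for v in values]
        result = [t if p else r for p, t, r in zip(pred, then_vals, result)]

    # "Else" side: same scheme under the inverted predicate.
    not_pred = [not p for p in pred]
    if any(not_pred):
        else_vals = [else_fn(v) for v in values]
        result = [e if p else r for p, e, r in zip(not_pred, else_vals, result)]

    return result


values = [1.0, -2.0, 3.0, -4.0]
pred = [v > 0 for v in values]          # explicit predicate, one bit per lane
out = simd_if(pred, lambda v: v * 2, lambda v: -v, values)
print(out)
```

In this sketch the predicate is ordinary data, so the conditional can be carried through a scalar-style representation; recognizing such predicate patterns and mapping them back to vector branch instructions is the kind of restoration step the abstract refers to.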

Author Biographies

Konstantin Igorevich Vladimirov, Moscow Institute of Physics and Technology (National Research University)

Senior Lecturer of the Chair of Microprocessor Technologies in Intelligent Systems, Department of Radio Engineering and Cybernetics

Yuly Valeryevich Tarasov, Moscow Institute of Physics and Technology (National Research University)

Master's degree student of the Chair of Microprocessor Technologies in Intelligent Systems, Department of Radio Engineering and Cybernetics
