Structures Partitioning Optimization for Vector Optimizer in Intel Graphics Compiler

Abstract

Computations on graphics cards and specialized accelerators are widely used to solve many important practical problems. Developers working with OpenCL, SYCL, CM, and ISPC rely heavily on the quality of optimizations in graphics compilers. The compiler for Intel GPUs has two parts: a scalar part that works in the SIMT model, and a vector part that targets SIMD languages. The vector part contributes the most in critical workloads such as training neural networks, solving systems of equations, and rendering images. Until recently, however, the Intel graphics compiler could not properly decompose structures into vector registers, which caused performance problems in programs written in ISPC, such as Embree and OSPRay. To solve this problem, we propose a structure partitioning algorithm for the vector optimizer of the Intel graphics compiler. We give a detailed description of the algorithm and performance measurements that show a performance increase of up to 80% on some tasks.

Author Biographies

Konstantin Igorevich Vladimirov, Moscow Institute of Physics and Technology (National Research University)

Senior Lecturer of the Chair of Microprocessor Technologies in Intelligent Systems, Department of Radio Engineering and Cybernetics

Ilya Vitalyevich Andreev, Moscow Institute of Physics and Technology (National Research University)

Master's degree student of the Chair of Microprocessor Technologies in Intelligent Systems, Department of Radio Engineering and Cybernetics

Published
2022-07-20
How to Cite
VLADIMIROV, Konstantin Igorevich; ANDREEV, Ilya Vitalyevich. Structures Partitioning Optimization for Vector Optimizer in Intel Graphics Compiler. Modern Information Technologies and IT-Education, [S.l.], v. 18, n. 2, p. 249-255, july 2022. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/853>. Date accessed: 29 aug. 2025. doi: https://doi.org/10.25559/SITITO.18.202202.249-255.
Section
Parallel and distributed programming, grid technologies, programming on GPUs