Vectorization of Program Code Containing Low Probability Regions in Computational Geometry Problems
Abstract
Improving application performance is an important practical task in supercomputer calculations. Along with parallelization of calculations across cluster nodes (for example, using MPI) and multithreaded programming (for example, using OpenMP), program code vectorization is used, which provides parallelism at the level of individual instructions. The AVX-512 vector instruction set of the Intel microprocessor architecture has a number of unique features that can be used to vectorize program code from a very wide range of applications. Special mask registers make it possible to vectorize code containing conditional statements effectively, and profile information about the probability of executing operations in the source code enables code transformations that make automatic vectorization even more efficient. The paper considers the transformation that splits a flat loop by a condition when that condition is known to hold with high probability. Practical problems in which this situation arises are considered. For these problems, the conditions under which splitting loops achieves efficient vectorization are identified. The described transformations were performed, and the effectiveness of the resulting code was verified by runs on microprocessors of the Intel Xeon Cascade Lake and Intel Xeon Phi Knights Landing families. The results of the runs are presented.
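As a rough illustration of what splitting a loop by a high-probability condition can look like, the following C sketch processes the data in blocks and selects a branch-free version of the loop body whenever the likely condition holds for the whole block. It is not taken from the paper: the function names, the block width `VEC_WIDTH`, the condition `x[i] >= 0`, and the arithmetic in both branches are assumptions chosen only to keep the example small and portable.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed block width: 16 single-precision lanes fit in one 512-bit register. */
#define VEC_WIDTH 16

/* Hypothetical low-probability computation; a stand-in for rarely executed code. */
static float rare_path(float v)
{
    return v * v;
}

/* Original loop: every iteration tests the condition. */
void transform_reference(const float *x, float *y, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        if (x[i] >= 0.0f) {
            y[i] = 0.5f * x[i];      /* high-probability branch */
        } else {
            y[i] = rare_path(x[i]);  /* low-probability branch */
        }
    }
}

/* Split loop: one cheap check per block chooses between a branch-free
   version and the original body. */
void transform_split(const float *x, float *y, size_t n)
{
    assert(x != NULL && y != NULL);
    size_t i = 0;
    for (; i + VEC_WIDTH <= n; i += VEC_WIDTH) {
        /* Reduce the condition over the block. */
        int all_likely = 1;
        for (size_t k = 0; k < VEC_WIDTH; k++) {
            all_likely &= (x[i + k] >= 0.0f);
        }
        if (all_likely) {
            /* Likely case: no conditional code is left in the body,
               so a compiler can vectorize it without masking. */
            for (size_t k = 0; k < VEC_WIDTH; k++) {
                y[i + k] = 0.5f * x[i + k];
            }
        } else {
            /* Unlikely case: run the original body on this block. */
            transform_reference(x + i, y + i, VEC_WIDTH);
        }
    }
    /* Remainder elements that do not fill a whole block. */
    transform_reference(x + i, y + i, n - i);
}
```

Both functions produce the same output for any input; the split version only pays for the rare branch in blocks that actually contain a rare element. Whether this is faster than a single masked loop depends on how often the condition fails and on the cost of the rare branch, which is the kind of trade-off profile information can inform.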
This work is licensed under a Creative Commons Attribution 4.0 International License.