Vectorization of Loops with Conditional Operations by Combining Vector Masks

  • Alexey Anatolyevich Rybakov National Research Centre "Kurchatov Institute"; Scientific Research Institute for System Analysis of the Russian Academy of Sciences http://orcid.org/0000-0002-9755-8830

Abstract

The article is devoted to the problem of increasing the efficiency of vectorization for calculations on real numbers. During calculations, several similar scalar operations can be combined into a single vector instruction, which significantly increases the speed of program execution. This optimization is crucial for computational tasks of supercomputer modeling. The main target of vectorization is a loop with independent iterations. If the body of such a loop is relatively simple, vectorization problems, as a rule, do not arise. When complex control flow, nested loops, and function calls appear in the loop body, the optimizing compiler often fails to vectorize it. However, the AVX-512 vector instruction set supports selective processing of vector data elements, which makes it possible to vectorize loops whose body has an almost arbitrary structure. This article discusses an approach to vectorizing a loop that contains conditions. The approach is based on merging program execution paths under the appropriate predicates. A vectorized predicate is a mask for processing vector elements; such masks are used in AVX-512 vector instructions. When vectorizing loops whose body contains complex control flow, the main problem is the low density of the masks in the vectorized code, which leads to a decrease in performance. The article discusses methods to increase the density of vector masks and the efficiency of vector code execution. The developed methods have been tested on the program code of a gas-dynamic solver. Data on vectorization efficiency were obtained in vector instruction emulation mode and on a real machine (Intel Xeon Phi Knights Landing microprocessor). After the vector code optimizations were applied, vectorization efficiency reached up to 0.75 in emulation mode and up to 0.47 on a real machine.
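To make the notion of mask density concrete, here is a minimal pure-Python emulation of predicated (if-converted) execution. It is a sketch under stated assumptions, not the author's implementation: it assumes 16 lanes per vector (a 512-bit register holding 32-bit floats) and a hypothetical two-branch loop. The names `lane_mask` and `run_masked` and the density definition (active lanes divided by all lanes of the executed masked operations) are illustrative assumptions and are not necessarily the efficiency metric used in the article. Each branch runs as a whole-vector operation under its own bit mask, in the spirit of AVX-512 `__mmask16` predicates. As a result, with two complementary branches, the density of full vectors cannot exceed 0.5. The optional `skip_empty` flag shows one generic way to raise density: a branch whose mask is all zeros is skipped. It is not presented as the mask-combining method of the article.

```python
VLEN = 16  # assumption: 512-bit vector / 32-bit float = 16 lanes


def lane_mask(pred_values):
    """Pack an iterable of booleans into an integer bit mask (bit l = lane l)."""
    m = 0
    for lane, p in enumerate(pred_values):
        if p:
            m |= 1 << lane
    return m


def run_masked(a, skip_empty=False):
    """Emulate predicated execution of the hypothetical scalar loop
         for i in range(n):
             if a[i] > 0: b[i] = 2 * a[i]
             else:        b[i] = -a[i]
    Both paths are executed as vector ops under complementary masks.
    Returns (b, density), density = active lanes / (executed ops * VLEN)."""
    n = len(a)
    b = [0.0] * n
    ops = 0      # number of masked vector operations actually executed
    active = 0   # number of lanes that were enabled in those operations
    for i in range(0, n, VLEN):
        chunk = a[i:i + VLEN]
        tail = (1 << len(chunk)) - 1           # lanes present in this (possibly short) vector
        m_then = lane_mask(x > 0 for x in chunk)  # predicate of the 'then' path
        m_else = tail & ~m_then                # complementary predicate of the 'else' path
        for mask, op in ((m_then, lambda x: 2 * x), (m_else, lambda x: -x)):
            if skip_empty and mask == 0:
                continue                       # all-zero mask: skip the vector op entirely
            ops += 1
            active += bin(mask).count("1")
            for lane in range(len(chunk)):
                if mask >> lane & 1:           # only enabled lanes are written
                    b[i + lane] = op(chunk[lane])
    density = active / (ops * VLEN) if ops else 0.0
    return b, density
```

For example, `run_masked([1.0] * 20)` gives a density of 20/64 = 0.3125. The `else` path still executes twice with an empty mask, and the last vector has only 4 live lanes. With `skip_empty=True`, the same input reaches 20/32 = 0.625.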

Author Biography

Alexey Anatolyevich Rybakov, National Research Centre "Kurchatov Institute"; Scientific Research Institute for System Analysis of the Russian Academy of Sciences

Head of the Department of Supercomputer Technologies and Systems, Division of Supercomputer Systems and Parallel Calculations;
Lead Researcher at the Joint Supercomputer Center of the Russian Academy of Sciences - Branch of SRISA, Cand. Sci. (Phys.-Math.)

Published
2024-10-15
How to Cite
RYBAKOV, Alexey Anatolyevich. Vectorization of Loops with Conditional Operations by Combining Vector Masks. Modern Information Technologies and IT-Education, [S.l.], v. 20, n. 3, p. 563-572, oct. 2024. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/1090>. Date accessed: 12 sep. 2025. doi: https://doi.org/10.25559/SITITO.020.202403.563-572.
Section
Parallel and distributed programming, grid technologies, programming on GPUs