Using Dynamic-Length Vector Operations to Efficiently Emulate Fixed-Length Vector Operations

Abstract

Using vector registers can significantly improve processor performance, especially in data-parallel workloads. To keep code portable across architectures, developers often rely on third-party libraries that abstract architecture-specific vector operations behind high-level programming constructs. However, building transparent wrappers over the underlying data types requires a deep understanding of both the library's design and the architectural differences between vector extensions. This matters most when porting to new platforms such as RISC-V, whose vector programming model may differ significantly from that of other architectures.
This article reviews existing libraries for writing generic vector algorithms and proposes solutions for adding support for the RISC-V vector extension to the EVE library, which is under active development. Although EVE already supports scalable vectorization through SVE, integration with RISC-V raises a number of additional challenges. The main one is making efficient use of features unique to RISC-V, such as the grouping of multiple vector registers. In addition, existing library code must be adapted to the characteristics of the new hardware, including a limited set of vector operations in which some operations are not available at the architectural level.

Author Biographies

Konstantin Igorevich Vladimirov, Moscow Institute of Physics and Technology (National Research University)

Senior Lecturer of the Chair of Microprocessor Technologies in Intelligent Systems, Department of Radio Engineering and Cybernetics

Ivan Andreevich Tetyushkin, Moscow Institute of Physics and Technology (National Research University)

Postgraduate Student of the Chair of Microprocessor Technologies in Intelligent Systems, Department of Radio Engineering and Cybernetics

Published
2024-10-15
How to Cite
VLADIMIROV, Konstantin Igorevich; TETYUSHKIN, Ivan Andreevich. Using Dynamic-Length Vector Operations to Efficiently Emulate Fixed-Length Vector Operations. Modern Information Technologies and IT-Education, [S.l.], v. 20, n. 3, p. 678-686, oct. 2024. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/1155>. Date accessed: 07 feb. 2026. doi: https://doi.org/10.25559/SITITO.020.202403.678-686.
Section
Research and development in the field of new IT and their applications