Статический и динамический подходы к преобразованию косвенных переходов

Viacheslav Victorovich Chernonog; Ilia Leonidovich Diachkov; Andrey Dmitrievich Dobrov

Viacheslav Victorovich Chernonog ООО "Техкомпания Хуавэй" http://orcid.org/0009-0007-8407-1082
Ilia Leonidovich Diachkov ООО "Коулмэн Сервисиз" http://orcid.org/0009-0002-9740-7861
Andrey Dmitrievich Dobrov ООО "Техкомпания Хуавэй" http://orcid.org/0009-0007-5074-7873

Аннотация

С ростом популярности модульной парадигмы программирования количество косвенных переходов в продуцируемом коде значительно возросло. Аппаратные способы уменьшить задержки, связанные с такими переходами, требуют сложной логики непосредственно на кристалле, которую зачастую невыгодно реализовывать из-за дополнительного энергопотребления процессора.
Существующие в компиляторе GCC подходы преобразования косвенных переходов позволяют существенно увеличить производительность программ, однако без сбора статистики и перекомпиляции кода, все еще остается большое количество непреобразованных переходов, которые приводят к уменьшению производительности программ.
В этой статьи предлагается два метода повышения производительности, связанных с оптимизацией косвенных переходов. Статический метод позволяет расширить существующие оптимизационные возможности компилятора. Динамический метод является новым подходом подстановки целевых адресов переходов во время выполнения программы.
Наше исследование показывает, что эти подходы способны увеличить производительность отдельных участков программ, содержащих косвенные переходы, до 2.3 раз. Применение статического метода позволило улучшить производительность отдельных тестов из пакета CPUBench на 15%, при увеличении затраченного на компиляцию времени не более чем на 1%.

Сведения об авторах

Viacheslav Victorovich Chernonog, ООО "Техкомпания Хуавэй"

сотрудник

Ilia Leonidovich Diachkov, ООО "Коулмэн Сервисиз"

сотрудник

Andrey Dmitrievich Dobrov, ООО "Техкомпания Хуавэй"

сотрудник, кандидат технических наук

Литература

1. Shalf J. The future of computing beyond Moore s Law. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2020;378(2166):20190061. https://doi.org/10.1098/rsta.2019.0061
2. Lee V.W., Kim C., Chhugani J., Deisher M., Kim D., Nguyen A.D., Satish N., Smelyanskiy M., Chennupaty S., Hammarlund P., Singhal R., Dubey P. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. In: Proceedings of the 37th annual international symposium on Computer architecture (ISCA '10). New York, NY, USA: Association for Computing Machinery; 2010. p. 451-460. https://doi.org/10.1145/1815961.1816021
3. Kandemir M., Vijaykrishnan N., Irwin M.J., Ye W. Influence of compiler optimizations on system power. In Proceedings of the 37th Annual Design Automation Conference (DAC '00). New York, NY, USA: Association for Computing Machinery; 2000. p. 304-307. https://doi.org/10.1145/337292.337425
4. Devkota S., Aschwanden P., Kunen A., Legendre M., Isaacs K.E. CcNav: Understanding Compiler Optimizations in Binary Code. IEEE Transactions on Visualization and Computer Graphics. 2021;27(2):667-677. https://doi.org/10.1109/TVCG.2020.3030357
5. Calder B., Grunwald D., Zorn B. Quantifying Behavioral Differences Between C and C++ Programs. Journal of Programming languages. 1994;2(4):313-351. Available at: https://cseweb.ucsd.edu/~calder/abstracts/C++Study.html (accessed 16.03.2023).
6. Suganuma T., et al. Overview of the IBM Java Just-in-Time Compiler. IBM Systems Journal. 2000;39(1):175-193. https://doi.org/10.1147/sj.391.0175
7. Bauer M., Rossow C. NoVT: Eliminating C++ Virtual Calls to Mitigate Vtable Hijacking. In: 2021 IEEE European Symposium on Security and Privacy (EuroS&P). Vienna, Austria: IEEE Computer Society; 2021. p. 650-666. https://doi.org/10.1109/EuroSP51992.2021.00049
8. Shah A., Ryder B.G. Function Pointers in C An Empirical Study. Rutgers University; 1995. https://doi.org/10.7282/T3X92FRK
9. Mittal S. A survey of techniques for dynamic branch prediction. Concurrency and Computation: Practice and Experience. 2019;31(1):e4666. https://doi.org/10.1002/cpe.4666
10. Driesen K., Hölzle U. Accurate indirect branch prediction. ACM SIGARCH Computer Architecture News. 1998;26(3):167-178. https://doi.org/10.1145/279361.279380
11. Kim H., Joao J.A., Mutlu O., Lee C.J., Patt Y.N., Cohn R. VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization. ACM SIGARCH Computer Architecture News. 2007;35(2):424-435. https://doi.org/10.1145/1273440.1250715
12. Namolaru M. Devirtualization in GCC. In: Proceedings of the GCC Developers Summit. Ottawa, Ontario Canada; 2006. p. 125-133. Available at: https://gcc.gnu.org/pub/gcc/summit/2006-GCC-Summit-Proceedings.pdf (accessed 16.03.2023).
13. Padlewski P. Devirtualization in LLVM. In: Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity (SPLASH Companion 2017). New York, NY, USA: Association for Computing Machinery; 2017. p. 42-44. https://doi.org/10.1145/3135932.3135947
14. Li D.X., Ashok R., Hundt R. Lightweight feedback-directed cross-module optimization. In: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization (CGO '10). New York, NY, USA: Association for Computing Machinery; 2010. p. 53-61. https://doi.org/10.1145/1772954.1772964
15. Pande H.D., Ryder B.G. Data-flow-based virtual function resolution. In: Cousot R., Schmidt D.A. (eds.) Static Analysis. SAS 1996. Lecture Notes in Computer Science. Vol. 1145. Berlin, Heidelberg: Springer; 1996. p. 238-254. https://doi.org/10.1007/3-540-61739-6_45
16. Garza E., Mirbagher-Ajorpaz S., Khan T.A., Jiménez D.A. Bit-level perceptron prediction for indirect branches. In: Proceedings of the 46th International Symposium on Computer Architecture (ISCA '19). New York, NY, USA: Association for Computing Machinery; 2019. p. 27-38. https://doi.org/10.1145/3307650.3322217
17. Hamerly G., Perelman E., Lau J., Calder B. SimPoint 3.0: Faster and More Flexible Program Phase Analysis. Journal of Instruction-Level Parallelism. 2005;7(4):1-28. Available at: https://jilp.org/vol7/v7paper14.pdf (accessed 16.03.2023).
18. Perelman E., Hamerly G., Van Biesbrouck M., Sherwood T., Calder B. Using SimPoint for accurate and efficient simulation. ACM SIGMETRICS Performance Evaluation Review. 2003;31(1):318-319. https://doi.org/10.1145/885651.781076
19. Bucek J., Lange K.D., v. Kistowski J. SPEC CPU2017: Next-generation compute benchmark. In: Companion of the 2018 ACM/SPEC International Conference on Performance Engineering (ICPE '18). New York, NY, USA: Association for Computing Machinery; 2018. p. 41-42. https://doi.org/10.1145/3185768.3185771
20. Shi H.-k., Wang Ze-s., Zhang Shi-z., Gao X., Zhao Y-j. Performance Evaluation Benchmark of General-Purpose CPUA Survey. ACTA ELECTONICA SINICA. 2023;51(1):246-256. (In Chi., abstract in Eng.) https://doi.org/10.12263/DZXB.20220169
21. Stephenson M., Rangan R., Keckler S.W. Cooperative Profile Guided Optimizations. Computer Graphics Forum. 2021;40(8):71-83. https://doi.org/10.1111/cgf.14382
22. Stephens N. Armv8-a next-generation vector architecture for HPC. In: 2016 IEEE Hot Chips 28 Symposium (HCS). Cupertino, CA, USA: IEEE Computer Society; 2016. p. 1-31. https://doi.org/10.1109/HOTCHIPS.2016.7936203
23. Fenton T., Kennedy P. The Future of ESXi on Arm. In: Running ESXi on a Raspberry Pi. Berkeley, CA: Apress; 2021. p. 355-368. https://doi.org/10.1007/978-1-4842-7465-1_14
24. Bacon D.F., Sweeney P.F. Fast static analysis of C++ virtual function calls. In: OOPSLA '96: Proceedings of the 11th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (OOPSLA '96). New York, NY, USA: Association for Computing Machinery; 1996. p. 324-341. https://doi.org/10.1145/236337.236371
25. Lu K., Hu H. Where Does It Go? Refining Indirect-Call Targets with Multi-Layer Type Analysis. In:Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS '19). New York, NY, USA: Association for Computing Machinery; 2019. p. 1867-1881. https://doi.org/10.1145/3319535.3354244
26. Berlin D. Structure aliasing in GCC. In: Proceedings of the GCC Developers Summit. Ottawa, Ontario Canada; 2005. p. 25-35. Available at: https://gcc.gnu.org/wiki/HomePage?action=AttachFile&do=get&target=2005-GCC-Summit-Proceedings.pdf (accessed 16.03.2023).
27. Pearce D.J., Kelly P.H.J., Hankin C. Efficient field-sensitive pointer analysis of C. ACM Transactions on Programming Languages and Systems. 2007;30(1):4-es. https://doi.org/10.1145/1290520.1290524
28. Glek T., Hubicka J. Optimizing real world applications with GCC Link Time Optimization. In: Proceedings of the 2010 GCC Developers' Summit. Ottawa, Ontario, Canada; 2010. p. 25-45. Available at: https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=2010-GCC-Summit-Proceedings.pdf (accessed 16.03.2023).
29. Ishizaki K., Kawahito M., Yasue T., Komatsu H., Nakatani T. A study of devirtualization techniques for a Java Just-In-Time compiler. ACM SIGPLAN NOTICES. 2000;35(10):294-310. https://doi.org/10.1145/354222.353191
30. Cravford K.M., Blackstone Jr J.H., Cox J.F. A study of JIT implementation and operating problems. The International Journal Of Production Research. 1988;26(9):1561-1568. https://doi.org/10.1080/00207548808947966