Application of Reinforcement Learning and Parallel Programming Technologies for Program Code Generation and Validation

Abstract

Neural networks can now create what was until recently considered out of reach: photorealistic human faces; full-fledged paintings from rough sketches; arbitrary images from a short text description; poems and prose from opening lines or a given topic. All of this has been made possible by rapid advances in natural language processing and computer vision. Neural networks generate content based on patterns absorbed during an extensive training process. Problems in logic, mathematics, and logical reasoning, by contrast, are an example of flexible intelligence and require fundamentally different approaches to learning. The study presented in this article proposes a methodology for designing and training neural networks aimed at generating working program code. The study rests on the possibility of using artificial intelligence, in particular neural networks, to generate code automatically, that is, on AI4Code tasks. It examines the arguments in favor of reinforcement learning as compared with language models, as well as the architecture of the environment required for such training. Particular attention is paid to the use of Nvidia graphics accelerators and of central processing units of various architectures. The article discusses the specifics of building a training environment and the advantages and disadvantages of the CUDA platform, and analyzes the potential effectiveness of each approach.
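To make the reinforcement-learning formulation concrete, the sketch below illustrates one common way such a training environment can be organized: the agent emits a candidate program, the environment runs it against unit tests, and the pass rate serves as the reward. This is a minimal illustration of the general technique, not the article's implementation; the class CodeGenEnv and its interface are hypothetical.

import subprocess
import tempfile

class CodeGenEnv:
    """Toy environment: reward is the fraction of test cases the candidate passes."""

    def __init__(self, test_cases):
        # test_cases: list of (stdin, expected_stdout) pairs
        self.test_cases = test_cases

    def step(self, candidate_code: str) -> float:
        # Write the candidate program to a temporary file, then execute it
        # once per test case and compare its output with the expected one.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(candidate_code)
            path = f.name
        passed = 0
        for stdin, expected in self.test_cases:
            try:
                result = subprocess.run(
                    ["python", path], input=stdin, capture_output=True,
                    text=True, timeout=2,  # time limit stands in for a sandbox
                )
                if result.stdout.strip() == expected.strip():
                    passed += 1
            except subprocess.TimeoutExpired:
                pass  # non-terminating candidates earn nothing for this test
        return passed / len(self.test_cases)

# Usage: the task is to read an integer and print its double.
env = CodeGenEnv([("2", "4"), ("10", "20")])
print(env.step("print(int(input()) * 2)"))  # 1.0 -- both tests pass

As for the hardware comparison, the trade-off between Nvidia accelerators and CPUs is usually grounded in simple throughput measurements. The following sketch, assuming PyTorch as the CUDA front end (the article discusses the CUDA platform itself, not this particular library), times a dense matrix multiplication on both device types:

import time
import torch

def bench(device: str, n: int = 2048, iters: int = 10) -> float:
    # Average seconds per n-by-n matrix multiplication on the given device.
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # CUDA kernels launch asynchronously
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print("cpu :", bench("cpu"))
if torch.cuda.is_available():
    print("cuda:", bench("cuda"))

On recent Nvidia hardware such a measurement typically shows an order-of-magnitude or larger advantage for the GPU, which is the kind of evidence the article weighs when assessing each approach.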


Author Biographies

Vadim Evgenyevich Marchenko, Financial University under the Government of the Russian Federation

Student of the Department of Data Analysis and Machine Learning

Petr Vladimirovich Nikitin, Financial University under the Government of the Russian Federation

Associate Professor of the Department of Data Analysis and Machine Learning, Cand. Sci. (Ped.), Associate Professor

Rimma Ivanovna Gorokhova, Financial University under the Government of the Russian Federation

Associate Professor of the Department of Data Analysis and Machine Learning, Cand. Sci. (Ped.), Associate Professor

References

1. Russakovsky O., Deng J., Su H. et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision. 2015;115:211-252. https://doi.org/10.1007/s11263-015-0816-y
2. Puri R. et al. CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks. In: Vanschoren J., Yeung S. (eds.) Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks. Vol. 1. Curran; 2021. p. 1-13. Available at: https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/a5bfc9e07964f8dddeb95fc584cd965d-Paper-round2.pdf (accessed 11.02.2023).
3. Vaswani A., Shazeer N., Parmar N. et al. Attention Is All You Need. In: Guyon I. et al. (eds.) Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Red Hook, NY, USA: Curran Associates Inc.; 2017. Vol. 30. p. 6000-6010. Available at: https://dl.acm.org/doi/pdf/10.5555/3295222.3295349 (accessed 11.02.2023).
4. Brown T.B., Mann B., Ryder N. et al. Language Models are Few-Shot Learners. In: Larochelle H., Ranzato M., Hadsell R., Balcan M.F., Lin H. (eds.) Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc.; 2020. Article no. 159. p. 1877-1901. Available at: https://papers.nips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf (accessed 11.02.2023).
5. Radford A., Wu J., Child R. et al. Language Models are Unsupervised Multitask Learners. OpenAI blog. 2019;1(8):9. Available at: https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf (accessed 11.02.2023).
6. Schick T., Schütze H. Few-Shot Text Generation with Natural Language Instructions. In: Moens M.-F., Huang X., Specia L., Wen-tau Yih S. (eds.) Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Punta Cana, Dominican Republic: Association for Computational Linguistics; 2021. p. 390-402. https://doi.org/10.18653/v1/2021.emnlp-main.32
7. Winata G.I., Madotto A., Lin Z., Liu R., Yosinski J., Fung P. Language Models are Few-shot Multilingual Learners. In: Proceedings of the 1st Workshop on Multilingual Representation Learning. Punta Cana, Dominican Republic: Association for Computational Linguistics; 2021. p. 1-15. https://doi.org/10.18653/v1/2021.mrl-1.1
8. Chen M. et al. Evaluating Large Language Models Trained on Code. arXiv:2107.03374. 2021. Available at: https://arxiv.org/abs/2107.03374 (accessed 11.02.2023).
9. Rubio C., Mella F., Martínez C., Segura A., Vidal C. Exploring Copilot Github to Automatically Solve Programming Problems in Computer Science Courses. In: 2023 42nd IEEE International Conference of the Chilean Computer Science Society (SCCC). Concepcion, Chile: IEEE Computer Society; 2023. p. 1-8. https://doi.org/10.1109/SCCC59417.2023.10315758
10. Drori I. et al. A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level. PNAS. 2022;119(32):e2123433119. https://doi.org/10.1073/pnas.2123433119
11. Li Y. et al. Competition-level code generation with AlphaCode. Science. 2022;378(6624):1092-1097. https://doi.org/10.1126/science.abq1158
12. Baker B. et al. Emergent Tool Use From Multi-Agent Autocurricula. In: International Conference on Learning Representations. Addis Ababa, Ethiopia; 2020. p. 1-28. Available at: https://openreview.net/forum?id=SkxpxJBKwS (accessed 11.02.2023).
13. Fan L. et al. MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge. In: Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022). Track on Datasets and Benchmarks. New Orleans; 2022. p. 1-20. Available at: https://nips.cc/virtual/2022/poster/55737 (accessed 11.02.2023).
14. Fawzi A., Balog M., Huang A. et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature. 2022;610:47-53. https://doi.org/10.1038/s41586-022-05172-4
15. Silver D., Schrittwieser J., Simonyan K. et al. Mastering the game of Go without human knowledge. Nature. 2017;550:354-359. https://doi.org/10.1038/nature24270
16. Silver D., Huang A., Maddison C. et al. Mastering the game of Go with deep neural networks and tree search. Nature. 2016;529:484-489. https://doi.org/10.1038/nature16961
17. Bhagirath, Mittal N., Kumar S. Machine Learning Computation on Multiple GPU's using CUDA and Message Passing Interface. In: 2019 2nd International Conference on Power Energy, Environment and Intelligent Control (PEEIC). Greater Noida, India: IEEE Computer Society; 2019. p. 18-22. https://doi.org/10.1109/PEEIC47157.2019.8976714
18. Diehl P., Seshadri M., Heller T., Kaiser H. Integration of CUDA Processing within the C++ Library for Parallelism and Concurrency (HPX). In: 2018 IEEE/ACM 4th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2). Dallas, TX, USA: IEEE Computer Society; 2018. p. 19-28. https://doi.org/10.1109/ESPM2.2018.00006
19. Kerr A., Diamos G., Yalamanchili S. Modeling GPU-CPU workloads and systems. In: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3). New York, NY, USA: Association for Computing Machinery; 2010. p. 31-42. https://doi.org/10.1145/1735688.1735696
20. Lustig D., Sahasrabuddhe S., Giroux O. A Formal Analysis of the NVIDIA PTX Memory Consistency Model. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '19). New York, NY, USA: Association for Computing Machinery; 2019. p. 257-270. https://doi.org/10.1145/3297858.3304043
21. Abdelkhalik H., Arafa Y., Santhi N., Badawy A.-H.A. Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis. In: 2022 IEEE High Performance Extreme Computing Conference (HPEC). Waltham, MA, USA: IEEE Computer Society; 2022. p. 1-8. https://doi.org/10.1109/HPEC55821.2022.9926299
22. van Stigt R., Swatman S.N., Varbanescu A.-L. Isolating GPU Architectural Features Using Parallelism-Aware Microbenchmarks. In: Proceedings of the 2022 ACM/SPEC International Conference on Performance Engineering (ICPE '22). New York, NY, USA: Association for Computing Machinery; 2022. p. 77-88. https://doi.org/10.1145/3489525.3511673
23. Sun W., Li A., Geng T., Stuijk S., Corporaal H. Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors. IEEE Transactions on Parallel and Distributed Systems. 2023;34(1):246-261. https://doi.org/10.1109/TPDS.2022.3217824
24. Christiano P. et al. Deep reinforcement learning from human preferences. In: Guyon I. et al. (eds.) Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Red Hook, NY, USA: Curran Associates Inc.; 2017. Vol. 30. p. 1-9. Available at: https://proceedings.neurips.cc/paper_files/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html (accessed 11.02.2023).
25. Schieffer G., Peng I. Accelerating Drug Discovery in AutoDock-GPU with Tensor Cores. In: Cano J., Dikaiakos M.D., Papadopoulos G.A., Pericàs M., Sakellariou R. (eds.) Euro-Par 2023: Parallel Processing. Euro-Par 2023. Lecture Notes in Computer Science. Vol. 14100. Springer, Cham; 2023. p. 608-622. https://doi.org/10.1007/978-3-031-39698-4_41
Published
2023-03-30
How to Cite
MARCHENKO, Vadim Evgenyevich; NIKITIN, Petr Vladimirovich; GOROKHOVA, Rimma Ivanovna. Application of Reinforcement Learning and Parallel Programming Technologies for Program Code Generation and Validation. Modern Information Technologies and IT-Education, [S.l.], v. 19, n. 1, p. 163-171, Mar. 2023. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/950>. Date accessed: 19 Aug. 2025. doi: https://doi.org/10.25559/SITITO.019.202301.163-171.
Section
Research and development in the field of new IT and their applications
