The Impact of Batch Size on the Quality of Training of Neural Networks
Abstract
Neural networks are trained using gradient descent, an optimization technique in which the error estimate used to update the weights of the model is calculated on a subset of the training dataset. The number of training examples used to estimate the error gradient is called the batch size, and it is an important hyperparameter that affects the dynamics of the learning algorithm. The article analyzes the impact of batch size on prediction accuracy for several types of neural networks: deep learning neural networks, convolutional networks, recurrent networks and large language models. The sources repeatedly state that batch size affects the speed of learning; in this study, however, that statement was not confirmed by the experimental values. To check it, an experiment was conducted to measure the impact of batch size not only on recognition accuracy and loss (the difference between the predicted value and the real one), but also on the time spent on training. The results show that batch size has a decisive influence on the image recognition accuracy of convolutional neural networks, recurrent neural networks, deep learning neural networks and large language models: the larger the batch size, the higher the prediction accuracy. On the other hand, a large batch size increases the requirements for computing resources.
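The role of batch size in the weight update described above can be illustrated with a minimal sketch. It is not the experiment reported in the article: a one-variable linear model stands in for a neural network, the data are synthetic and noise-free, and the names `minibatch_sgd` and `updates_per_epoch` as well as the learning rate and epoch count are assumptions made only for illustration.

```python
import math
import random


def updates_per_epoch(n_examples, batch_size):
    """Number of weight updates in one pass over the data (last batch may be smaller)."""
    return math.ceil(n_examples / batch_size)


def minibatch_sgd(xs, ys, batch_size, lr=0.1, epochs=500, seed=0):
    """Fit y ~ w*x + b by mini-batch gradient descent on mean squared error.

    Illustrative stand-in for neural network training; returns (w, b, n_updates).
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    n_updates = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random batch composition every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # The gradient is estimated from the examples in this batch only;
            # batch_size controls how many examples enter each estimate.
            errors = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = sum(2 * e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
            n_updates += 1
    return w, b, n_updates


# Synthetic data (assumed for the example): y = 3x + 1 on 20 points.
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]

for batch_size in (4, 8, 20):
    w, b, n_updates = minibatch_sgd(xs, ys, batch_size)
    print(batch_size, updates_per_epoch(len(xs), batch_size), n_updates, round(w, 3), round(b, 3))
```

In this sketch a larger batch size means fewer, less noisy updates per epoch, while each update has to process more examples at once, which is where the higher demand for computing resources comes from. Whether a larger batch also improves accuracy or changes training time for real networks is an empirical question that this toy model does not answer.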

This work is licensed under a Creative Commons Attribution 4.0 International License.