The Impact of Batch Size on the Quality of Training of Neural Networks

Abstract

Neural networks are trained using stochastic gradient descent, an optimization technique in which the error gradient used to update the weights of the model is estimated from a subset of the training dataset. The number of training examples used to estimate the error gradient is called the batch size. It is an important hyperparameter that affects the dynamics of the learning algorithm. The article analyzes how batch size affects the prediction accuracy of several types of neural networks: deep neural networks, convolutional and recurrent neural networks, and large language models. Many sources state that batch size affects training speed. However, this claim was not confirmed by the experimental values obtained in the study. To check it, an experiment was conducted to measure the effect of batch size not only on recognition accuracy and loss (the difference between the predicted and actual values) but also on training time. The results show that batch size has a decisive influence on the accuracy of convolutional neural networks, recurrent neural networks, deep neural networks and large language models: the larger the batch size, the higher the prediction accuracy. On the other hand, a larger batch size increases the demand for computing resources.
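A minimal sketch of the mechanism described in the abstract, assuming a one-variable linear model y ≈ w·x + b trained with mean squared error on synthetic data. With batch size B, each update estimates the gradient from B examples, θ ← θ − η·(1/B)·Σ∇ℓ_i(θ), and one epoch over N examples performs ceil(N/B) updates. The functions make_data, mse, updates_per_epoch and train, the learning rate and the number of epochs are hypothetical choices made for illustration. They are not the authors' experimental setup, which the abstract does not describe.

```python
import math
import random
import time


def make_data(n, seed=0):
    """Hypothetical synthetic dataset: y = 3x + 2 plus small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.1) for x in xs]
    return list(zip(xs, ys))


def mse(w, b, data):
    """Mean squared error between predicted and actual values (the loss)."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def updates_per_epoch(n_examples, batch_size):
    """One weight update per batch, so an epoch performs ceil(N / B) updates."""
    return math.ceil(n_examples / batch_size)


def train(data, batch_size, lr=0.1, epochs=20, seed=0):
    """Mini-batch gradient descent: each update uses the MSE gradient
    estimated on batch_size examples instead of on the full dataset.
    lr and epochs are hypothetical illustration values."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not reordered
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # reshuffle so each epoch sees different batches
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            m = len(batch)  # the last batch may be smaller than batch_size
            # Gradient of (1/m) * sum((w*x + b - y)^2) with respect to w and b
            grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / m
            grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / m
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    data = make_data(1000)
    # batch size equal to len(data) corresponds to full-batch gradient descent
    for bs in (1, 32, 256, 1000):
        t0 = time.perf_counter()
        w, b = train(data, bs)
        elapsed = time.perf_counter() - t0
        print(f"batch={bs:5d} updates/epoch={updates_per_epoch(len(data), bs):5d} "
              f"loss={mse(w, b, data):.4f} time={elapsed:.3f}s")
```

Because an epoch performs ceil(N/B) updates, running this sketch with a fixed learning rate and epoch count changes the number of optimization steps together with the batch size, so any loss or timing comparison it prints depends on those settings. In frameworks that process a whole batch in parallel, memory for intermediate activations grows roughly in proportion to B.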

Author Biographies

Andrey Anatolyevich Lisov, South Ural State University (National Research University)

Postgraduate student of the Department of Electric Drive, Mechatronics and Electromechanics

Alexander Grigoryevich Vozmilov, South Ural State University (National Research University)

Professor of the Department of Electric Drive, Mechatronics and Electromechanics, Dr. Sci. (Eng.)

Vil Gubaevich Urmanov, Bashkir State Agrarian University

Associate Professor of the Department of Applied Mechanics and Computer Engineering, Cand. Sci. (Eng.)

Sergei Alexeyevich Panishev, South Ural State University (National Research University)

Postgraduate student of the Department of Electric Drive, Mechatronics and Electromechanics

References

1. LeCun Y.A., Bottou L., Orr G.B., Müller K.-R. Efficient BackProp. In: Montavon G., Orr G.B., Müller K.-R. (eds.) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science. Vol. 7700. Berlin, Heidelberg: Springer; 2012. p. 9-48. https://doi.org/10.1007/978-3-642-35289-8_3
2. Diamos G., Sengupta S., Catanzaro B., Chrzanowski M., Coates A., Elsen E., Engel J., Hannun A., Satheesh S. Persistent RNNs: Stashing recurrent weights on-chip. In: Balcan M.F., Weinberger K.Q. (eds.) Proceedings of The 33rd International Conference on Machine Learning. New York, New York, USA: PMLR; 2016. Vol. 48. p. 2024-2033. Available at: https://proceedings.mlr.press/v48/diamos16.html (accessed 26.02.2023).
3. Keskar N.S., Mudigere D., Nocedal J., Smelyanskiy M., Tang P.T.P. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. In: 5th International Conference on Learning Representations (ICLR 2017). Toulon, France; 2017. p. 1-16. Available at: https://openreview.net/forum?id=H1oyRlYgg (accessed 26.02.2023).
4. Goyal P., Dollár P., Girshick R., Noordhuis P., Wesolowski L., Kyrola A., Tulloch A., Jia Y., He K. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv:1706.02677. 2017. p. 1-12. https://doi.org/10.48550/arXiv.1706.02677
5. Jastrzebski S., Kenton Z., Arpit D., Ballas N., Fischer A., Bengio Y., Storkey A. Finding Flatter Minima with SGD. In: 6th International Conference on Learning Representations (ICLR 2018 Workshop Track). Vancouver Convention Center, Vancouver, BC, Canada; 2018. p. 1-4. Available at: https://openreview.net/forum?id=r1VF9dCUG (accessed 26.02.2023).
6. Devarakonda A., Naumov M., Garland M. AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks. In: 6th International Conference on Learning Representations (ICLR 2018 Workshop Track). Vancouver Convention Center, Vancouver, BC, Canada; 2018. p. 1-4. Available at: https://openreview.net/forum?id=SkytjjU8G (accessed 26.02.2023).
7. Smith S.L., Kindermans P., Ying C., Le Q.V. Don't Decay the Learning Rate, Increase the Batch Size. In: 6th International Conference on Learning Representations (ICLR 2018 Workshop Track). Vancouver Convention Center, Vancouver, BC, Canada; 2018. p. 1-11. Available at: https://openreview.net/forum?id=B1Yy1BxCZ (accessed 26.02.2023).
8. Vozmilov A., Andreev L., Lisov A. Development of an Algorithm for the Program to Recognize Defects on the Surface of Hot-Rolled Metal. In: 2022 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM). Sochi, Russian Federation: IEEE Computer Society; 2022. p. 1004-1008. https://doi.org/10.1109/ICIEAM54945.2022.9787116
9. Vozmilov A., Urmanov V., Lisov A. Using Computer Vision to Recognize Defects on the Surface of Hot-rolled Steel. In: 2022 International Ural Conference on Electrical Power Engineering (UralCon). Magnitogorsk, Russian Federation: IEEE Computer Society; 2022. p. 21-25. https://doi.org/10.1109/UralCon54942.2022.9906737
10. Lisov A.A., Kulganatov A.Z., Panishev S.A. Using convolutional neural networks for acoustic-based emergency vehicle detection. Modern Transportation Systems And Technologies. 2023;9(1):95-107. (In Russ., abstract in Eng.) https://doi.org/10.17816/transsyst20239195-107
11. Vozmilov A.G., Lisov A.A., Urmanov V.G., Sineva G.N. Determination of the type of potato leaves diseases with using machine learning. Bulletin NGIEI. 2023;3(142):7-16. (In Russ., abstract in Eng.) https://doi.org/10.24412/2227-9407-2023-3-7-16
12. Kaplan J., McCandlish S., Henighan T., Brown T.B., Chess B., Child R., Gray S., Radford A., Wu J., Amodei D. Scaling Laws for Neural Language Models. arXiv:2001.08361. 2020. https://doi.org/10.48550/arXiv.2001.08361
13. Van den Oord A., Dieleman S., Schrauwen B. Deep content-based music recommendation. In: Burges C.J., Bottou L., Welling M., Ghahramani Z., Weinberger K.Q. (eds.) Advances in Neural Information Processing Systems. Curran Associates, Inc.; 2013. Available at: https://papers.nips.cc/paper_files/paper/2013/hash/b3ba8f1bee1238a2f37603d90b58898d-Abstract.html (accessed 26.02.2023).
14. Collobert R., Weston J. A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th international conference on Machine learning (ICML '08). New York, NY, USA: Association for Computing Machinery; 2008. p. 160-167. https://doi.org/10.1145/1390156.1390177
15. Avilov O., Rimbert S., Popov A., Bougrain L. Deep Learning Techniques to Improve Intraoperative Awareness Detection from Electroencephalographic Signals. In: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). Montreal, QC, Canada: IEEE Computer Society; 2020. p. 142-145. https://doi.org/10.1109/EMBC44109.2020.9176228
16. Tsantekidis A., Passalis N., Tefas A., Kanniainen J., Gabbouj M., Iosifidis A. Forecasting Stock Prices from the Limit Order Book Using Convolutional Neural Networks. In: 2017 IEEE 19th Conference on Business Informatics (CBI). Thessaloniki, Greece: IEEE Computer Society; 2017. p. 7-12. https://doi.org/10.1109/CBI.2017.23
17. Radiuk P.M. Impact of Training Set Batch Size on the Performance of Convolutional Neural Networks for Diverse Datasets. Information Technology and Management Science. 2017;20(1):20-24. https://doi.org/10.1515/itms-2017-0003
18. Mishkin D., Sergievskiy N., Matas J. Systematic evaluation of convolution neural network advances on the Imagenet. Computer Vision and Image Understanding. 2017;161:11-19. https://doi.org/10.1016/j.cviu.2017.05.007
19. Bagby T., Rao K., Sim K.C. Efficient Implementation of Recurrent Neural Network Transducer in Tensorflow. In: 2018 IEEE Spoken Language Technology Workshop (SLT). Athens, Greece: IEEE Computer Society; 2018. p. 506-512. https://doi.org/10.1109/SLT.2018.8639690
20. He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE Computer Society; 2016. p. 770-778. https://doi.org/10.1109/CVPR.2016.90
21. Krizhevsky A. One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997v2. 2014. https://doi.org/10.48550/arXiv.1404.5997
22. Simonyan K., Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In: 3rd International Conference on Learning Representations (ICLR 2015). arXiv:1409.1556. 2015. p. 1-15. https://doi.org/10.48550/arXiv.1409.1556
23. Takáč M., Bijral A., Richtárik P., Srebro N. Mini-Batch Primal and Dual Methods for SVMs. In: Dasgupta S., McAllester D. (eds.) Proceedings of the 30th International Conference on Machine Learning (PMLR). 2013;28(3):1022-1030. Available at: https://proceedings.mlr.press/v28/takac13.html (accessed 26.02.2023).
24. Wilson D.R., Martinez T.R. The general inefficiency of batch training for gradient descent learning. Neural Networks. 2003;16(10):1429-1451. https://doi.org/10.1016/S0893-6080(03)00138-2
25. Li M., Zhang T., Chen Y., Smola A.J. Efficient mini-batch training for stochastic optimization. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '14). New York, NY, USA: Association for Computing Machinery; 2014. p. 661-670. https://doi.org/10.1145/2623330.2623612
26. Lin Z., Courbariaux M., Memisevic R., Bengio Y. Neural Networks with Few Multiplications. In: Bengio Y., LeCun Y. (eds.) 4th International Conference on Learning Representations, ICLR 2016. San Juan, Puerto Rico, May 2-4, 2016. Conference Track Proceedings. 2016. https://doi.org/10.48550/arXiv.1510.03009
Published
2023-06-30
How to Cite
Lisov A.A. et al. The Impact of Batch Size on the Quality of Training of Neural Networks. Modern Information Technologies and IT-Education. 2023;19(2):324-332. ISSN 2411-1473. Available at: http://sitito.cs.msu.ru/index.php/SITITO/article/view/952 (accessed 14.09.2025). https://doi.org/10.25559/SITITO.019.202302.324-332
Section
Theoretical Questions of Computer Science, Computer Mathematics