Stochastic Variational Inequality Method for Fine-Tuning Transformer Models
Abstract
Modern transformers such as BERT and RoBERTa are widely used in natural language processing, but during fine-tuning, especially when layers are partially unfrozen, classic optimizers based on stochastic gradient descent (SGD) run into problems: non-smooth activations and vanishing gradients slow down and destabilize training. This work examines a method based on the stochastic variational inequality (SVI), whose update does not depend on activation derivatives, as a more robust and faster way to fine-tune transformers. The method is evaluated on the binary sentiment classification task SST-2.
For experiments, a pre-trained BERT model and the SST-2 dataset are used. A comparison between classic SGD and the SVI-based method is carried out in two fine-tuning modes: updating only the classifier, and updating the classifier together with several top model layers. Performance is measured by accuracy on the training and validation sets over multiple runs to account for instability.
SVI shows a notable advantage over SGD in both modes: when updating only the classifier, SVI yields substantially higher validation accuracy, and when unfreezing the top layers the advantage increases further. Additionally, SVI provides faster and more stable convergence and lower run-to-run variability.
SVI demonstrates significant benefits for fine-tuning transformers: improved accuracy, stable convergence, and low variance across runs. The advantage is attributed to the fact that the update does not depend on activation derivatives and therefore avoids vanishing-gradient issues. SVI thus appears to be a promising approach for fast and reliable fine-tuning of large language models. The main limitations of the approach are the additional implementation effort it requires and its limited support in existing deep-learning frameworks.
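To illustrate what "no dependence on activation derivatives" can mean in practice, the following is a minimal sketch for a single sigmoid unit, not the method as implemented in this work. It assumes the standard monotone-operator form for a generalized linear model, F(theta) = x * (phi(theta^T x) - y), and contrasts it with a squared-loss SGD step, whose gradient carries the extra factor phi'(z). The function names, the toy input, and the learning rate are hypothetical; the abstract does not specify the update rule used for the transformer layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgd_step(theta, x, y, lr):
    """Squared-loss SGD step (assumed baseline): the gradient includes phi'(z)."""
    p = sigmoid(dot(theta, x))
    scale = (p - y) * p * (1.0 - p)  # (phi(z) - y) * phi'(z)
    return [t - lr * scale * xi for t, xi in zip(theta, x)]


def svi_step(theta, x, y, lr):
    """SVI-style step (assumed form): follows F(theta) = x * (phi(z) - y)."""
    p = sigmoid(dot(theta, x))
    scale = p - y  # no phi'(z) factor
    return [t - lr * scale * xi for t, xi in zip(theta, x)]


def step_length(old, new):
    """Euclidean length of the parameter change."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))


# Toy example (hypothetical values): a saturated unit, z = 10, wrong label.
theta = [10.0, 0.0]
x = [1.0, 1.0]
y = 0.0
lr = 0.1

sgd_move = step_length(theta, sgd_step(theta, x, y, lr))
svi_move = step_length(theta, svi_step(theta, x, y, lr))
print("SGD step length:", sgd_move)
print("SVI step length:", svi_move)
```

In the saturated regime phi'(z) is close to zero, so the squared-loss SGD step nearly vanishes while the SVI-style step remains proportional to the prediction error. For a sigmoid unit trained with cross-entropy loss the two updates coincide, so the difference shown here is specific to losses whose gradient retains phi'(z).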

This work is licensed under a Creative Commons Attribution 4.0 International License.
