Stabilizing Elastic Weight Consolidation Method in Practical ML Tasks and Using Weight Importance’s for Neural Network Pruning

Abstract

This work focuses on the practical application of Elastic Weight Consolidation (EWC) to the sequential training of neural networks on several training sets. We compare more rigorously the well-known methodologies for calculating the weight importance used in EWC: Memory Aware Synapses (MAS), Synaptic Intelligence (SI), and the calculation of weight importance from the Fisher information matrix, as in the original work on EWC. We review these methodologies as applied to deep neural networks with fully connected and convolutional layers, find optimal hyperparameters for each of them, and compare the results of sequential training of the network when using them. Next, we point out the problems that arise when applying EWC to deep neural networks with convolutional and self-attention layers, such as "gradient explosion" and the loss of significant information in the gradient when its norm is constrained (gradient clipping). We then propose a method for stabilizing EWC that helps to solve these problems, evaluate it against the original methodology, and show that the proposed stabilization retains skills in sequential training no worse than the original EWC while avoiding its disadvantages. In conclusion, we note an interesting fact about the use of different types of weight importance in the neural network pruning problem.
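
For readers unfamiliar with EWC, the quadratic penalty and the Fisher-based weight importance named in the abstract can be sketched as follows. This is a minimal illustration of the standard EWC formulation, not the stabilized method proposed in the paper. The function names, the toy values, and the use of an empirical diagonal Fisher estimate (mean of squared per-sample gradients of the log-likelihood) are assumptions made for the example.

```python
def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher estimate: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(weights, anchor, importance, lam):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (w_i - w*_i)^2."""
    return 0.5 * lam * sum(
        f * (w - a) ** 2 for w, a, f in zip(weights, anchor, importance)
    )


def ewc_penalty_grad(weights, anchor, importance, lam):
    """Gradient of the EWC penalty with respect to each weight."""
    return [lam * f * (w - a) for w, a, f in zip(weights, anchor, importance)]


# Toy values (hypothetical): two samples, three weights.
grads = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
fisher = diagonal_fisher(grads)   # [5.0, 0.0, 2.0]
anchor = [0.5, -1.0, 2.0]         # weights saved after the first task
current = [1.5, 3.0, 2.0]         # weights during training on the second task
penalty = ewc_penalty(current, anchor, fisher, lam=2.0)
grad = ewc_penalty_grad(current, anchor, fisher, lam=2.0)
```

In this toy case the second weight has zero importance, so it can move freely without penalty, while the first weight is pulled back toward its saved value. Because the penalty gradient scales with both the importance and the distance from the anchor, large importance values can produce very large gradients, which is the kind of situation in which gradient clipping is typically applied.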

Author Biographies

Alexey Anatolyevich Kutalev, PJSC "Sberbank of Russia"

MSc in Mathematics, Senior Software Developer of the Division of Experimental Machine Learning Systems

Alisa Alekseevna Lapina, PJSC "Sberbank of Russia"

MSc in Robotics, Software Developer of the Laboratory of Neuroscience and Human Behavior

Published
2021-06-30
How to Cite
KUTALEV, Alexey Anatolyevich; LAPINA, Alisa Alekseevna. Stabilizing Elastic Weight Consolidation Method in Practical ML Tasks and Using Weight Importance’s for Neural Network Pruning. Modern Information Technologies and IT-Education, [S.l.], v. 17, n. 2, p. 345-354, june 2021. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/741>. Date accessed: 05 nov. 2025. doi: https://doi.org/10.25559/SITITO.17.202102.345-354.
Section
Research and development in the field of new IT and their applications