Development and Analysis of a Methodology for Selecting Infrastructure Metrics for Predictive Incident Monitoring

Andrew Vladimirovich Egorkin

doi:10.25559/SITITO.021.202501.36-45

Andrew Vladimirovich Egorkin, Lomonosov Moscow State University; Sberbank of Russia. ORCID: http://orcid.org/0009-0002-9329-3641

Abstract

The growth of telemetry volume in distributed IT systems produces "information noise" and raises the computational cost of AIOps platforms. This paper proposes a formalized two-stage metric selection procedure intended to improve the accuracy and efficiency of predictive monitoring. Stage 1 is a multicriteria correlation filter that uses the Pearson coefficient (|r| > 0.60), Kendall’s τ (> 0.50), and the Maximal Information Coefficient (MICe > 0.35) to eliminate redundant and non-linearly related features. Stage 2 verifies causal relationships using the Granger test (lag = 5, p < 0.01), the PCMCI algorithm (FDR = 10%), and the Directed Information metric (DI > 0.1 bits/step) to identify the true drivers of the target metric. Experimental validation was conducted on a 14-day fragment of Prometheus metrics from the production cluster of the "Sber Antifraud" system (≈7 billion data points, 1379 initial metrics). The results showed a 43% reduction in the Mean Absolute Error (MAE) of 30-minute CPU utilization forecasts, a 14-fold decrease in the number of input time series, and an 89% reduction in model inference time. The methodology is integrated into a production data processing pipeline (Prometheus → Kafka → Spark 3.5 → MLflow 2.11) and aligns with the data minimization principle set out in GOST R 57580.1-2017 and in the FSTEC guidelines for information protection.
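
The first stage described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not say how the three criteria are combined, so the sketch assumes a candidate metric is kept when either its Pearson or its Kendall coefficient against the target exceeds the stated threshold in absolute value. MICe and the causal second stage are omitted, and the function names and demo data are hypothetical. Only the Python standard library is used; in practice `scipy.stats.pearsonr` and `scipy.stats.kendalltau` would be the usual choice.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def kendall_tau(x, y):
    """Kendall's tau-a via an O(n^2) pair scan; tied pairs contribute 0."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def correlation_filter(candidates, target, r_min=0.60, tau_min=0.50):
    """Return names of candidate metrics that pass the correlation filter.

    Assumption (not stated in the abstract): a metric passes if EITHER
    |r| > r_min OR |tau| > tau_min with respect to the target series.
    """
    kept = []
    for name, series in candidates.items():
        r = pearson_r(series, target)
        tau = kendall_tau(series, target)
        if abs(r) > r_min or abs(tau) > tau_min:
            kept.append(name)
    return kept


def make_demo_data(n=200, seed=0):
    """Synthetic stand-in for real telemetry: one informative metric, one noise metric."""
    rng = random.Random(seed)
    target = [50 + 30 * math.sin(i / 10) for i in range(n)]  # e.g. CPU utilization, %
    related = [2 * t + rng.gauss(0, 1) for t in target]      # tracks the target closely
    noise = [rng.gauss(0, 1) for _ in range(n)]              # independent of the target
    return {"related_metric": related, "noise_metric": noise}, target


if __name__ == "__main__":
    demo_candidates, demo_target = make_demo_data()
    print(correlation_filter(demo_candidates, demo_target))
```

On the synthetic data the filter keeps the metric that tracks the target and discards the independent noise series. The quadratic Kendall scan is acceptable for a short demo window but would need a faster algorithm at the scale reported in the abstract.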

Author Biography

Andrew Vladimirovich Egorkin, Lomonosov Moscow State University; Sberbank of Russia

Master's degree student in the Cybersecurity program (a joint academic program with Sberbank), Faculty of Computational Mathematics and Cybernetics; Senior Development Engineer, Cybersecurity Platform Services Development Department, "Technologies" Block, IT Department of the Services and Security Block

