Development and Analysis of a Methodology for Selecting Infrastructure Metrics for Predictive Incident Monitoring
Abstract
The growth of telemetry volume in distributed IT systems produces "information noise" and raises the computational cost of AIOps platforms. This paper proposes a formalized two-stage metric selection procedure designed to improve the accuracy and efficiency of predictive monitoring. Stage 1 is a multicriteria correlation filter that uses the Pearson coefficient (|r| > 0.60), Kendall’s τ (> 0.50), and the Maximal Information Coefficient (MICe > 0.35) to eliminate redundant features, including non-linearly related ones. Stage 2 verifies causal relationships using the Granger test (lag = 5, p < 0.01), the PCMCI algorithm (FDR = 10%), and the Directed Information metric (DI > 0.1 bits/step) to identify the true drivers of the target metric. Experimental validation was conducted on a 14-day fragment of Prometheus metrics from the production cluster of the "Sber Antifraud" system (≈7 billion data points, 1379 initial metrics). The results showed a 43% reduction in the Mean Absolute Error (MAE) of 30-minute CPU utilization forecasts, a 14-fold decrease in the number of input time series, and an 89% reduction in model inference time. The methodology is integrated into a production data processing pipeline (Prometheus → Kafka → Spark 3.5 → MLflow 2.11) and aligns with the data minimization principle outlined in GOST R 57580.1-2017 and FSTEC guidelines for information protection.
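As an illustration of the first stage only, the following is a minimal Python sketch of a correlation filter that applies the Pearson and Kendall thresholds quoted above. It is not the authors' implementation. Several points are assumptions: the abstract does not state how the criteria are combined, so the sketch keeps a metric only when both coefficients, taken in absolute value and computed against the target series, exceed their thresholds. The MICe criterion is omitted because it needs a dedicated estimator. The metric names and sample values are hypothetical.

```python
import math
from itertools import combinations

# Thresholds quoted in the abstract. Combining them with a logical AND and
# comparing absolute values against the target are assumptions of this sketch.
PEARSON_MIN = 0.60
KENDALL_MIN = 0.50


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / (sx * sy)


def kendall_tau(x, y):
    """Kendall's tau-a by O(n^2) pair counting, without tie correction."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / pairs if pairs else 0.0


def correlation_filter(metrics, target):
    """Return the names of metrics that pass both thresholds against the target."""
    kept = []
    for name, series in metrics.items():
        passes_pearson = abs(pearson_r(series, target)) > PEARSON_MIN
        passes_kendall = abs(kendall_tau(series, target)) > KENDALL_MIN
        if passes_pearson and passes_kendall:
            kept.append(name)
    return kept


# Hypothetical toy data: a target series and two candidate metrics.
target = [10, 12, 15, 14, 18, 21, 20, 25]
metrics = {
    "hypothetical_load": [1, 2, 3, 3, 4, 5, 5, 6],
    "hypothetical_noise": [5, 1, 4, 2, 5, 1, 4, 2],
}
selected = correlation_filter(metrics, target)
print(selected)
```

The quadratic pair counting is acceptable for a short example but would not scale to the series lengths described in the abstract; a production filter would presumably rely on an optimized statistics library and run the second, causal stage only on the metrics that survive this step.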

This work is licensed under a Creative Commons Attribution 4.0 International License.
