Detecting Anomalies in CPU Behavior Using Clustering Algorithms from the Scikit-Learn Library in Python Programming Language
Abstract
Modern computer systems are becoming increasingly complex. They offer many capabilities, but anomalies in the system can degrade performance. Anomaly detection is therefore a pressing problem: anomalous activity that is detected in time can help prevent a cyberattack.
This article examines the problem of detecting anomalies in central processing unit (CPU) operation using time series clustering algorithms. The CPU is the main computing component, responsible for executing instructions and processing data. Anomalies in its operation can lead to system crashes, reduced performance, and other negative consequences. To address this problem, clustering algorithms are proposed that identify anomalies by analyzing the time series that represent CPU behavior. The article presents an overview of existing time series clustering algorithms and their application to identifying anomalies in CPU operation. Classic clustering methods from the Scikit-Learn library for Python are considered: KMeans, DBSCAN, Agglomerative Clustering, and Affinity Propagation. To evaluate the algorithms, several quality metrics are used: the Adjusted Rand Index (ARI), Adjusted Mutual Information (AMI), Homogeneity Score, Completeness Score, V-measure, and Silhouette Score. Experiments are conducted on real data obtained from CPU monitoring systems to evaluate performance and compare the results of the different algorithms.
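As a rough illustration of the workflow described above, the following sketch runs the four named Scikit-Learn clustering methods on fixed-length CPU utilisation windows and scores each result with the six listed metrics. It is not the authors' experimental setup. The data are synthetic stand-ins for real monitoring traces, and the helper names (`make_cpu_windows`, `score_clustering`) and all parameter values (`eps=40.0`, `damping=0.9`, two clusters, 30-sample windows) are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, AgglomerativeClustering, DBSCAN, KMeans
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
    silhouette_score,
    v_measure_score,
)


def make_cpu_windows(n_normal=180, n_anomalous=20, window=30, seed=0):
    """Synthetic CPU utilisation windows in percent (hypothetical stand-in data)."""
    rng = np.random.default_rng(seed)
    normal = rng.normal(30.0, 3.0, size=(n_normal, window))        # steady load
    anomalous = rng.normal(90.0, 3.0, size=(n_anomalous, window))  # sustained spike
    X = np.clip(np.vstack([normal, anomalous]), 0.0, 100.0)
    y = np.array([0] * n_normal + [1] * n_anomalous)  # 0 = normal, 1 = anomalous
    return X, y


def score_clustering(X, y_true, y_pred):
    """External metrics compare y_pred with y_true; silhouette uses only X and y_pred."""
    n_labels = np.unique(y_pred).size
    return {
        "ARI": adjusted_rand_score(y_true, y_pred),
        "AMI": adjusted_mutual_info_score(y_true, y_pred),
        "Homogeneity": homogeneity_score(y_true, y_pred),
        "Completeness": completeness_score(y_true, y_pred),
        "V-measure": v_measure_score(y_true, y_pred),
        # Silhouette is defined only for 2 <= n_labels <= n_samples - 1.
        "Silhouette": silhouette_score(X, y_pred) if 2 <= n_labels < len(X) else float("nan"),
    }


# Assumed, untuned hyperparameters; real data would need its own tuning.
models = {
    "KMeans": KMeans(n_clusters=2, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=40.0, min_samples=5),
    "AgglomerativeClustering": AgglomerativeClustering(n_clusters=2),
    "AffinityPropagation": AffinityPropagation(damping=0.9, random_state=0),
}

X, y = make_cpu_windows()
results = {name: score_clustering(X, y, model.fit_predict(X)) for name, model in models.items()}

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

ARI, AMI, homogeneity, completeness, and V-measure require ground-truth labels, whereas the silhouette score does not, so only the latter is available when monitoring data are unlabeled. Affinity Propagation chooses the number of clusters itself and may split the normal windows into several groups, which lowers completeness while leaving homogeneity high.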
Detecting CPU anomalies is an important task for improving the quality of computer systems. Understanding the causes of CPU anomalies and the available remedies helps resolve processor performance problems and, in turn, improves the performance, stability, and reliability of computer systems.

This work is licensed under a Creative Commons Attribution 4.0 International License.