A method for compressing event log data based on combinatorial generation using AND/OR tree structures
Abstract
The exponential growth in the volume of digital information produced by modern society raises the problem of storing large amounts of data, including archival data. Archival data belongs to the category of “cold” data: data that must be stored but is rarely accessed. A typical example is event log data, which briefly describes, in chronological order, the events that occurred in an information system. Because archival data is voluminous and rarely accessed, it makes sense to store it in compressed form. This article addresses the development of a method for compressing archival data, using event log data as an example, by applying combinatorial generation algorithms. In particular, if the current state of the event log is fixed, the set of its entries can be treated as a combinatorial set. Then, using an algorithm for ranking the elements of this combinatorial set, each event log entry can be encoded as a single number, which requires less memory to store. Based on this idea, a method for compressing event log data based on combinatorial generation using AND/OR tree structures is proposed. To evaluate the method, the compression of event log data generated in Moodle electronic courses is considered. The experimental study confirmed the effectiveness of the proposed method: the total amount of memory required to store the event log of a Moodle electronic course in compressed form is smaller than that required by existing methods for compressing text files.
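The ranking idea can be illustrated with a minimal sketch. It assumes that each log entry is a tuple of fields and that each field takes its values from the finite list observed in the fixed log state. This corresponds to the simplest AND/OR tree: an AND node (the entry) whose children are OR nodes (the fields) over leaf values. The abstract does not specify the article's actual tree structure or ranking algorithm, so the function names `build_domains`, `rank`, and `unrank` and the sample log entries are hypothetical.

```python
from math import prod

def build_domains(entries):
    """Collect the sorted distinct values of each field (one OR node per field)."""
    return [sorted(set(column)) for column in zip(*entries)]

def rank(entry, domains):
    """Encode an entry as its index in the Cartesian product of the field domains."""
    r = 0
    for value, domain in zip(entry, domains):
        # Mixed-radix positional encoding: each field contributes one "digit".
        r = r * len(domain) + domain.index(value)
    return r

def unrank(r, domains):
    """Decode a rank back into the original entry (inverse of rank)."""
    values = []
    for domain in reversed(domains):
        r, i = divmod(r, len(domain))
        values.append(domain[i])
    return tuple(reversed(values))

# Made-up sample entries (time, user, event); not taken from the article.
log = [
    ("2023-09-01 10:00", "alice", "course viewed"),
    ("2023-09-01 10:05", "bob", "quiz attempt started"),
    ("2023-09-01 10:07", "alice", "quiz attempt started"),
]

domains = build_domains(log)                 # fixed state of the log
total = prod(len(d) for d in domains)        # size of the combinatorial set
codes = [rank(e, domains) for e in log]      # one integer per entry
restored = [unrank(c, domains) for c in codes]
```

In this sketch each entry is stored as one integer in the range `0 .. total - 1`, and the field domains are stored once for the whole log. Decoding with `unrank` recovers the entries exactly, so the encoding is lossless.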

This work is licensed under a Creative Commons Attribution 4.0 International License.