Create Partial Table Indexing for Search Sources
Abstract
As data volumes grow and processing requirements diversify, it is becoming less practical to do all the work at query time; more of it, or at least its main stages, has to be shifted to results that are prepared and stored in advance. DBMSs commonly address performance problems this way, at the cost of higher memory consumption. Memory also has to be economized, however, preferably without giving up the gains of methods built on this approach: indexing, hashing, and neural network algorithms. The article discusses a method for making search over large tables more efficient. The method is based on partial indexing of elements near convergence centers and introduces the concept of metadata for these centers. Clustering with metadata stored for the centers, near which the next intermediate nodes are formed, reduces the memory cost of indexing in two ways. First, it removes the need for nested indexing, which can incur substantial space costs. Second, it allows a single index to serve different combinations of columns present in the search pattern without losing most of the search efficiency that indexing provides. Used correctly, this combination makes it possible to process efficiently tables that are searched in different ways over different groups of columns. For such tables, storing a separate index for each major query type or group of queries leads to heavy memory consumption and to a loss of performance on large memory arrays, and that loss grows far from linearly.
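The abstract gives no implementation details, so the following Python sketch is only one loose reading of the idea, not the authors' algorithm. It assumes numeric columns, seed rows supplied by the caller as "centers", and per-column minimum and maximum values as the stored "metadata"; the names `Center`, `build_index`, and `search` are hypothetical. It illustrates how one set of centers can prune a search for any subset of columns, so no separate index per column group is needed.

```python
from dataclasses import dataclass, field


@dataclass
class Center:
    """A hypothetical 'convergence center' with per-column metadata."""
    point: tuple
    row_ids: list = field(default_factory=list)
    lo: list = field(default_factory=list)  # per-column minimum of assigned rows
    hi: list = field(default_factory=list)  # per-column maximum of assigned rows


def build_index(rows, seeds):
    """Assign every row to its nearest seed and record per-column bounds."""
    centers = [Center(point=tuple(s)) for s in seeds]
    for row_id, row in enumerate(rows):
        # Squared Euclidean distance is an assumption; any metric would do.
        nearest = min(
            centers,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, c.point)),
        )
        nearest.row_ids.append(row_id)
        if not nearest.lo:
            nearest.lo, nearest.hi = list(row), list(row)
        else:
            nearest.lo = [min(a, b) for a, b in zip(nearest.lo, row)]
            nearest.hi = [max(a, b) for a, b in zip(nearest.hi, row)]
    return centers


def search(rows, centers, query):
    """Exact-match search; `query` maps column index -> value for any column subset."""
    hits = []
    for c in centers:
        if not c.row_ids:
            continue
        # Skip the whole center if its metadata rules out any queried column.
        if any(not (c.lo[col] <= val <= c.hi[col]) for col, val in query.items()):
            continue
        hits.extend(
            i for i in c.row_ids
            if all(rows[i][col] == val for col, val in query.items())
        )
    return sorted(hits)


rows = [(1, 10, 100), (2, 11, 100), (50, 60, 700), (51, 60, 701)]
centers = build_index(rows, seeds=[rows[0], rows[2]])
print(search(rows, centers, {1: 60}))          # query on column 1 only
print(search(rows, centers, {0: 2, 2: 100}))   # query on columns 0 and 2
```

Both queries reuse the same two centers, and each one scans only the rows of the center whose bounds admit the queried values. The memory cost is one bounds pair per column per center, rather than one index per column combination.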

This work is licensed under a Creative Commons Attribution 4.0 International License.