Partial Indexing Applied to Search and Join Problems

Abstract

As data volumes grow and processing requirements diversify, there is a growing trend away from on-the-fly data processing toward executing queries, or their main parts, against pre-stored and prepared results. DBMSs often address performance problems by using more memory; however, it is essential to conserve memory while retaining the benefits of approaches such as indexing, hashing, and neural algorithms. This article discusses a method for improving the efficiency of search tasks in large tables. The proposed method is based on partial indexing of elements near convergence centers and on the introduction of metadata for these centers. Such clustering, with stored metadata for the centers around which the next intermediate nodes are arranged, can reduce the memory used for indexing. First, the approach eliminates the need for nested indexing, which can incur significant space costs. Second, it allows a single index to serve search patterns containing different combinations of columns without much loss of search efficiency. Applied correctly, the approach can efficiently process tables whose column groups have different search needs, where storing a separate index for each major query type or group of queries would lead to significant memory costs and to performance loss when working with large memory blocks, whose growth is not linear.
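The abstract does not define convergence centers or their metadata in detail, so the following is only a minimal illustrative sketch of the general idea, not the author's algorithm. It assumes that a convergence center can be treated as a cluster center, that the metadata for a center is the per-column minimum and maximum of the rows assigned to it, and that search patterns are inclusive ranges over any subset of columns. The names `Cluster`, `build_partial_index`, and `search` are hypothetical.

```python
from dataclasses import dataclass, field

Row = dict[str, float]


@dataclass
class Cluster:
    center: Row                                   # assumed "convergence center"
    lo: Row = field(default_factory=dict)         # metadata: per-column minimum
    hi: Row = field(default_factory=dict)         # metadata: per-column maximum
    row_ids: list[int] = field(default_factory=list)


def build_partial_index(rows: list[Row], centers: list[Row]) -> list[Cluster]:
    """Assign each row to its nearest center and record min/max metadata.

    Only the centers and their metadata are kept as index structure;
    no per-column or nested index is built (an assumption of this sketch).
    """
    clusters = [Cluster(center=c) for c in centers]
    for rid, row in enumerate(rows):
        # Nearest center by squared Euclidean distance over all columns.
        best = min(
            clusters,
            key=lambda cl: sum((row[k] - cl.center[k]) ** 2 for k in row),
        )
        best.row_ids.append(rid)
        for col, val in row.items():
            best.lo[col] = min(best.lo.get(col, val), val)
            best.hi[col] = max(best.hi.get(col, val), val)
    return clusters


def search(rows: list[Row], clusters: list[Cluster],
           pattern: dict[str, tuple[float, float]]) -> list[int]:
    """Return ids of rows matching inclusive ranges on any subset of columns."""
    hits = []
    for cl in clusters:
        if not cl.row_ids:
            continue
        # Skip the whole cluster if its metadata cannot overlap the pattern.
        if any(cl.hi[c] < lo or cl.lo[c] > hi for c, (lo, hi) in pattern.items()):
            continue
        # Otherwise scan only the rows attached to this center.
        for rid in cl.row_ids:
            if all(lo <= rows[rid][c] <= hi for c, (lo, hi) in pattern.items()):
                hits.append(rid)
    return sorted(hits)


# Example data (made up for illustration).
rows = [
    {"a": 1, "b": 10},
    {"a": 2, "b": 11},
    {"a": 50, "b": 5},
    {"a": 51, "b": 90},
]
centers = [{"a": 0, "b": 10}, {"a": 50, "b": 50}]
index = build_partial_index(rows, centers)
```

In this sketch the same `index` answers a pattern on column `a` alone, on column `b` alone, or on both together, which mirrors the abstract's claim that one index can serve different column combinations. The memory cost is proportional to the number of centers rather than to the number of rows times the number of indexed column groups.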

Author Biography

Artem Igorevich Mironov, Smolensk State University

Postgraduate student of the Faculty of Physics and Mathematics

Published
2023-12-20
How to Cite
MIRONOV, Artem Igorevich. Partial Indexing Applied to Search and Join Problems. Modern Information Technologies and IT-Education, [S.l.], v. 19, n. 4, dec. 2023. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/1047>. Date accessed: 16 sep. 2025.
Section
Parallel and distributed programming, grid technologies, programming on GPUs