Intelligent Search System for Working with Big Data

Abstract

The article describes a system for modeling an information retrieval system (IRS) on the Internet. The developed application models the operation of the IRS along the following parameters: the data collection model, the solution of the indexing problem, the ranking model, and the solution of the storage problem. Solutions in this area are developing rapidly thanks to progress in artificial intelligence, cloud technologies and natural language processing. These factors have made it practical to research and develop intelligent information retrieval systems, which collect information on the Internet and implement search over the data found, even without substantial material resources. The main problems to be solved in developing an IRS are: data collection; indexing; the choice and development of the index model; ranking; storage; and quality assessment. Search intelligence is provided by ranking with tf-idf, the vector model and link analysis, which make it possible to find relevant documents that do not contain direct occurrences of the query words and to sort them by how well they match the query.
The developed application, written in Python, is described; test runs of the system showed that it is operational, and the organization of the intelligent component is explained.
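
The tf-idf and vector-model ranking named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (tokenize, build_tfidf, cosine, rank), the whitespace tokenizer and the smoothed idf formula are all assumptions chosen for illustration, and link analysis is not covered.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately naive tokenizer)."""
    return text.lower().split()


def build_tfidf(docs):
    """Return one sparse tf-idf vector (dict) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Smoothed idf (an assumption): never zero, defined for every indexed term.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (cnt / total) * idf[t] for t, cnt in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (doc_index, score) pairs sorted by decreasing similarity to the query."""
    vectors, idf = build_tfidf(docs)
    # Query terms absent from the collection carry no weight and are dropped.
    q_tokens = [t for t in tokenize(query) if t in idf]
    q_tf = Counter(q_tokens)
    total = len(q_tokens) or 1
    q_vec = {t: (cnt / total) * idf[t] for t, cnt in q_tf.items()}
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        "search engine ranking with inverted index",
        "cooking pasta recipes",
        "ranking documents in a search engine",
    ]
    for doc_id, score in rank("search ranking", corpus):
        print(doc_id, round(score, 3))
```

A purely lexical model like this one scores a document at zero when it shares no terms with the query, so retrieving documents without direct occurrences of the query words, as the abstract describes, would need an additional signal such as link analysis.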


Author Biographies

Irina Fedorovna Astachova, Voronezh State University

Professor of the Chair of Computer Hardware, Faculty of Applied Mathematics, Informatics and Mechanics, Dr. Sci. (Eng.), Professor

Katerina Alexandrovna Makoviy, Voronezh State Technical University

Associate Professor of the Department of Control Systems and Information Technologies in Construction, Faculty of Information Technologies and Computer Security, Cand. Sci. (Tech.)

Lev Sergeevich Nikitin, Voronezh State University

Student of the Chair of Computer Hardware, Faculty of Applied Mathematics, Informatics and Mechanics

Yuliya Vladimirovna Khitskova, Voronezh State University

Associate Professor of the Department of Regional Economics and Territorial Administration of the Faculty of Economics, Cand. Sci. (Econ.)

Published
2023-03-30
How to Cite
ASTACHOVA, Irina Fedorovna et al. Intelligent Search System for Working with Big Data. Modern Information Technologies and IT-Education, [S.l.], v. 19, n. 1, p. 180-188, mar. 2023. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/953>. Date accessed: 09 sep. 2025. doi: https://doi.org/10.25559/SITITO.019.202301.180-188.
Section
Research and development in the field of new IT and their applications