Intelligent Search System for Working with Big Data
Abstract
The article describes a system for modeling an information retrieval system on the Internet. The developed application makes it possible to configure the operation of the information retrieval system along the following parameters: the data collection model, the approach to indexing, the ranking model, and the approach to storage. Solutions in this area are developing very actively thanks to progress in artificial intelligence, cloud technologies, and natural language processing. These factors have made relevant the research and development of intelligent information retrieval systems (IRS), which collect information on the Internet and perform searches over the data found. Such a search can be implemented without substantial material resources. The main problems to be solved in developing an IRS are data collection, indexing, the choice and development of the index model, ranking, storage, and quality assessment. Search intelligence is provided by ranking based on tf-idf, the vector model, and link analysis, which make it possible to find relevant documents that do not contain direct occurrences of the query words and to sort them by how well they match the query.
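As an illustration of the tf-idf and vector-model ranking mentioned above, the following is a minimal sketch, not the application's actual code. It assumes whitespace tokenization, a smoothed idf of the form log(N/df) + 1, and cosine similarity between query and document vectors; the function names `tokenize`, `build_tfidf`, `cosine`, and `rank` are hypothetical. Note that this term-matching part alone scores only documents that share terms with the query.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def build_tfidf(docs):
    """Return one sparse tf-idf vector (dict) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (assumption): log(N / df) + 1, so no term gets zero weight.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, doc_index) pairs sorted from best to worst match."""
    vectors, idf = build_tfidf(docs)
    q_tokens = tokenize(query)
    # Query terms unknown to the collection are ignored.
    q_tf = Counter(t for t in q_tokens if t in idf)
    q_vec = {t: (c / len(q_tokens)) * idf[t] for t, c in q_tf.items()}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    documents = [
        "python search engine index",
        "cooking recipes for pasta",
        "search ranking with tf idf",
    ]
    for score, idx in rank("pasta recipes", documents):
        print(f"{score:.3f}  {documents[idx]}")
```

Storing vectors as dictionaries keeps the sketch dependency-free; a real system would more likely compute these weights from an inverted index.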
The application, developed in Python, is described; test runs of the system confirmed that it is operational; and the organization of the intelligent component is explained.
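The link-analysis part of the ranking can be illustrated in a similar way. The sketch below uses a PageRank-style power iteration as one common form of link analysis; the abstract does not say which algorithm the application uses, so this choice, the damping factor of 0.85, the fixed iteration count, and the function name `pagerank` are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """PageRank-style scores for a link graph given as {page: [linked pages]}.

    Assumptions of this sketch: a fixed number of power iterations, and the
    rank of pages without outgoing links is spread evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    if n == 0:
        return {}
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Total rank currently held by pages that link nowhere.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": []}
    scores = pagerank(graph)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

A score of this kind does not depend on the query text, so it can raise a well-linked page in the results even when the page contains few of the query words.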

This work is licensed under a Creative Commons Attribution 4.0 International License.