Keyword Search Algorithm for Specialists in a Thematic Area

Abstract

Thematic information search is a pressing problem, and the search for specialists is arguably its most difficult variant, one that is needed in many fields of activity. The article opens with a brief overview of existing approaches to this problem, followed by a list of existing information systems for finding specialists and experts. It then describes an algorithm developed to solve the problem. The algorithm was tested on data from the scientometric system IAS ISTINA, which has been used since 2012 to collect data on scientific activity at Lomonosov Moscow State University. The thematic search analyzes articles, conference reports and abstracts, monographs, dissertations, lectures, and training courses. For each such object of scientific activity it uses the title, the list of authors, the keywords, and the abstract. The main features of the algorithm are: specialists are found without their active participation in the search process; queries can be made in Russian and English; all kinds of scientific output are used for the search; and the authority of specialists is additionally assessed in the co-authorship graph. The article also describes the interface of the software implementation, which was created to test the algorithm on real ISTINA data. The interface supports keyword search with additional filtering by subject heading and shows detailed information on the specialists found.
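The two ingredients named in the abstract, keyword matching over scientific works and an authority measure on the co-authorship graph, can be illustrated with a minimal sketch. It is not the authors' algorithm: the record layout, the function names, the use of a PageRank-style power iteration as the authority measure, and the way relevance and authority are multiplied together are all assumptions made for illustration.

```python
# Illustrative sketch only; all names, data, and the scoring formula are hypothetical.

def keyword_scores(works, query):
    """Per author, sum the number of query keywords matched by each of their works."""
    wanted = {q.lower() for q in query}
    scores = {}
    for work in works:
        hits = len(wanted & {k.lower() for k in work["keywords"]})
        if hits:
            for author in work["authors"]:
                scores[author] = scores.get(author, 0.0) + hits
    return scores

def coauthor_graph(works):
    """Undirected co-authorship graph: author -> set of co-authors."""
    graph = {}
    for work in works:
        for a in work["authors"]:
            neighbours = graph.setdefault(a, set())
            neighbours.update(b for b in work["authors"] if b != a)
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank, used here as an assumed authority measure."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * rank[v] / len(graph[v])
                for u in graph[v]:
                    new[u] += share
            else:
                # An author with no co-authors spreads rank evenly over all nodes.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

def find_specialists(works, query):
    """Rank authors by keyword relevance weighted by graph authority (assumed formula)."""
    relevance = keyword_scores(works, query)
    authority = pagerank(coauthor_graph(works))
    combined = {a: s * authority[a] for a, s in relevance.items()}
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical sample records, not taken from ISTINA.
WORKS = [
    {"authors": ["Ivanov", "Petrov"], "keywords": ["search", "ontology"]},
    {"authors": ["Ivanov", "Sidorov"], "keywords": ["Search"]},
    {"authors": ["Orlov"], "keywords": ["hydrodynamics"]},
]

if __name__ == "__main__":
    print(find_specialists(WORKS, ["search"]))
```

In this sketch the candidates never take part in the search: their ranking is derived entirely from the recorded works, which mirrors the first feature listed in the abstract. Handling Russian and English queries would need an additional normalization or translation step that is not shown here.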

Author Biographies

Alexander Sergeevich Kozitsin, Lomonosov Moscow State University

Leading Researcher, Institute of Mechanics, Lomonosov Moscow State University, Ph.D. (Phys.-Math.)

Sergey Alexandrovich Afonin, Lomonosov Moscow State University

Leading Researcher, Institute of Mechanics, Lomonosov Moscow State University, Ph.D. (Phys.-Math.)

Dmitry Alekseevich Shachnev, Lomonosov Moscow State University

Software Developer, Institute of Mechanics, Lomonosov Moscow State University

Published
2021-04-15
How to Cite
KOZITSIN, Alexander Sergeevich; AFONIN, Sergey Alexandrovich; SHACHNEV, Dmitry Alekseevich. Keyword Search Algorithm for Specialists in a Thematic Area. Modern Information Technologies and IT-Education, [S.l.], v. 17, n. 1, p. 124-133, apr. 2021. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/711>. Date accessed: 09 jan. 2026. doi: https://doi.org/10.25559/SITITO.17.202101.711.
Section
Scientific software in education and science