THE DETERMINATION METHOD FOR CONTEXTUAL MEANINGS OF WORDS AND DOCUMENTS

  • Елизавета Александровна Доренская Institute for Theoretical and Experimental Physics named by A.I. Alikhanov of National Research Centre «Kurchatov Institute» http://orcid.org/0000-0002-4249-5131
  • Юрий Алексеевич Семёнов Institute for Theoretical and Experimental Physics named by A.I. Alikhanov of National Research Centre «Kurchatov Institute»; Moscow Institute of Physics and Technology http://orcid.org/0000-0002-3855-3650

Abstract

The paper considers the problems and methods of software recognition of the context of words and text documents. A survey of existing text processing methods is provided, and a simple numerical algorithm is given for determining the context of words and documents with the help of a semantic network in the form of a tree-type graph. The structure of the semantic network is described in detail. The network is used to fix the context of a base word W1 by means of the meaning words W2 linked to it; the words W2 represent the possible contextual meanings of W1. Each word W2 has a set of characteristic words W3. When the context is calculated, the distances between the words W2 and W3 are taken into account; a distance is measured as the number of words between them. Each word W3 has a metric that reflects how close its concept is to W2. A table of the words W1, W2 and W3 with their metric values is given. In the analysis of a document's context, variations of words in case and number are taken into account. A simple formula for calculating the context is presented, together with a method of verifying the results with the help of Chebyshev's inequality. The context analysis method was tested by Monte Carlo simulations. Tables of the results are provided, along with recommendations for tuning and optimizing the algorithm parameters. The analysis showed that the proposed method is quite effective for estimating context in text analysis and in any system that requires computer recognition of context.
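The abstract names the ingredients of the method (a W1 → W2 → W3 tree, a metric for each W3, distances counted in intervening words, case and number normalization, a Chebyshev check and Monte Carlo testing) but does not reproduce the formula itself. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' algorithm. The semantic net contents are hypothetical, `normalize` is a crude English stand-in for real morphological normalization, and the distance is assumed to be counted from each occurrence of W1 to each occurrence of a W3. Each such W3 within a window contributes `metric / (1 + words between)` to the score of its W2. A shuffled-text Monte Carlo baseline is then combined with Chebyshev's inequality, P(|X − μ| ≥ kσ) ≤ 1/k², to bound how likely the winning score is to arise by chance. The helper names `normalize`, `tokenize`, `context_scores`, `monte_carlo_null` and `chebyshev_bound` and the `window`, `trials` and `seed` parameters are hypothetical.

```python
import random
import re
import statistics

# Hypothetical semantic net (illustrative, not the paper's table):
# base word W1 -> meaning words W2 -> {characteristic word W3: metric}.
# A metric near 1.0 means the W3 concept is close to the W2 meaning.
SEMANTIC_NET = {
    "bank": {
        "finance": {"money": 1.0, "loan": 0.9, "account": 0.8},
        "river": {"water": 1.0, "shore": 0.9, "fish": 0.6},
    },
}


def normalize(word):
    """Crude stand-in for case/number normalization: lowercase, drop a plural -s."""
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def tokenize(text):
    return [normalize(w) for w in re.findall(r"[A-Za-z]+", text)]


def context_scores(tokens, w1, net=SEMANTIC_NET, window=10):
    """Score each meaning W2 of w1 by summing metric / (1 + words between)
    over W3 occurrences within `window` tokens of each occurrence of w1."""
    scores = {w2: 0.0 for w2 in net[w1]}
    for i, token in enumerate(tokens):
        if token != w1:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue
            between = abs(i - j) - 1  # number of words between W1 and W3
            for w2, features in net[w1].items():
                metric = features.get(tokens[j])
                if metric is not None:
                    scores[w2] += metric / (1 + between)
    return scores


def monte_carlo_null(tokens, w1, w2, net=SEMANTIC_NET, trials=1000, seed=0):
    """Scores of meaning w2 on randomly shuffled copies of the text (chance baseline)."""
    rng = random.Random(seed)
    shuffled = list(tokens)
    null = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        null.append(context_scores(shuffled, w1, net)[w2])
    return null


def chebyshev_bound(observed, null_scores):
    """Chebyshev upper bound on the chance of a score deviating from the
    baseline mean by at least (observed - mean): P(|X - mu| >= k*sigma) <= 1/k**2."""
    mu = statistics.mean(null_scores)
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        return 0.0 if observed > mu else 1.0
    k = (observed - mu) / sigma
    return 1.0 if k <= 1 else 1.0 / (k * k)


text = "She opened an account at the bank and took out two loans; the money arrived."
tokens = tokenize(text)
scores = context_scores(tokens, "bank")
best = max(scores, key=scores.get)
bound = chebyshev_bound(scores[best], monte_carlo_null(tokens, "bank", best))
print(best, scores, bound)
```

Because each contribution decays with the number of intervening words, a characteristic word right next to W1 counts far more than one at the edge of the window, so `window` and the decay function are the main tuning knobs in this sketch. On a one-sentence text like the example, the shuffled baseline is close to the observed score and the Chebyshev bound is likely to be loose; it can become more informative on longer documents.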

Author Biographies

Елизавета Александровна Доренская, Institute for Theoretical and Experimental Physics named by A.I. Alikhanov of National Research Centre «Kurchatov Institute»

Software engineer

Юрий Алексеевич Семёнов, Institute for Theoretical and Experimental Physics named by A.I. Alikhanov of National Research Centre «Kurchatov Institute»; Moscow Institute of Physics and Technology

Candidate of Physical and Mathematical Sciences, Lead Researcher; Deputy Head of the Department of Informatics and Computer Networks, Institute of Nano-, Bio-, Information, Cognitive and Socio-humanistic Sciences and Technologies

Published

2018-12-10

How to Cite

ДОРЕНСКАЯ, Елизавета Александровна; СЕМЁНОВ, Юрий Алексеевич. THE DETERMINATION METHOD FOR CONTEXTUAL MEANINGS OF WORDS AND DOCUMENTS. Modern Information Technologies and IT-Education, [S.l.], v. 14, n. 4, p. 896-902, dec. 2018. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/456>. Date accessed: 15 sep. 2025. doi: https://doi.org/10.25559/SITITO.14.201804.896-902.

Section

Research and development in the field of new IT and their applications