The Improved Algorithm for Calculation of the Contextual Words Meaning in the Text

  • Elizaveta Alexandrovna Dorenskaya Alikhanov Institute for Theoretical and Experimental Physics of National Research Center "Kurchatov Institute" http://orcid.org/0000-0002-4249-5131
  • Yuri Alexeyevich Semenov Alikhanov Institute for Theoretical and Experimental Physics of National Research Center "Kurchatov Institute"; Moscow Institute of Physics and Technology http://orcid.org/0000-0002-3855-3650

Abstract

Some modifications of the context-calculation algorithm published in [1] are considered, and a new solution for calculating the context of words and documents is proposed. To improve context determination, it is proposed to take into account the distances between words W1 and W2. This is especially important when the number of W2 words is greater than 1. The results of investigating the two formulas are presented. To compare the efficiency of these formulas, calculations were performed for 100 texts. Distributions of the mean value and the dispersion of C were built and compared with the model data from [1]. The weight function was optimized, and its versions were compared by the value of s/C_aver. The dispersion of C was calculated for all versions of the weight function. It turned out to be rather large because of the wide variation in text size and in the numbers of W2 and W3 words, as well as the wide spread of words across the text. An example of the L distribution is given for W2 = "компьютер" ("computer").
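The idea of weighting W1/W2 co-occurrences by distance and comparing weight-function versions by s/C_aver can be illustrated with a minimal sketch. The exact formula for C from [1] is not reproduced here. The scoring rule below sums a weight of the token distance over every pair of W1 and W2 occurrences. The three candidate weight functions and the names `context_score`, `relative_spread` and `compare_weights` are all hypothetical. s is assumed to be the sample standard deviation of C over the text collection and C_aver its mean.

```python
import math
import statistics
from typing import Callable, Sequence

# Hypothetical candidate weight functions of the token distance d >= 1.
# None of them is taken from [1]; they only illustrate "weight by distance".
WEIGHTS: dict[str, Callable[[int], float]] = {
    "uniform": lambda d: 1.0,              # distance ignored: plain pair count
    "inverse": lambda d: 1.0 / d,          # nearer W2 occurrences count more
    "exp5": lambda d: math.exp(-d / 5.0),  # exponential decay, scale of 5 tokens
}


def positions(tokens: Sequence[str], word: str) -> list[int]:
    """Return the indices at which `word` occurs in the token sequence."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def context_score(tokens: Sequence[str], w1: str, w2_words: Sequence[str],
                  weight: Callable[[int], float]) -> float:
    """Hypothetical C for one text: sum of weight(distance) over every pair
    (occurrence of W1, occurrence of some W2 word)."""
    w1_pos = positions(tokens, w1)
    total = 0.0
    for w2 in w2_words:
        for j in positions(tokens, w2):
            for i in w1_pos:
                d = abs(i - j)
                if d > 0:  # guard against W1 == W2 (distance 0)
                    total += weight(d)
    return total


def relative_spread(scores: Sequence[float]) -> float:
    """s / C_aver: sample standard deviation of C divided by its mean.
    Needs at least two scores and a non-zero mean."""
    return statistics.stdev(scores) / statistics.mean(scores)


def compare_weights(texts: Sequence[Sequence[str]], w1: str,
                    w2_words: Sequence[str]) -> dict[str, float]:
    """Compute s / C_aver over a text collection for each candidate weight."""
    return {
        name: relative_spread([context_score(t, w1, w2_words, f) for t in texts])
        for name, f in WEIGHTS.items()
    }


if __name__ == "__main__":
    # Toy English texts; W1 = "program" and W2 = "computer" are arbitrary choices.
    corpus = [
        "the program runs on a computer".split(),
        "computer program".split(),
        "a program and a program on one computer".split(),
    ]
    for name, ratio in compare_weights(corpus, "program", ["computer"]).items():
        print(f"{name:8s} s/C_aver = {ratio:.3f}")
```

With the uniform weight the score reduces to a plain count of occurrence pairs. The other entries therefore show how much distance weighting changes the relative spread of C on a given collection. The sketch matches exact tokens only. For Russian texts, word forms would presumably need to be normalized before counting.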

Author Biographies

Elizaveta Alexandrovna Dorenskaya, Alikhanov Institute for Theoretical and Experimental Physics of National Research Center "Kurchatov Institute"

Software Engineer

Yuri Alexeyevich Semenov, Alikhanov Institute for Theoretical and Experimental Physics of National Research Center "Kurchatov Institute"; Moscow Institute of Physics and Technology

Ph.D. (Phys.-Math.), Lead Researcher; Deputy Head of the Chair for Computer Science, Institute of Nano-, Bio-, Information, Cognitive and Socio-humanistic Sciences and Technologies

References

[1] Dorenskaya E.A., Semenov Y.A. The Determination Method for Contextual Meanings of Words and Documents. Sovremennye informacionnye tehnologii i IT-obrazovanie = Modern Information Technologies and IT-Education. 2018; 14(4):896-902. (In Russ., abstract in Eng.) DOI: 10.25559/SITITO.14.201804.896-902
[2] Dorenskaya E.A., Semenov Y.A. About the Programming Techniques, Oriented to Minimize Errors. Sovremennye informacionnye tehnologii i IT-obrazovanie = Modern Information Technologies and IT-Education. 2017; 13(2):50-56. (In Russ., abstract in Eng.) DOI: 10.25559/SITITO.2017.2.226
[3] Dorenskaya E.A., Semenov Y.A. New Methods of Minimizing the Errors in the Software. In: CEUR Workshop Proceedings: Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018), Dubna, Moscow region, Russia, September 10-14, 2018, vol. 2267. 2018, pp. 150-154. Available at: http://ceur-ws.org/Vol-2267/150-154-paper-27.pdf (accessed 15.08.2019). (In Eng.)
[4] Semenov Y.A., Ovsyannikov A.P., Ovsyannikova T.V. Development of the algorithm bank and basics of the language for problem description to minimize a number of program errors. Proceedings of NIISI RAS. 2016; 6(2):96-100. Available at: https://elibrary.ru/item.asp?id=29798446 (accessed 15.08.2019). (In Russ., abstract in Eng.)
[5] Semenov Y.A. IT-Economy in 2016 and in 10 Years. Economic Strategies. 2017; 19(1):126-135. Available at:
[6] Rishel T., Perkins L.A., Yenduri S., Zand F. Determining the context of text using augmented latent semantic indexing. Journal of the American Society for Information Science and Technology. 2007; 58(14):2197-2204. (In Eng.) DOI: 10.1002/asi.20687
[7] Chen J., Scholz U., Zhou R., Lange M. LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions. PLoS Computational Biology. 2018; 14(3):e1006058. (In Eng.) DOI: 10.1371/journal.pcbi.1006058
[8] Yang L., Zhang J. Automatic transfer learning for short text mining. EURASIP Journal on Wireless Communications and Networking. 2017; 2017(1):42. (In Eng.) DOI: 10.1186/s13638-017-0815-5
[9] Yan E., Williams J., Chen Z. Understanding disciplinary vocabularies using a full-text enabled domain-independent term extraction approach. PLoS ONE. 2017; 12(11):e0187762. (In Eng.) DOI: 10.1371/journal.pone.0187762
[10] Arras L., Horn F., Montavon G., Müller K.-R., Samek W. "What is relevant in a text document?": An interpretable machine learning approach. PLoS ONE. 2017; 12(8):e0181142. (In Eng.) DOI: 10.1371/journal.pone.0181142
[11] Eidlin A.A., Eidlina M.A., Samsonovich A.V. Analyzing weak semantic map of word senses. Procedia Computer Science. 2018; 123:140-148. (In Eng.) DOI:
[12] Samsonovich A.V. Weak Semantic Map of the Russian Language: Preliminary Results. Procedia Computer Science. 2016; 88:538-543. (In Eng.) DOI: 10.1016/j.procs.2016.08.001
[13] Wei T., Lu Y., Chang H., Zhou Q., Bao X. A semantic approach for text clustering using WordNet and lexical chains. Expert Systems with Applications. 2015; 42(4):2264-2275. (In Eng.) DOI: 10.1016/j.eswa.2014.10.023
[14] Zhan J., Dahal B. Using deep learning for short text understanding. Journal of Big Data. 2017; 4(1):34. (In Eng.) DOI: 10.1186/s40537-017-0095-2
[15] Khenner E., Nasraoui O. A bilingual semantic network of computing concepts. Procedia Computer Science. 2016; 80:2392-2396. (In Eng.) DOI: 10.1016/j.procs.2016.05.460
[16] Yu B. Research on information retrieval model based on ontology. EURASIP Journal on Wireless Communications and Networking. 2019; 2019(1):30. (In Eng.) DOI: 10.1186/s13638-019-1354-z
[17] Yelkina E.E., Kononova O.V., Prokudin D.E. Typology of Contexts and Contextual Approach Principles in Multidisciplinary Scientific Research. Sovremennye informacionnye tehnologii i IT-obrazovanie = Modern Information Technologies and IT-Education. 2019; 15(1):141-153. (In Russ., abstract in Eng.) DOI: 10.25559/SITITO.15.201901.141-153
[18] Komrakov A.A. Using Ontologies to Describe the Structure of Arrays of Information Exchange. Sovremennye informacionnye tehnologii i IT-obrazovanie = Modern Information Technologies and IT-Education. 2019; 15(1):182-189. (In Russ., abstract in Eng.) DOI: 10.25559/SITITO.15.201901.182-189
[19] Barakhnin V.B., Kozhemyakina O.Yu., Rychkova E.V., Pastushkov I.S., Borzilova Y.S. The extraction of lexical and metrorhythmic features which are characteristic for the genre and the style and for their combinations within the process of automated processing of texts in Russian. Sovremennye informacionnye tehnologii i IT-obrazovanie = Modern Information Technologies and IT-Education. 2018; 14(4):888-895. (In Russ., abstract in Eng.) DOI: 10.25559/SITITO.14.201804.888-895
[20] Krassovitsky A.M., Ualiyeva I.M., Meirambekkyzy Z., Mussabayev R.R. Lexicon-based approach in generalization evaluation in Russian language media. Sovremennye informacionnye tehnologii i IT-obrazovanie = Modern Information Technologies and IT-Education. 2018; 14(3):567-572. (In Russ., abstract in Eng.) DOI: 10.25559/SITITO.14.201803.567-572
[21] Kogalovsky M.R., Parinov S.I. Semantic Annotation of Information Resources by Taxonomies in Scientific Digital Library. In: CEUR Workshop Proceedings: Selected Papers of the XIX International Conference on Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2017). Moscow, Russia, October 9-13, 2017, vol. 2022. 2017, pp. 301-310. Available at: http://ceur-ws.org/Vol-2022/paper47.pdf (accessed 15.08.2019). (In Russ., abstract in Eng.)
[22] Tsukanova Z.V. Strukturnye i semanticheskie osobennosti zagolovkov sovremennyh nauchnyh statej (na materiale russkogo i anglijskogo jazykov) [Structural and semantic features of the headings of modern scientific articles (by the material of Russian and English languages)]. Modern scientific researches and innovations. 2018; (5):33. Available at:
[23] Chapaykina N.E. Semanticheskij analiz tekstov. Osnovnye polozhenija [Semantic analysis of texts. Fundamentals]. Young Scientist. 2012; (5):112-115. Available at: https://elibrary.ru/item.asp?id=20470090 (accessed 15.08.2019). (In Russ.)
[24] Batura T.V. Metody i sistemy semanticheskogo analiza tekstov [Methods and systems of semantic text analysis]. Software Journal: Theory and Applications. 2016; (4). (In Russ.) DOI: 10.15827/2311-6749.21.220
[25] Bessmertny I.A. Knowledge visualization based on semantic networks. Programming and Computer Software. 2010; 36(4):197-204. (In Eng.) DOI:
[26] Ayusheeva N.N., Dikikh A.Yu. Model of constructing a semantic network of scientific text. Modern High Technologies. 2018; (6):9-13. Available at: https://www.elibrary.ru/item.asp?id=35197327 (accessed 15.08.2019). (In Russ., abstract in Eng.)
[27] Ustalov D.A., Sozykin A.V. A Software System for Automatic Construction of a Semantic Word Network. Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering. 2017; 6(2):69-83. (In Russ., abstract in Eng.) DOI: 10.14529/cmse170205
Published
2019-12-23
How to Cite
DORENSKAYA, Elizaveta Alexandrovna; SEMENOV, Yuri Alexeyevich. The Improved Algorithm for Calculation of the Contextual Words Meaning in the Text. Modern Information Technologies and IT-Education, [S.l.], v. 15, n. 4, p. 954-960, dec. 2019. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/559>. Date accessed: 10 july 2025. doi: https://doi.org/10.25559/SITITO.15.201904.954-960.
Section
Research and development in the field of new IT and their applications