On Measures and Metrics of Information Retrieval Relevance in Systems on Inorganic Substances Properties

Abstract

Information systems play an important role in modern education, providing an information basis for many disciplines. One of the main tasks in integrating information systems into the educational process is to provide relevant search over information consolidated from heterogeneous sources. In inorganic chemistry and materials science, set-theoretic methods for retrieving relevant information are known, and they yield reasonably high-quality responses to user queries. However, the problem of quantifying the relevance of information retrieval in this subject area remains open. In this paper we propose a method based on weighted graphs for quantifying the relevance of information retrieval in integrated systems on the properties of inorganic substances and materials. The vertices of the graph are heterogeneous chemical objects (systems, substances and crystal modifications), on which a metric is defined that estimates the similarity of chemical objects. In this metric space, the cost of a path between two vertices of the graph serves as an estimate of the similarity (relevance) of the corresponding chemical objects. This is important for enabling the search for related chemical entities and their properties in an integrated information system that consolidates Russian and foreign resources on the properties of inorganic substances (www.imet-db.ru). Thus, a relevance metric (introduced as a value inversely proportional to the cost of the graph path) makes it possible to rank optimally, from the materials scientist's point of view, the information displayed in response to a user's request at a single access point to the consolidated information resources on the properties of inorganic substances. In addition to the metric on the graph, a measure is defined that helps obtain a complete informational description of a chemical object. The measure is used to search for all properties of the object available in the integrated resources, which is necessary when compiling a complete analytical description of a chemical object.
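
The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how edge weights are assigned or how path cost is computed, so the sketch assumes the cost of a path is the sum of its edge weights and that the cheapest path is the one of interest (found here with Dijkstra's algorithm). The vertex names, the edge weights and the functions `path_costs` and `rank_by_relevance` are all hypothetical.

```python
import heapq


def path_costs(graph, source):
    """Cheapest path cost from source to every reachable vertex (Dijkstra)."""
    costs = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, vertex = heapq.heappop(queue)
        if cost > costs.get(vertex, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, weight in graph.get(vertex, {}).items():
            new_cost = cost + weight
            if new_cost < costs.get(neighbour, float("inf")):
                costs[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return costs


def rank_by_relevance(graph, source):
    """Rank objects by relevance = 1 / path cost (assumed form), highest first."""
    costs = path_costs(graph, source)
    ranked = [
        (vertex, 1.0 / cost)
        for vertex, cost in costs.items()
        if vertex != source and cost > 0  # skip the query object itself
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical graph: a chemical system, a substance and a crystal
# modification, with made-up edge weights (lower weight = more similar).
GRAPH = {
    "system:Ga-As": {"substance:GaAs": 1.0},
    "substance:GaAs": {"system:Ga-As": 1.0, "modification:GaAs-cubic": 0.5},
    "modification:GaAs-cubic": {"substance:GaAs": 0.5},
}

ranking = rank_by_relevance(GRAPH, "system:Ga-As")
for vertex, relevance in ranking:
    print(f"{vertex}: {relevance:.3f}")
```

With these assumed weights the substance is ranked above the crystal modification for a query on the system, because its path from the query vertex is cheaper.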

Author Biographies

Victor Anatolevich Dudarev, National Research University Higher School of Economics; National University of Science and Technology "MISIS"

Associate Professor of the School of Software Engineering, Faculty of Computer Science; Associate Professor of the Department of Automated Control Systems, College of IT & Automated Control Systems, Ph.D. (Engineering), Associate Professor

Igor Olegovich Temkin, National University of Science and Technology “MISIS”

Head of the Department of Automated Control Systems, College of IT & Automated Control Systems, Dr.Sci. (Engineering), Professor

Published
2020-05-25
How to Cite
DUDAREV, Victor Anatolevich; TEMKIN, Igor Olegovich. On Measures and Metrics of Information Retrieval Relevance in Systems on Inorganic Substances Properties. Modern Information Technologies and IT-Education, [S.l.], v. 16, n. 1, p. 13-22, may 2020. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/610>. Date accessed: 16 sep. 2025. doi: https://doi.org/10.25559/SITITO.16.202001.13-22.
Section
Theoretical Questions of Computer Science, Computer Mathematics