On Measures and Metrics of Information Retrieval Relevance in Systems on Inorganic Substances Properties
Abstract
Information systems play an important role in modern education, providing an information basis for many disciplines. One of the main tasks in integrating information systems into the educational process is to provide relevant search over information consolidated from heterogeneous sources. In inorganic chemistry and materials science, set-theoretic methods of relevant information retrieval are known that produce sufficiently high-quality responses to user queries. However, the problem of quantifying the relevance of information retrieval in this subject area remains open. In this paper we propose a method based on weighted graphs for quantifying the relevance of information retrieval in integrated systems on the properties of inorganic substances and materials. The vertices of the graph are heterogeneous chemical objects (systems, substances and crystal modifications), on which a metric that estimates the similarity of chemical objects is defined. In this metric space, determining the cost of a path between graph vertices makes it possible to evaluate the similarity (relevance) of chemical objects. This is important for searching for related chemical entities and their properties in an integrated information system that consolidates Russian and foreign resources on the properties of inorganic substances (www.imet-db.ru). Thus, a relevance metric, introduced as a value inversely proportional to the cost of the graph path, makes it possible to rank, optimally from the materials scientist's point of view, the information displayed in response to a user's request at a single point of access to consolidated information resources on the properties of inorganic substances. In addition to the metric on the graph, a measure is defined that is useful for obtaining a complete informational description of a chemical object.
The measure is used to search for all properties of the object available in the integrated resources, which is necessary when compiling a complete analytical description of a chemical object.
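The abstract does not give the edge costs or the proportionality constant, so the following is only a minimal Python sketch under stated assumptions: chemical objects are string-labelled vertices of an undirected graph, edges carry hypothetical positive costs, the path cost is the cheapest-path cost found with Dijkstra's algorithm (on a positively weighted undirected graph this shortest-path distance is a metric), and relevance is taken as k / cost with an assumed k = 1. The function names `add_edge`, `path_costs`, `rank_by_relevance`, `build_example_graph`, and the example objects and weights are all illustrative, not taken from the paper or from www.imet-db.ru.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical undirected weighted graph: vertex -> list of (neighbour, edge cost).
Graph = Dict[str, List[Tuple[str, float]]]


def add_edge(graph: Graph, u: str, v: str, cost: float) -> None:
    """Add an undirected edge; costs must be positive for path cost to be a metric."""
    if cost <= 0:
        raise ValueError("edge costs must be positive")
    graph.setdefault(u, []).append((v, cost))
    graph.setdefault(v, []).append((u, cost))


def path_costs(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: cheapest path cost from source to each reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rank_by_relevance(graph: Graph, query: str, k: float = 1.0) -> List[Tuple[str, float]]:
    """Rank other reachable vertices by relevance = k / path cost (higher = more similar)."""
    costs = path_costs(graph, query)
    scored = [(v, k / c) for v, c in costs.items() if v != query]
    # Descending relevance; ties broken alphabetically (an arbitrary assumption).
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


def build_example_graph() -> Graph:
    """Toy graph of systems, substances and crystal modifications (hypothetical costs)."""
    g: Graph = {}
    add_edge(g, "system Ga-As", "substance GaAs", 1.0)
    add_edge(g, "substance GaAs", "modification GaAs (sphalerite)", 0.5)
    add_edge(g, "substance GaAs", "modification GaAs (wurtzite)", 0.5)
    add_edge(g, "system Ga-In-As", "system Ga-As", 2.0)
    add_edge(g, "system Ga-In-As", "system In-As", 2.0)
    add_edge(g, "system In-As", "substance InAs", 1.0)
    return g


if __name__ == "__main__":
    for vertex, relevance in rank_by_relevance(build_example_graph(), "substance GaAs"):
        print(f"{relevance:.3f}  {vertex}")
    # 2.000  modification GaAs (sphalerite)
    # 2.000  modification GaAs (wurtzite)
    # 1.000  system Ga-As
    # 0.333  system Ga-In-As
    # 0.200  system In-As
    # 0.167  substance InAs
```

In this sketch, objects unreachable from the query receive no score and are simply not displayed; whether the original system treats them this way is not stated in the abstract.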

This work is licensed under a Creative Commons Attribution 4.0 International License.