Representing Knowledge as a Graph: Core Technologies and Approaches

Abstract

Over the past decades, the amount of information accumulated by humanity has grown enormously. Such volumes cannot be analyzed effectively with traditional algorithms and data structures, because these do not make use of the semantic relationships between pieces of data.
There is therefore a need for a way of representing information that, on the one hand, can store a huge number of objects and the relationships between them and, on the other, provides fast access to the stored data while preserving its semantics. One of the most effective data structures for this class of problems is the knowledge graph, a relatively recent structure that has become an active research topic in the last few years. Interest in knowledge graphs peaked after Google presented its implementation in 2012 and began using it in its search engine, which significantly improved search quality. However, it is still unclear how to apply this technology in practice, because little information on the subject is available.
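To make the idea concrete, here is a minimal sketch, assuming facts are stored in memory as (subject, predicate, object) triples. The class `TripleStore`, its methods and the sample facts are hypothetical illustrations, not the authors' implementation; production systems rely on dedicated graph databases. Hash indexes on subject and object give average constant-time access to all edges around an entity (the "fast access" requirement above), while the predicate labels keep the meaning of each link.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical in-memory knowledge graph storing (subject, predicate, object) facts."""

    def __init__(self):
        self.triples = set()
        # Hash indexes: average O(1) access to the edges around an entity.
        self.by_subject = defaultdict(set)  # subject -> {(predicate, object)}
        self.by_object = defaultdict(set)   # object -> {(subject, predicate)}

    def add(self, subject, predicate, obj):
        """Store one fact; duplicates are ignored."""
        triple = (subject, predicate, obj)
        if triple in self.triples:
            return
        self.triples.add(triple)
        self.by_subject[subject].add((predicate, obj))
        self.by_object[obj].add((subject, predicate))

    def objects(self, subject, predicate):
        """All o such that (subject, predicate, o) is a stored fact."""
        return {o for p, o in self.by_subject.get(subject, ()) if p == predicate}

    def neighbors(self, entity):
        """Entities linked to `entity` by an edge in either direction."""
        outgoing = {o for _, o in self.by_subject.get(entity, ())}
        incoming = {s for s, _ in self.by_object.get(entity, ())}
        return outgoing | incoming


# Usage with a few illustrative facts.
kg = TripleStore()
kg.add("Google", "developed", "Knowledge Graph")
kg.add("Knowledge Graph", "introduced_in", "2012")
kg.add("Knowledge Graph", "used_by", "Google Search")
kg.add("Google", "developed", "Knowledge Graph")  # duplicate, ignored
print(kg.objects("Google", "developed"))           # {'Knowledge Graph'}
print(sorted(kg.neighbors("Knowledge Graph")))     # ['2012', 'Google', 'Google Search']
```

Using `.get(..., ())` in the lookups means that querying an unknown entity returns an empty set without inserting empty entries into the indexes.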
In this article we review all stages of building a knowledge graph, as well as the problems one may encounter when creating one's own instance of this abstraction. We also consider methods for constructing vector representations of information so that it can be stored efficiently in the graph, together with practical steps for using the graph.
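Vector representations (embeddings) of a knowledge graph can be learned in many ways. As one illustration, the sketch below follows the idea of TransE, a widely used translation-based model, in which a fact (h, r, t) is considered plausible when the vector of h shifted by the vector of r lies close to the vector of t. This is an assumed example chosen for brevity, not necessarily a method used in the article; the class `TransE`, its hyperparameters (`dim`, `lr`, `margin`, `epochs`) and the toy facts are hypothetical, and plain Python lists are used instead of a numerical library.

```python
import math
import random
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


class TransE:
    """Hypothetical TransE-style sketch: (h, r, t) is plausible if e_h + e_r is close to e_t."""

    def __init__(self, entities, relations, dim=16, seed=0):
        self.rng = random.Random(seed)

        def rand_vec():
            return normalize([self.rng.uniform(-1, 1) for _ in range(dim)])

        self.ent = {e: rand_vec() for e in entities}
        self.rel = {r: rand_vec() for r in relations}

    def residual(self, h, r, t):
        """The vector e_h + e_r - e_t."""
        return [a + b - c for a, b, c in zip(self.ent[h], self.rel[r], self.ent[t])]

    def distance(self, h, r, t):
        """Squared L2 norm of the residual; lower means more plausible."""
        return sum(x * x for x in self.residual(h, r, t))

    def train(self, triples, epochs=200, lr=0.05, margin=1.0):
        """SGD on the margin ranking loss max(0, margin + d(pos) - d(neg))."""
        entities = list(self.ent)
        for _ in range(epochs):
            for h, r, t in triples:
                # Negative sample: replace the head or the tail with a random entity.
                if self.rng.random() < 0.5:
                    h2, t2 = self.rng.choice(entities), t
                else:
                    h2, t2 = h, self.rng.choice(entities)
                if margin + self.distance(h, r, t) - self.distance(h2, r, t2) <= 0:
                    continue  # already separated by the margin: zero loss
                pos, neg = self.residual(h, r, t), self.residual(h2, r, t2)
                # Accumulate gradients first, so an entity appearing twice
                # (e.g. h == h2) receives both contributions.
                g_ent = defaultdict(lambda: [0.0] * len(pos))
                for i, (p, n) in enumerate(zip(pos, neg)):
                    g_ent[h][i] += 2 * p
                    g_ent[t][i] -= 2 * p
                    g_ent[h2][i] -= 2 * n
                    g_ent[t2][i] += 2 * n
                g_rel = [2 * (p - n) for p, n in zip(pos, neg)]
                for e, g in g_ent.items():
                    # Keep entity vectors on the unit sphere after each step.
                    self.ent[e] = normalize([x - lr * gi for x, gi in zip(self.ent[e], g)])
                self.rel[r] = [x - lr * gi for x, gi in zip(self.rel[r], g_rel)]


triples = [("Paris", "capital_of", "France"),
           ("Berlin", "capital_of", "Germany"),
           ("Rome", "capital_of", "Italy")]
entities = sorted({x for h, _, t in triples for x in (h, t)})
model = TransE(entities, ["capital_of"])
model.train(triples)
# Rank candidate tails for the query (Paris, capital_of, ?): most plausible first.
ranking = sorted(entities, key=lambda e: model.distance("Paris", "capital_of", e))
print(ranking[:3])
```

Once trained, the distance can rank candidate entities for an incomplete fact, which is how such embeddings are typically used for knowledge graph completion. Renormalizing entity vectors after each update (a simplification of the original training schedule) keeps the model from lowering the loss merely by inflating their norms.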

About the authors

Vladislav Sergeevich Gurin, Saint Petersburg State University

postgraduate student, Faculty of Mathematics and Mechanics; researcher

Eugene Victorovich Kostrov, Saint Petersburg State University

researcher

Yuliya Yuryevna Gavrilenko, Lomonosov Moscow State University

master's student, Faculty of Space Research

Daniel Firasovich Saada, Lomonosov Moscow State University

master's student, Faculty of Computational Mathematics and Cybernetics

Eugene Albinovich Ilyushin, Lomonosov Moscow State University

postgraduate student; lead programmer, Open Information Technologies Laboratory, Faculty of Computational Mathematics and Cybernetics

Ivan Vladimirovich Chizhov, Lomonosov Moscow State University

Associate Professor, Department of Information Security, Faculty of Computational Mathematics and Cybernetics

Published
2019-12-23
How to cite
GURIN, Vladislav Sergeevich et al. Representing knowledge as a graph: core technologies and approaches. International scientific journal «Modern Information Technologies and IT-Education», [S.l.], v. 15, n. 4, p. 912-922, dec. 2019. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/561>. Accessed: 05 June 2020. doi: https://doi.org/10.25559/SITITO.15.201904.912-922.
Section
Research and development in the field of new IT and their applications