Comparison of Language Models in Skills Extraction from Vacancies and Resumes
Abstract
The ability of large language models (LLMs) to “understand” large volumes of text makes it possible to select candidates for company vacancies with consistent quality. The purpose of this work is to examine the capabilities of LLMs and of a numerical method based on vector representations in the task of extracting skills from the texts of vacancies and resumes. Particular attention is paid to two approaches to skill ranking: ranking with an LLM, and a numerical method that uses the cosine distance between vector representations of skills. The study consisted of three experiments. The first extracted skill phrases from the work experience described in a resume; the second mapped skills from the resume text onto a reference set of skills taken from the job requirements; the third evaluated how well two sets of skills can be matched against each other. The outcome of the study is the selection of the best model for extracting skills from resume text and for comparing two sets of skills with each other. The experiments showed that language models are superior to numerical methods in accuracy and flexibility when identifying skills in text. Using LLMs to rank skills, however, gave poor performance and accuracy when matching skills between vacancies and resumes, whereas the numerical method based on vector representations gave better ranking quality and remained stable as the number of examples increased. These results have practical implications for the development of more accurate and efficient personnel selection systems. Introducing language models into human resource management processes can improve the quality and speed of processing large volumes of data, leading to more accurate and faster selection of qualified specialists.
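As an illustration of the numerical ranking approach mentioned in the abstract, the following is a minimal sketch of ordering resume skills by cosine distance to a required skill. It is not the authors' implementation: the function names `cosine_distance` and `rank_skills`, the skill labels, and the three-dimensional toy vectors are all assumptions made for the example, and a real system would obtain the vectors from an embedding model.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def rank_skills(required_vec, resume_skills):
    """Sort resume skills by ascending cosine distance to one required skill."""
    scored = [
        (name, cosine_distance(required_vec, vec))
        for name, vec in resume_skills.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Toy vectors invented for illustration; they stand in for real embeddings.
resume_skills = {
    "python scripting": [0.9, 0.1, 0.0],
    "team leadership": [0.0, 0.2, 0.9],
    "data analysis": [0.7, 0.6, 0.1],
}
required = [1.0, 0.2, 0.0]  # hypothetical embedding of a required skill

ranking = rank_skills(required, resume_skills)
for name, distance in ranking:
    print(f"{name}: {distance:.3f}")
```

A smaller distance means the resume skill lies closer to the required skill in the vector space, so the first entry of `ranking` is the best match under this measure.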
This work is licensed under a Creative Commons Attribution 4.0 International License.