Evaluating the Quality of Data Registration Systems
Abstract
Uncertainty about the quality of data registration systems, which supply the data describing a controlled system, is one of the most significant problems in building management systems for complex objects. The problem is most acute in the management of poorly formalized (soft) systems, such as social and socioeconomic systems. Such uncertainty makes it necessary to adapt control algorithms in real time to the uncertainty characteristics of the controlled system's input parameters. That uncertainty is determined by the inherent properties of the data registration system, typically represented by government statistical bodies. The methodology presented in this work offers a formalized and computationally simple algorithm for assessing the quality of individual parameters and of the data registration system as a whole, based on a series of observations. For the chosen quality characteristic, specific measures (metrics), evaluation models, and standard (normative) values have been established, at which the system's quality is deemed to meet the standard. The proposed methodology is applied to analyze how effectively the state statistical services of the Russian Federation, the United Kingdom, Sweden, and Japan function, using a subset of statistical data registered in all of the countries considered: data characterizing population mortality for 2013-2020. The results of this analysis conclusively demonstrate varying quality levels in the state statistical systems of the countries in question. The assessment of the data registration system in Russia is close to the chosen normative (threshold) value of the standard. This evaluation ranks the Russian state statistical system as the best among those reviewed and characterizes its functioning as normal.
The quality of Japan's data registration system is close to that of Russia's, suggesting that its performance is satisfactory but slightly inferior. In contrast, the data registration systems in the United Kingdom and Sweden deliver data of significantly lower quality. The quality assessments of these countries' statistical services far exceed the normative values, indicating that their functioning is subpar.

This work is licensed under a Creative Commons Attribution 4.0 International License.