Evaluation of Statistical Data Quality in the Problem of Calculating the Integral Characteristic of a System for a Number of Observations
Abstract
Constructing a composite index of a system can be treated as a problem of separating signal from noise. The signal in this case is the set of weight coefficients of the linear convolution of indicators. These weights should reflect the structure of the system being evaluated. However, principal component analysis and factor analysis yield different structures of principal components and principal factors for different observations. A possible reason is the presence of unavoidable errors in the data used. Solving the problem requires a detailed understanding of how input data errors influence the parameters of the calculated model. The article discusses the use of the finite difference method for evaluating statistical data quality in the problem of calculating the integral characteristic of a system for a number of observations. For this technique to be applicable, the data must be well approximated by polynomials of degree lower than the number of observations minus one. This assumption is tested empirically on a specific data set: 37 variables characterizing the quality of life of the population of Russia for 2010-2017. The dependence of the approximation quality on the degree of polynomial regression is analyzed. The results of the numerical experiment support the legitimacy of evaluating data errors using the finite difference method. Applying the finite difference apparatus to the data reveals fatal errors ranging from 0.59% to 28.92%. Therefore, composite characteristics of objects derived from such data must take the presence of a fatal error into account. In particular, the number of parameters characterizing the system should be large enough for averaging to compensate for random errors.
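The idea behind the finite difference approach can be illustrated with a minimal sketch. It is not the article's own procedure, which the abstract does not spell out; it assumes equally spaced observations, independent errors with a common variance, and a trend that is a polynomial of degree lower than the chosen difference order. Under those assumptions the differences of that order cancel the trend and their mean square equals the error variance times the binomial coefficient C(2k, k). The function names, the synthetic series, and the relative-error definition below are hypothetical and chosen only for illustration.

```python
from math import comb

import numpy as np


def polynomial_fit_r2(y, degree):
    """R^2 of a least-squares polynomial fit of the given degree.

    Observations are assumed equally spaced; time is rescaled to [-1, 1]
    to keep the fit well conditioned.
    """
    y = np.asarray(y, dtype=float)
    t = np.linspace(-1.0, 1.0, y.size)
    coeffs = np.polyfit(t, y, degree)
    residuals = y - np.polyval(coeffs, t)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    if ss_tot == 0.0:
        return 1.0  # a constant series is fitted exactly
    return 1.0 - ss_res / ss_tot


def finite_difference_noise(y, order):
    """Estimate the error standard deviation from order-th differences.

    Assumption: y = polynomial trend of degree < order, plus independent
    errors with equal variance. The order-th difference removes the trend,
    and its variance is comb(2 * order, order) times the error variance.
    """
    y = np.asarray(y, dtype=float)
    if not 1 <= order < y.size:
        raise ValueError("order must satisfy 1 <= order < number of observations")
    d = np.diff(y, n=order)
    return float(np.sqrt(np.mean(d ** 2) / comb(2 * order, order)))


def relative_error_percent(y, order):
    """Estimated error as a percentage of the mean absolute level (illustrative)."""
    y = np.asarray(y, dtype=float)
    return 100.0 * finite_difference_noise(y, order) / float(np.mean(np.abs(y)))


if __name__ == "__main__":
    # Synthetic series of 8 annual observations (the data set in the article
    # also spans 8 years, 2010-2017); all values here are made up.
    t = np.arange(8, dtype=float)
    rng = np.random.default_rng(0)
    series = 50.0 + 3.0 * t - 0.4 * t ** 2 + rng.normal(0.0, 0.5, t.size)

    for degree in range(1, 5):
        print(degree, round(polynomial_fit_r2(series, degree), 4))
    print("relative error, %:", round(relative_error_percent(series, order=3), 2))
```

With only eight observations such an estimate rests on very few differences and is itself noisy, which is consistent with the abstract's point that a large number of indicators is needed for averaging to compensate for random errors.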

This work is licensed under a Creative Commons Attribution 4.0 International License.