COMPARATIVE ANALYSIS OF RELATED SEQUENCES AND THEIR INCREMENTS ON THE BASIS OF DISCRIMINANT ANALYSIS

Abstract

The article studies the relationship between the lengths of orthologous proteins of four organisms, one of which is taken as the base organism (more than 1200 proteins in total). Methods of multivariate statistical analysis are applied to pairs, triples and quadruples (rows) composed of the lengths of orthologous proteins; the number of such rows ranges from 200 to 400. Analysis of pairwise correlations, orthogonal transformation and cluster analysis made it possible to distinguish two homogeneous clusters of length quadruples. In parallel, we studied the increments of the orthologous protein lengths relative to the base organism. We showed that the rows form a non-homogeneous sample, whereas the increments form a homogeneous one. The next task was to extend the clusters with rows that have incomplete data. Cluster analysis was shown to be inapplicable to this task, so we used discriminant analysis with the clustering of the complete data as the training sample. All incomplete rows were assigned to clusters (100 percent separation), after which the dependence of the lengths in each cluster on the base lengths was described. The adequacy of the resulting regression equations was tested. The statistical analysis led to the following conclusions. For a set of orthologous length series, a generalizing factor was obtained, which we call the size of an orthologous object composed of four lengths of orthologous proteins. For the given task, such object sizes were obtained; their group means differ, and they form two separate ranges of values, one for each of the groups obtained by the other methods. For the series of increments of orthologous protein lengths in the four-length objects, analysis by all methods showed that the set is homogeneous. The lengths of orthologous proteins were shown to have significant autocorrelation, as is the case with series associated with the same base series.
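
The step of assigning incomplete rows to clusters learned from complete rows can be illustrated with a minimal sketch. It is not the author's procedure: it assumes a simplified discriminant rule (equal priors, pooled diagonal covariance) evaluated only on the observed lengths, and all numbers, as well as the names training, increments, fit_diagonal_lda and classify, are hypothetical.

```python
from statistics import mean

# Hypothetical complete rows: (base, org1, org2, org3) protein lengths plus a
# cluster label. The values are invented for illustration only.
training = [
    ((300, 310, 305, 320), 0),
    ((280, 292, 284, 301), 0),
    ((350, 359, 356, 372), 0),
    ((900, 960, 1010, 1100), 1),
    ((820, 905, 930, 1005), 1),
    ((1000, 1080, 1120, 1210), 1),
]

def increments(row):
    """Increments of each ortholog length relative to the base (first) length."""
    base = row[0]
    return tuple(x - base for x in row[1:])

def fit_diagonal_lda(samples):
    """Return per-class feature means and pooled per-feature variances."""
    classes = sorted({label for _, label in samples})
    n_features = len(samples[0][0])
    means = {
        c: [mean(r[j] for r, lab in samples if lab == c) for j in range(n_features)]
        for c in classes
    }
    pooled = []
    for j in range(n_features):
        # Within-class sum of squares, pooled over all classes.
        ss = sum((r[j] - means[lab][j]) ** 2 for r, lab in samples)
        pooled.append(ss / (len(samples) - len(classes)))
    return means, pooled

def classify(row, means, pooled):
    """Assign a row with missing values (None) using only the observed features."""
    def distance(c):
        return sum(
            (x - means[c][j]) ** 2 / pooled[j]
            for j, x in enumerate(row)
            if x is not None
        )
    return min(means, key=distance)

means, pooled = fit_diagonal_lda(training)
incomplete_rows = [(310, None, 318, None), (950, 1015, None, None)]
labels = [classify(r, means, pooled) for r in incomplete_rows]
```

A full discriminant analysis would use the complete pooled covariance matrix and class priors; the diagonal form is used here only to keep the sketch dependency-free.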

Author Biography

Svetlana Nikolaevna Istomina, Moscow Aviation Institute (National Research University)

Candidate of Sciences in Chemistry, Associate Professor

Published
2018-09-30
How to Cite
ISTOMINA, Svetlana Nikolaevna. COMPARATIVE ANALYSIS OF RELATED SEQUENCES AND THEIR INCREMENTS ON THE BASIS OF DISCRIMINANT ANALYSIS. Modern Information Technologies and IT-Education, [S.l.], v. 14, n. 3, p. 672-678, sep. 2018. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/434>. Date accessed: 31 may 2026. doi: https://doi.org/10.25559/SITITO.14.201803.672-678.