BENCHMARKING BIG SPATIAL DATA PROCESSING FRAMEWORKS

Abstract

Processing large volumes of spatial data in distributed systems now plays a crucial role in many areas of life. Big data is often unstructured, and specialized algorithms are required to process it. Spatial analysis is one of the methods for analyzing big data, and in this case the data often originates from a geographic information system (GIS).
This article presents a benchmark for evaluating frameworks that work with such data, together with the results of evaluating three frameworks against it: GeoSpark, STARK, and SpatialSpark. The benchmark has two parts: a macrobenchmark and a microbenchmark.
The paper also tests topological predicates on various topological data. The comparison uses the DE-9IM model, which determines the type of topological relationship between geometries, such as intersection or equality. The main difficulty in comparing the frameworks was that not all of them support every operation of the model. This shaped the scenarios for the microbenchmark and macrobenchmark, since not every DE-9IM relation could be compared across all frameworks.
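
As a rough illustration of how DE-9IM distinguishes relationships, the sketch below matches a 9-character intersection matrix against the predicate patterns defined in the OGC Simple Features standard. It is not taken from the paper or from any of the three frameworks; the function names `matches` and `holds` and the sample matrices are assumptions made for this example.

```python
# Illustrative sketch only: names and sample matrices are hypothetical,
# not part of GeoSpark, STARK, or SpatialSpark.
#
# A DE-9IM matrix is a 9-character string over {F, 0, 1, 2}, read row by row:
# interior/boundary/exterior of geometry A against those of geometry B.


def matches(matrix: str, pattern: str) -> bool:
    """Return True if a DE-9IM matrix string satisfies a pattern.

    Pattern symbols: 'T' = non-empty intersection (dimension 0, 1 or 2),
    'F' = empty intersection, '*' = don't care, '0'/'1'/'2' = exact dimension.
    """
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM strings must have exactly 9 characters")
    for m, p in zip(matrix, pattern):
        if p == "*":
            continue
        if p == "T":
            if m not in "012":
                return False
        elif m != p:
            return False
    return True


# Patterns for a few named predicates, as given in OGC Simple Features.
PATTERNS = {
    "equals": ["T*F**FFF*"],
    "disjoint": ["FF*FF****"],
    "within": ["T*F**F***"],
    "contains": ["T*****FF*"],
    "touches": ["FT*******", "F**T*****", "F***T****"],
}


def holds(predicate: str, matrix: str) -> bool:
    """Check a named predicate; 'intersects' is defined as 'not disjoint'."""
    if predicate == "intersects":
        return not holds("disjoint", matrix)
    return any(matches(matrix, p) for p in PATTERNS[predicate])


if __name__ == "__main__":
    overlapping_polygons = "212101212"  # two partially overlapping polygons
    print(holds("intersects", overlapping_polygons))
    print(holds("touches", overlapping_polygons))
```

A framework that exposes only some of these predicates can be compared only on the corresponding patterns, which is the limitation the abstract describes.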

Author Biographies

Анастасия Алексеевна Гараева, Kazan National Research Technical University named after A.N. Tupolev - KAI

Postgraduate student, Department of Applied Mathematics and Computer Science
Research supervisor: Svetlana V. Novikova, Doctor of Technical Sciences, Professor, Department of Applied Mathematics and Computer Science, Kazan National Research Technical University named after A.N. Tupolev - KAI

Айрат Дмитриевич Кабиров, Kazan National Research Technical University named after A.N. Tupolev - KAI

Postgraduate student, Department of Security Systems
Research supervisor: Igor V. Anikin, PhD (Technical Sciences), Head of the Department of Security Systems, Kazan National Research Technical University named after A.N. Tupolev - KAI

Ольга Викторовна Тихонова, Kazan National Research Technical University named after A.N. Tupolev - KAI

Postgraduate student, Department of Applied Mathematics and Computer Science
Research supervisor: Lilia Yu. Emaletdinova, Doctor of Technical Sciences, Professor, Department of Applied Mathematics and Computer Science, Kazan National Research Technical University named after A.N. Tupolev - KAI

Published
2018-03-30
How to Cite
ГАРАЕВА, Анастасия Алексеевна; КАБИРОВ, Айрат Дмитриевич; ТИХОНОВА, Ольга Викторовна. BENCHMARKING BIG SPATIAL DATA PROCESSING FRAMEWORKS. Modern Information Technologies and IT-Education, [S.l.], v. 14, n. 1, p. 126-137, mar. 2018. ISSN 2411-1473. Available at: <http://sitito.cs.msu.ru/index.php/SITITO/article/view/365>. Date accessed: 15 sep. 2025. doi: https://doi.org/10.25559/SITITO.14.201801.126-137.