Migrating Gis Big Data Computing From Hadoop To Spark: An Exemplary Study Using Twitter
Keywords
Centrographic analysis; GIS; Hadoop; KNN; Social media; Spark
Abstract
Recent research has demonstrated that social media could provide valuable spatio-temporal data about users activities. However, information extraction and computation from big amount of data pose various challenges. To effectively process massive datasets, several platforms have been developed. Our previous study [20] explored Hadoop-based cloud computing for processing big amount of social media data [9] to study geographic distributions of social media users. In this paper, we investigate an emerging system named Spark and present a timely pilot experience on geospatial big data research. In our study, Spark has been utilized to perform some classic geospatial analyses like K-Nearest Neighbors (KNN), geographic mean and median points, and the distribution of the median points. Our design is tested on an Amazon EC2 cluster. An exemplary study using 60GB, 120GB and 180GB Twitter data has demonstrated the performance achievements by migrating computing tasks from Hadoop to Spark. In our experiments, the Spark-based solution can be up to 2.3x faster than the Hadoop-based solution due to its in-memory processing and coarse-grained resource allocation strategy. In the paper, we also discuss optimization strategies on using Spark for different geospatial computing tasks.
Publication Date
1-17-2017
Publication Title
IEEE International Conference on Cloud Computing, CLOUD
Number of Pages
351-358
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/CLOUD.2016.52
Copyright Status
Unknown
Socpus ID
85014263310 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85014263310
STARS Citation
Sun, Zhibo; Zhang, Hong; Liu, Zixia; Xu, Chen; and Wang, Liqiang, "Migrating Gis Big Data Computing From Hadoop To Spark: An Exemplary Study Using Twitter" (2017). Scopus Export 2015-2019. 7165.
https://stars.library.ucf.edu/scopus2015/7165