Migrating Gis Big Data Computing From Hadoop To Spark: An Exemplary Study Using Twitter

Keywords

Centrographic analysis; GIS; Hadoop; KNN; Social media; Spark

Abstract

Recent research has demonstrated that social media could provide valuable spatio-temporal data about users activities. However, information extraction and computation from big amount of data pose various challenges. To effectively process massive datasets, several platforms have been developed. Our previous study [20] explored Hadoop-based cloud computing for processing big amount of social media data [9] to study geographic distributions of social media users. In this paper, we investigate an emerging system named Spark and present a timely pilot experience on geospatial big data research. In our study, Spark has been utilized to perform some classic geospatial analyses like K-Nearest Neighbors (KNN), geographic mean and median points, and the distribution of the median points. Our design is tested on an Amazon EC2 cluster. An exemplary study using 60GB, 120GB and 180GB Twitter data has demonstrated the performance achievements by migrating computing tasks from Hadoop to Spark. In our experiments, the Spark-based solution can be up to 2.3x faster than the Hadoop-based solution due to its in-memory processing and coarse-grained resource allocation strategy. In the paper, we also discuss optimization strategies on using Spark for different geospatial computing tasks.

Publication Date

1-17-2017

Publication Title

IEEE International Conference on Cloud Computing, CLOUD

Number of Pages

351-358

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/CLOUD.2016.52

Socpus ID

85014263310 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85014263310

This document is currently not available here.

Share

COinS