Abstract
Big spatio-textual data are prevalent in modern applications, where spatial objects are associated with textual descriptions. Spatial-keyword queries have been proposed for querying such data, but they are challenging to process, mainly because they combine the spatial and textual dimensions. Scalability is a further key challenge, due to the immense volume of the underlying data. In this paper, we address the problem of parallel processing of spatial-keyword range queries, which retrieve all spatio-textual objects that lie within a user-specified distance from a query location and whose textual description is sufficiently similar to the query keywords. Our approach relies on a mapping scheme that maps spatio-textual objects to a 2D space, thus creating compact data partitions. We then exploit these partitions to distribute the mapped data effectively to worker nodes and parallelize processing. We implement our approach in Apache Spark and show that it outperforms two baseline solutions as well as two state-of-the-art systems for processing big spatial data.