Efficient Semantic Similarity Search over Spatio-textual Data


George S. Theodoropoulos, Kjetil Nørvåg and Christos Doulkeridis


In this paper, we address the problem of semantic similarity search over spatio-textual data. In contrast to most existing work on spatial-keyword search, which relies on exact matching between query keywords and textual descriptions, we focus on semantic textual similarity using word embeddings, which have been shown to capture semantic similarity exceptionally well in practice. To support efficient 𝑘-nearest neighbour (𝑘-NN) search over a weighted combination of spatial and semantic dimensions, we propose a novel indexing approach, called CSSI, that guarantees exact results, alongside an approximate variant, called CSSIA, that trades a small amount of error for improved performance. Both variants are based on a hybrid clustering scheme that jointly indexes the spatial and textual/semantic information, achieving high pruning percentages and improved performance and scalability.
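To make the weighted spatial-semantic 𝑘-NN query concrete, the following is a minimal sketch under stated assumptions. It is not CSSI. The clustering (random centers with nearest-center assignment), the use of Euclidean distance between unit-normalized embeddings as the semantic distance, the unit-square coordinates, and all names (`Obj`, `combined_dist`, `build_clusters`, `knn_brute`, `knn_pruned`) are hypothetical. The sketch only illustrates why an exact clustered index can prune. If each cluster stores a spatial radius and a semantic radius, the triangle inequality gives a lower bound on the combined distance of every member for any query weight `alpha`, so whole clusters can be skipped without changing the result.

```python
import heapq
import math
import random
from dataclasses import dataclass


@dataclass
class Obj:
    oid: int
    loc: tuple[float, float]  # assumption: (x, y) normalized to the unit square
    emb: tuple[float, ...]    # assumption: unit-length embedding vector


def _euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_dist(q_loc, q_emb, o, alpha):
    # Hypothetical scoring: weighted sum of spatial and semantic distance.
    # Euclidean distance between unit vectors is monotone in cosine distance
    # and is a metric, so the triangle inequality holds for both parts.
    return alpha * _euclid(q_loc, o.loc) + (1 - alpha) * _euclid(q_emb, o.emb)


def knn_brute(objs, q_loc, q_emb, k, alpha):
    # Exact baseline: score every object and keep the k smallest.
    best = heapq.nsmallest(k, objs, key=lambda o: combined_dist(q_loc, q_emb, o, alpha))
    return [o.oid for o in best]


def build_clusters(objs, n_clusters, seed=0):
    # Hypothetical hybrid clustering: random centers, each object assigned to
    # its closest center under an equal-weight (alpha = 0.5) combined distance.
    rng = random.Random(seed)
    centers = rng.sample(objs, n_clusters)
    groups = [[] for _ in centers]
    for o in objs:
        j = min(range(len(centers)),
                key=lambda i: combined_dist(centers[i].loc, centers[i].emb, o, 0.5))
        groups[j].append(o)
    clusters = []
    for c, members in zip(centers, groups):
        if members:
            rs = max(_euclid(c.loc, m.loc) for m in members)  # spatial radius
            rt = max(_euclid(c.emb, m.emb) for m in members)  # semantic radius
            clusters.append((c, members, rs, rt))
    return clusters


def knn_pruned(clusters, q_loc, q_emb, k, alpha):
    # Per-cluster lower bound: d(q, p) >= d(q, c) - r for each metric part.
    bounds = []
    for c, members, rs, rt in clusters:
        lb = (alpha * max(0.0, _euclid(q_loc, c.loc) - rs)
              + (1 - alpha) * max(0.0, _euclid(q_emb, c.emb) - rt))
        bounds.append((lb, members))
    bounds.sort(key=lambda t: t[0])
    heap = []  # max-heap of the current k best, stored as (-distance, oid)
    for lb, members in bounds:
        if len(heap) == k and lb >= -heap[0][0]:
            break  # no remaining cluster can contain a closer object
        for o in members:
            d = combined_dist(q_loc, q_emb, o, alpha)
            if len(heap) < k:
                heapq.heappush(heap, (-d, o.oid))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, o.oid))
    return [oid for _, oid in sorted(heap, key=lambda t: -t[0])]


def random_unit(dim, rng):
    # Synthetic stand-in for a word embedding, normalized to unit length.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


if __name__ == "__main__":
    rng = random.Random(42)
    objs = [Obj(i, (rng.random(), rng.random()), random_unit(8, rng)) for i in range(500)]
    clusters = build_clusters(objs, 20)
    q_loc, q_emb = (0.3, 0.7), random_unit(8, rng)
    print(knn_pruned(clusters, q_loc, q_emb, k=5, alpha=0.6))
```

Storing the two radii separately, rather than a single radius in the combined space, is a design choice that keeps the bound valid for any query-time `alpha`, at the cost of a somewhat looser bound than one computed for a fixed weight. An approximate variant could, for instance, stop after visiting a fixed number of clusters, trading exactness for speed; how CSSIA actually does this is not described here.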

Scientific Publications
Open Proceedings