Document Type

Article

Date

1998

Keywords

multidimensional similarity join, E-k-d-B tree, parallel algorithms

Language

English

Disciplines

Computer Sciences

Description/Abstract

Multidimensional similarity join finds pairs of multi-dimensional points that are within some small distance of each other: The 6-k-d-B tree has been proposed as a data structure that scales better as the number of dimensions in-creases compared to previous data structures. We present a cost model of the E-k-d-B tree and use it to optimize the leaf size. We present novel parallel algorithms for the similarity join using the E-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low skew datasets. Further; its cost is proportional to the overall cost of the similarity join.

Share

COinS