multidimensional similarity join, E-k-d-B tree, parallel algorithms
Multidimensional similarity join finds pairs of multi-dimensional points that are within some small distance of each other: The 6-k-d-B tree has been proposed as a data structure that scales better as the number of dimensions in-creases compared to previous data structures. We present a cost model of the E-k-d-B tree and use it to optimize the leaf size. We present novel parallel algorithms for the similarity join using the E-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low skew datasets. Further; its cost is proportional to the overall cost of the similarity join.
Alsabti, Khaled; Ranka, Sanjay; and Singh, Vineet, "An efficient parallel algorithm for high dimensional similarity join" (1998). Electrical Engineering and Computer Science. Paper 110.