algorithm, quantile estimates, disk-resident data
The '-quantile of an ordered sequence of data values is the element with rank ' \Theta n, where n is the total number of values. Accurate estimates of quantiles are required for the solution of many practical applications. In this paper, we present a new algorithm for estimating the quantile values for disk-resident data. Our algorithm has the following characteristics: (1) It requires only one pass over the data; (2) It is deterministic; (3) It produces good lower and upper bounds of the true values of the quantiles; (4) It requires no a priori knowledge of the distribution of the data set; (5) It has a scalable parallel formulation; (6) Extra time and memory for computing additional quantiles (beyond the first one) is constant per quantile. We present experimental results on the IBM SP-2. The experimental results show that the algorithm is indeed robust and does not depend on the distribution of the data sets.
Alsabti, Khaled; Ranka, Sanjay; and Singh, Vineet, "A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data" (1997). College of Engineering and Computer Science - Former Departments, Centers, Institutes and Projects. Paper 4.