Modern databases have to cope with multi-dimensional queries. For efficient processing of these queries, query optimization relies on multi-dimensional selectivity estimation techniques. These techniques in turn typically rely on histograms. A core challenge of histogram construction is the detection of regions with a density higher than that of their surroundings. In this paper, we show that subspace clustering algorithms, which detect such regions, can be used to build high-quality histograms in multi-dimensional spaces. The clusters are transformed into a memory-efficient histogram representation that preserves most of the information needed for selectivity estimation. We derive a formal criterion for our transformation of clusters into buckets that minimizes the introduced estimation error. In practice, finding optimal buckets is hard, so we propose a heuristic. Our experiments show that our approach is efficient in terms of both runtime and memory usage. Overall, we demonstrate that subspace clustering enables multi-dimensional selectivity estimation with low estimation errors.
Efficient Selectivity Estimation by Histogram Construction based on Subspace Clustering
|Authors:||Andranik Khachatryan, Emmanuel Müller, and Klemens Böhm|
|Source:||Proceedings of the 23rd International Conference on Scientific and Statistical Database Management (SSDBM 2011), Portland, Oregon, USA|