Sensitivity of Self-Tuning Histograms: Query Order Affecting Accuracy and Robustness

  • Autor:

    Andranik Khachatryan, Emmanuel Müller, Christian Stier, and Klemens Böhm


  • Quelle: Proceedings of the 24th International Conference on Scientific and Statistical Database Management (SSDBM 2012), Chania, Crete, Greece
  • In scientific databases, the amount and the complexity of data calls for data summarization techniques. Such summaries are used to assist fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems.

    So-called self-tuning histograms look at query-execution results to refine themselves. An assumption with such histograms is that they can learn the dataset from scratch. We show that this is not the case and highlight a major challenge that stems from this. Traditional self-tuning is overly sensitive to the order of queries, and reaches only local optima with high estimation errors. We show that a self-tuning method can be improved significantly if it starts with a carefully chosen initial configuration. We propose initialization by subspace clusters in projections of the data. This improves both accuracy and robustness of self-tuning histograms.