In scientific databases, the amount and the complexity of data calls for data summarization techniques. Such summaries are used to assist fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems.
So-called self-tuning histograms look at query-execution results to refine themselves. An assumption with such histograms is that they can learn the dataset from scratch. We show that this is not the case and highlight a major challenge that stems from this. Traditional self-tuning is overly sensitive to the order of queries, and reaches only local optima with high estimation errors. We show that a self-tuning method can be improved significantly if it starts with a carefully chosen initial configuration. We propose initialization by subspace clusters in projections of the data. This improves both accuracy and robustness of self-tuning histograms.