Statistics Management in Databases (Seminar 'Informationssysteme' as listed in the module handbook)

  • Type: Seminar
  • Location: by arrangement
  • Time:

    by arrangement

  • Start: 25 October 2007
  • Lecturers:

    Prof. K. Böhm,
    A. Khachatryan,
    G. Sautter

  • SWS (weekly contact hours): 2
  • ECTS: 4
  • Examination:

    not examinable

  • Note:

    The preliminary meeting takes place on 25 October 2007, 13:00-14:00, in seminar room 348, building 50.34.

    The seminar is held in English.

Modern database systems deal with enormous amounts of data. Because DBMS workloads are growing faster than hardware advances can compensate for, database optimizers must become ever more efficient. A key area of optimization is the careful planning of query execution. To choose the best query plan among numerous alternatives, the optimizer has to estimate accurately the selectivity of individual predicates and subqueries, that is, the fraction of rows that satisfy them.
For this purpose, the DBMS stores aggregated, compressed information about attribute value distributions in its system catalog, the so-called data dictionary. While selectivity estimates have to be as precise as possible, the statistics they are computed from have to be as concise as possible, for two reasons. First, the extra data must not overload the DBMS. Second, lookups in the statistics have to be very fast; otherwise, the lookup time would cancel out the performance gain that the selectivity estimates provide.
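As a rough illustration of the kind of compact statistic described above, here is a minimal Python sketch of an equi-width histogram, one classic summary for numeric attributes. The class name `EquiWidthHistogram` and its methods are hypothetical, and the estimator assumes values are spread uniformly within each bucket; real systems often use equi-depth or more elaborate histogram variants instead.

```python
class EquiWidthHistogram:
    """Hypothetical compact summary of a numeric column:
    equal-width buckets over [min, max], storing only a row count per bucket."""

    def __init__(self, values, num_buckets):
        self.lo = min(values)
        self.hi = max(values)
        # Guard against a zero width when all values are equal.
        self.width = (self.hi - self.lo) / num_buckets or 1.0
        self.counts = [0] * num_buckets
        for v in values:
            # Clamp so that the maximum value falls into the last bucket.
            i = min(int((v - self.lo) / self.width), num_buckets - 1)
            self.counts[i] += 1
        self.total = len(values)

    def selectivity(self, low, high):
        """Estimate the fraction of rows with low <= value <= high,
        assuming values are spread uniformly inside each bucket."""
        rows = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = min(high, b_hi) - max(low, b_lo)
            if overlap > 0:
                # Credit the bucket in proportion to the overlapping part of its range.
                rows += count * overlap / self.width
        return rows / self.total
```

For example, with `h = EquiWidthHistogram(list(range(1000)), num_buckets=10)`, `h.selectivity(0, 499.5)` comes out at roughly 0.5 while the summary stores only ten counts instead of 1000 values. Note that this sketch returns 0 for a point predicate where `low == high`; equality predicates would need extra per-bucket information (for instance, distinct-value counts), which is omitted here.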
Our seminar examines how this statistical meta-information is created, maintained, stored, and used for the non-trivial task of selectivity estimation. It is divided into two main parts: selectivity estimation for text data and selectivity estimation for numeric data. The split reflects the fact that data from these two domains must be handled in very different ways; the seminar covers the algorithms and data structures for both.
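Text predicates such as `LIKE '%dat%'` call for different statistics than numeric ranges. The following Python sketch stores, for every q-gram, how many strings contain it, and chains these counts under a Markov-style independence assumption to estimate substring selectivity. It is a simplified illustration in the spirit of such estimators, not a specific algorithm from the seminar; the names `qgrams` and `QGramStats` are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """Set of distinct substrings of length q in s."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


class QGramStats:
    """Hypothetical text statistics: for every q-gram and (q-1)-gram,
    the number of strings in the column that contain it."""

    def __init__(self, strings, q=2):
        self.q = q
        self.n = len(strings)
        self.freq = Counter()      # q-gram -> number of strings containing it
        self.sub_freq = Counter()  # (q-1)-gram -> number of strings containing it
        for s in strings:
            self.freq.update(qgrams(s, q))
            self.sub_freq.update(qgrams(s, q - 1))

    def selectivity(self, pattern):
        """Estimate the fraction of strings matching LIKE '%pattern%'."""
        q = self.q
        if len(pattern) < q:
            if len(pattern) == q - 1:
                # Exact lookup in the (q-1)-gram table.
                return self.sub_freq[pattern] / self.n
            raise ValueError("pattern must have at least q - 1 characters")
        grams = [pattern[i:i + q] for i in range(len(pattern) - q + 1)]
        est = self.freq[grams[0]] / self.n
        for cur in grams[1:]:
            overlap = cur[:-1]  # the (q-1)-gram shared with the previous q-gram
            if self.sub_freq[overlap] == 0:
                return 0.0
            # Markov-style independence assumption:
            # P(cur | overlap) is approximated by freq(cur) / freq(overlap).
            est *= self.freq[cur] / self.sub_freq[overlap]
        return est
```

On the five strings `["database", "data", "date", "update", "statistics"]` with `q=2`, the estimate for `"dat"` is 0.8, matching the true fraction, whereas for `"data"` it is 0.48 against a true value of 0.4, because the independence assumption ignores correlations between overlapping q-grams.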