Data Science 2

  • type: Lecture (V)
  • chair: KIT-Fakultät für Informatik - Institut für Programmstrukturen und Datenorganisation - IPD Böhm
  • semester: SS 2022
  • place:

    Room -102 (floor -1)
    50.34 INFORMATIK, Kollegiengebäude am Fasanengarten



  • time:

    Tuesday, 15:45 - 17:15, weekly


  • sws: 2
  • lv-no.: 2400042
  • information:

    In person. The lecture will also be recorded and streamed.

Recording of the lecture:

The lecture will be recorded and streamed. The recording can be accessed through the ATIS stream player; a download link and more information about the player can be found here.


This lecture replaces the lecture "Analysis Techniques for Big Data 2". We want to give more attention to the Data Science process and explicitly cover the steps of this process.

Data Science techniques are attracting a lot of interest from users, especially for big data analytics. The spectrum of users is broad: it includes traditional industries such as banking and insurance, newer players, especially Internet companies or operators of novel information services and social media, and the natural sciences and engineering. In all cases, there is a desire to keep track of very large, sometimes distributed datasets, to extract interesting correlations from the data with as little effort as possible, and to be able to systematically compare expected system behavior with actual behavior.

This lecture deals with both the preparation of data as a prerequisite for fast and powerful analysis and modern techniques for the analysis itself. The course emphasizes phenomena and techniques not considered in the Data Science 1 lecture. These include approaches to data streams, special features of high-dimensional data sets, indexing of data sets with information integration and data warehousing methods, and compression and sampling of large data sets.


By the end of the course, students should have a good understanding of the need for advanced concepts in "Data Science" and be able to explain them. They should be able to evaluate and compare a wide variety of advanced approaches to managing and analyzing large data sets and data streams in terms of their effectiveness and applicability. The participants should understand which problems are currently open in the field of data analysis and have gained a broad and deep insight into the state of the art in this area.

Lecture Language:

The lecture will be held mainly in English. Questions can of course also be asked in German.