Distributed data storage

Recording of the lecture

Lecture recordings can be accessed via the ATIS stream player.

Abstract

Distribution is of fundamental importance in modern information systems. In many scenarios, centralized, monolithic database architectures may soon be a thing of the past. However, many fundamental problems of distributed data storage have not yet been solved, or the existing solutions are unsatisfactory. A large number of products do claim to support distributed data storage, but the solutions they implement are not always good, application programmers have to solve a large part of the problem themselves, or a solution that is elegant and theoretically sound turns out to have unsatisfactory runtime behavior. This lecture is therefore not only for those who are enthusiastic about the fundamental problems of distributed data management; these topics matter just as much if you are particularly interested in practical usability and applications.

The goal of this lecture is to introduce you to the theory of distributed data management and to the appropriate algorithms and methods. Among other things, we will cover: the correct and fault-tolerant concurrent execution of transactions in distributed environments, including both 'classical' solutions and very recent developments; the evaluation of distributed queries and of 'Internet queries', i.e. queries that access information offerings on the Web and embed calls to Web services; data storage in distributed, coordinator-free environments ('peer-to-peer data storage'); and modern techniques for distributed caching and for dealing with replication.

References