
Discovering Structure in Very Large Databases

Type: Seminar (S)
Semester: Winter semester 2014/15

The preliminary meeting takes place on Tuesday, 21 October 2014, from 14:30 to 17:30 in Room 348.

50.34 Informatik, Kollegiengebäude am Fasanengarten 

Lecturers: Prof. Dr.-Ing. Klemens Böhm
Hoang Vu Nguyen
Weekly hours (SWS): 2
Course number (LVNr.): 2400020

Registration at the secretariat of Prof. Böhm, Building 50.34, Room 367

The description of the seminar is available in English only.
However, each participant may choose whether to write their report in German or in English.


Graph analysis is an important area of data mining research, with wide applications in social network mining, community detection, and online advertising campaigns, to name a few. One important issue in graph analysis is measuring the similarity between two given (sub)graphs, for example for graph matching in general or for comparing different communities in online social networks. Such comparisons can in turn be exploited in many ways, such as launching online advertisements for new products.
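
To make the notion of similarity between two graphs concrete, here is a minimal sketch in Python. It uses the Jaccard index on edge sets, which is just one simple choice among many possible measures and is not necessarily the one covered in the seminar. The function names and the sample communities are hypothetical.

```python
def edge_set(edges):
    """Normalize undirected edges so that (u, v) and (v, u) compare equal."""
    return {frozenset(edge) for edge in edges}


def jaccard_edge_similarity(edges_a, edges_b):
    """Return |A ∩ B| / |A ∪ B| for the edge sets of two undirected graphs."""
    a, b = edge_set(edges_a), edge_set(edges_b)
    if not a and not b:
        return 1.0  # assumption: two empty graphs count as identical
    return len(a & b) / len(a | b)


# Two small, made-up friendship communities over overlapping members.
community_1 = [("ann", "bob"), ("bob", "cid"), ("cid", "ann")]
community_2 = [("bob", "ann"), ("bob", "cid"), ("cid", "dan")]

similarity = jaccard_edge_similarity(community_1, community_2)
print(similarity)
```

This measure only works when both graphs share node labels; comparing graphs with unrelated labels requires a matching step first.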

Discovering structure in very large databases encompasses many research problems, among them detecting correlations among database columns, schema extraction, and schema matching. Schema matching in particular is concerned with matching two database schemas, given the relationships among the columns within each schema. It is a prerequisite for data analytics that spans different, possibly large and heterogeneous databases, e.g., Facebook and Amazon co-purchase databases.

As we will demonstrate in the pro-seminar/seminar, graph analysis can be used to discover structure in very large databases. For instance, one of the most promising approaches to schema matching so far represents each database schema as a graph of columns, so that schema matching boils down to matching two column graphs.
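
The idea of reducing schema matching to matching two column graphs can be sketched as follows. This is an illustrative brute-force version under strong assumptions, not the approach presented in the seminar: both schemas have the same number of columns, each edge weight stands for some relationship strength between two columns (for example a correlation score), and the schemas are small enough to try every column assignment. All column names, weights, and function names are hypothetical.

```python
from itertools import combinations, permutations


def weight(graph, u, v):
    """Edge weight between columns u and v; missing edges count as 0.0."""
    return graph.get(frozenset((u, v)), 0.0)


def match_schemas(cols_a, graph_a, cols_b, graph_b):
    """Find the column mapping whose edge weights disagree the least.

    Tries every one-to-one assignment of cols_a to cols_b, so the running
    time is factorial in the number of columns (tiny schemas only).
    """
    if len(cols_a) != len(cols_b):
        raise ValueError("this sketch only handles schemas of equal size")
    best_mapping, best_cost = None, float("inf")
    for perm in permutations(cols_b):
        mapping = dict(zip(cols_a, perm))
        # Sum of absolute weight differences over all column pairs.
        cost = sum(
            abs(weight(graph_a, u, v) - weight(graph_b, mapping[u], mapping[v]))
            for u, v in combinations(cols_a, 2)
        )
        if cost < best_cost:
            best_mapping, best_cost = mapping, cost
    return best_mapping, best_cost


# Two made-up schemas whose columns are named differently.
cols_a = ["age", "income", "zip"]
graph_a = {
    frozenset(("age", "income")): 0.9,
    frozenset(("income", "zip")): 0.4,
    frozenset(("age", "zip")): 0.1,
}
cols_b = ["postcode", "years", "salary"]
graph_b = {
    frozenset(("years", "salary")): 0.9,
    frozenset(("salary", "postcode")): 0.4,
    frozenset(("years", "postcode")): 0.1,
}

mapping, cost = match_schemas(cols_a, graph_a, cols_b, graph_b)
print(mapping, cost)
```

The exhaustive search is only there to keep the sketch short. For very large databases, scalable graph matching techniques are needed instead.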


This seminar aims to give students knowledge of:

(a) graph similarity analysis and database structure discovery,

(b) the interactions between the two fields, in particular, how graph analysis provides new solutions to database structure discovery, and how database structure discovery provides new challenges for graph analysis to address, and

(c) the applicability of both fields to the more general problem of how to compare different patterns (e.g., the purchase pattern of PS4 and that of Xbox in Germany) as well as different structures (e.g., the friendship circle of KIT students and that of MIT students). The knowledge obtained from this seminar is beneficial both for students who want to pursue a career in industry and for those who want to delve further into computer science research.