A Query Algebra for Temporal Text Corpora

Abstract

Conceptual history, the study of how the concepts represented by words such as “peace” or “freedom” evolve, is an important discipline in the humanities, but it remains a laborious task. It normally consists of reading and interpreting a large number of carefully selected texts, often without comprehensive knowledge of all the potentially relevant material. Our objective is therefore to design a query algebra for accessing temporal text corpora. It should allow domain experts to formalize hypotheses on how concepts manifest in large-scale digital text corpora, targeting the complete works of Reinhart Koselleck, a highly prominent researcher in conceptual history. In cooperation with domain experts, we first determine the primary information types used in conceptual history, such as word usage frequency or sentiment. We then define database operators that formalize these types and that can be combined to formulate arbitrarily complex queries representing hypotheses. The result is a novel query algebra that enables researchers in conceptual history to access large text corpora and comprehensively analyze word behavior over time. In a proof of concept, we show how to use our algebra and obtain first novel insights, which demonstrates its suitability.

© Jens Willkomm, Christoph Schmidt-Petri, Martin Schäler, Michael Schefczyk, and Klemens Böhm 2018.

This is the author's version of the work. It is posted here for your personal use.
Not for redistribution. The definitive version was published in the
18th ACM/IEEE Joint Conference on Digital Libraries (JCDL ’18),
June 3–7, 2018, Fort Worth, TX, USA, https://doi.org/10.1145/3197026.3197044.

Citation

Cite this paper as:

ACM Reference Format:
Jens Willkomm, Christoph Schmidt-Petri, Martin Schäler, Michael Schefczyk, and Klemens Böhm. 2018. A Query Algebra for Temporal Text Corpora. In JCDL ’18: The 18th ACM/IEEE Joint Conference on Digital Libraries, June 3–7, 2018, Fort Worth, TX, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3197026.3197044

BibTeX:
@inproceedings{Willkomm:2018:QAT:3197026.3197044,
    author = {Jens Willkomm and Christoph Schmidt-Petri and Martin Schäler and Michael Schefczyk and Klemens Böhm},
    title = {A Query Algebra for Temporal Text Corpora},
    booktitle = {Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries},
    series = {JCDL '18},
    year = {2018},
    isbn = {978-1-4503-5178-2},
    location = {Fort Worth, Texas, USA},
    pages = {183--192},
    numpages = {10},
    url = {http://doi.acm.org/10.1145/3197026.3197044},
    doi = {10.1145/3197026.3197044},
    acmid = {3197044},
    publisher = {ACM},
    address = {New York, NY, USA}
}