Efficient Interval-focused Similarity Search under Dynamic Time Warping

Abstract

Similarity search on time series from large temporal text corpora is interesting in many settings. Our use case is the Google Books Ngram corpus and historians interested in the changes of word frequencies over time. More specifically, users are interested in similarity search in a specific period of time, aka. interval-focused similarity search. Related work formalizes interval-focused similarity search, but the sparsely existing approaches are limited to metric distance measures, like the Euclidean distance. Most other approaches in this area, that address the usage of warping distance measures, focus on whole matching similarity search. In this work, we present a novel search tree that uses so-called time series envelopes to group objects. To speed up the tree traversal, our search tree approximates the envelopes based on the node height, i.e., envelopes are tighter further down in the tree. We combine this with various time series pruning techniques, mainly to reduce the number of expensive distance computations. Our experimental evaluation shows that this combination is worthwhile and indeed decisive for a significant speedup, compared to less sophisticated adaptations of known approaches. We, first, show that a combination of both pruning groups of time series and single time series outperforms the usage of a single pruning technique. Secondly, we compare the wall-clock run times of our data structure to existing approaches and determine a significant speed up for focused-interval similarity search queries on large temporal data sets, like the Google Books Ngram corpus.

Download pdf

© Jens Willkomm, Janek Bettinger, Martin Schäler, and Klemens Böhm 2019.

This is the author's version of the work. It is posted here for your personal use.
Not for redistribution. The definitive version was published in the 16th
International Symposium on Spatial and Temporal Databases (SSTD ’19), August
19–21, 2019, Vienna, Austria, https://doi.org/10.1145/3340964.3340969.

Note

In our paper, we use min and max to create the piecewise aggregate approximation (PAA) envelope (see Figure 2).
Due to the proof of Zhu and Shasha [1], using mean is also valid but creates a tighter lower bound (cf. Figure 5 in [1]).

Thanks to Eamon Keogh for suggesting this improvement.

[1] Yunyue Zhu and Dennis Shasha. "Warping indexes with envelope transforms for query by humming." Proceedings of the 2003 ACM SIGMOD international conference on Management of data. 2003.

Citation

Cite this paper as:

ACM Reference Format:
Jens Willkomm, Janek Bettinger, Martin Schäler, and Klemens Böhm. 2019. Efficient Interval-focused Similarity Search under Dynamic Time Warping. In 16th International Symposium on Spatial and Temporal Databases (SSTD ’19), August 19–21, 2019, Vienna, Austria. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3340964.3340969

Bibtex:
@inproceedings{Willkomm:2019:EIS:3340964.3340969,
     author = {Jens Willkomm and Janek Bettinger and Martin Schäler and Klemens Böhm},
     title = {Efficient Interval-focused Similarity Search under Dynamic Time Warping},
     booktitle = {Proceedings of the 16th International Symposium on Spatial and Temporal Databases},
     series = {SSTD '19},
     year = {2019},
     isbn = {978-1-4503-6280-1},
     location = {Vienna, Austria},
     pages = {130--139},
     numpages = {10},
     url = {http://doi.acm.org/10.1145/3340964.3340969},
     doi = {10.1145/3340964.3340969},
     acmid = {3340969},
     publisher = {ACM},
     address = {New York, NY, USA}
}