Instant Selection of High Contrast Projections in Multi-dimensional Data Streams
-
Authors:
Andrei Vanea, Emmanuel Müller, Fabian Keller, Klemens Böhm
-
Source:
Proceedings of the Workshop on Instant Interactive Data Mining (IID 2012) in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012), Bristol, UK
-
Many of today's applications have to cope with multi-dimensional data streams in which some dimensions are irrelevant and hinder stream mining. Considering all concurrent measurements at once leads to scattered distributions, while the knowledge is hidden in a few dependent dimensions. This dependence between dimensions usually changes over time, which poses a major open challenge for stream mining.
In this work, we focus on dependent dimensions that show a high contrast between outliers and clustered objects. We present HCP-StreamMiner, a method for selecting high contrast projections in multi-dimensional streams. Our algorithm computes a ranked list of high contrast projections and updates it incrementally. The quality measure of each projection, its contrast, is determined statistically by comparing the dependence within a set of dimensions against their marginal distributions. We propose a technique for computing this score from stream data summaries and an update procedure that addresses the change of dependence over time.
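
To make the idea of a contrast score concrete, the following is a minimal sketch and not the HCP-StreamMiner implementation. It rests on several assumptions that the abstract does not state: the stream summary is a plain sliding window of the most recent points, only two-dimensional projections are ranked, and the deviation between conditional and marginal distribution is measured with a Kolmogorov-Smirnov-style distance averaged over random slices. The names `ks_distance`, `contrast`, and `ContrastRanker` are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def ks_distance(sample, reference):
    """Largest gap between the empirical CDFs of two 1-D samples."""
    a, b = sorted(sample), sorted(reference)
    i = j = 0
    gap = 0.0
    for v in sorted(set(a) | set(b)):
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def contrast(window, dims, rng, trials=50, slice_frac=0.3):
    """Average deviation of a conditional from the marginal distribution.

    Assumption: each trial picks one target dimension, restricts every
    other dimension of the projection to a random rank interval, and
    compares the target values inside that slice with all target values.
    """
    rows = list(window)
    n = len(rows)
    if n < 10:  # too little data for a meaningful comparison
        return 0.0
    width = max(2, int(n * slice_frac))
    total, used = 0.0, 0
    for _ in range(trials):
        target = rng.choice(dims)
        keep = set(range(n))
        for d in dims:
            if d == target:
                continue
            order = sorted(range(n), key=lambda idx: rows[idx][d])
            start = rng.randrange(n - width + 1)
            keep &= set(order[start:start + width])
        if len(keep) < 2:
            continue
        conditional = [rows[idx][target] for idx in keep]
        marginal = [row[target] for row in rows]
        total += ks_distance(conditional, marginal)
        used += 1
    return total / used if used else 0.0


class ContrastRanker:
    """Ranks 2-D projections by contrast over a sliding window (assumed summary)."""

    def __init__(self, n_dims, window_size=200, seed=0):
        self.window = deque(maxlen=window_size)  # old points drop out automatically
        self.pairs = list(combinations(range(n_dims), 2))
        self.rng = random.Random(seed)

    def update(self, point):
        self.window.append(tuple(point))

    def ranking(self):
        scores = {p: contrast(self.window, p, self.rng) for p in self.pairs}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic stream: dimensions 0 and 1 are dependent, dimension 2 is noise.
stream_rng = random.Random(42)
ranker = ContrastRanker(n_dims=3, window_size=200)
for _ in range(500):
    x = stream_rng.random()
    ranker.update((x, x + 0.05 * stream_rng.gauss(0, 1), stream_rng.random()))

for dims, score in ranker.ranking():
    print(dims, round(score, 3))
```

In this sketch the ranking is recomputed from the window on demand, so a change of dependence over time shows up once the older points have left the window. A true incremental update of the scores, as the abstract describes, would avoid this full recomputation.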