Ranking Outlier Nodes in Subspaces of Attributed Graphs
-
Author:
Emmanuel Müller, Patricia Iglesias, Yvonne Mülle, Klemens Böhm
-
Source:
Proc. 4th International Workshop on Graph Data Management: Techniques and Applications (GDM 2013) in conjunction with IEEE 29th International Conference on Data Engineering (ICDE 2013), Brisbane, Australia (2013)
-
Outlier analysis is an important data mining task that aims to detect unexpected, rare, and suspicious objects. Outlier ranking enables enhanced outlier exploration, which assists the user-driven outlier analysis. It overcomes the binary detection of outliers vs. regular objects, which is not adequate for many applications. Traditional outlier ranking techniques focus on either vector data or on graph structures. However, many of today’s databases store both, multi dimensional numeric information and relations between objects in attributed graphs. An open challenge is how outlier ranking should cope with these different data types in a unified fashion.
In this work, we propose a first approach for outlier ranking in subspaces of attributed graphs. We rank graph nodes according to their degree of deviation in both graph and attribute properties. We describe novel challenges induced by this combination of data types and propose subspace analysis as important method for outlier ranking on attributed graphs. Subspace clustering provides a selected subset of nodes and its relevant attributes in which deviation of nodes can be observed. Our graph outlier ranking (GOutRank) introduces scoring functions based on these selected subgraphs and subspaces. In addition to this technical contribution, we provide an attributed graph extracted from the Amazon marketplace. It includes a ground truth of real outliers labeled in a user experiment. In order to enable sustainable and comparable research results, we publish this database on our website1 as benchmark for future publications. Our experiments on this graph demonstrate the potential and the capabilities of outlier ranking in subspaces of attributed graphs.