GOutRank

Amazon Benchmark

This dataset is a subgraph of the Amazon co-purchase Amazon Network [1] . We filtered products from the Disney category and we extend the product information with product prices extracted from the Amazon website on March 2012. A list of the attributes and their description can be found here.

Number of nodes	124
Number of edges	334
Average clustering coefficient	0.437
Dimensions	30

User Experiment

Pre-Configuration:

First, we clustered the Amazon subgraph with a modularity based technique [2]. Thus, we have ensured that students do not label global outliers (e.g., product with the highest price of the database). but they label contextual outliers inside graph clusters. (The file with the obtained graph clusters can be found here)
We choose those attributes which are available on the Amazon website (not aggregated attributes) , and we translate them into German in order to provide a product description to the students.

Experiment Sequence:

For each graph cluster (in total 8 clusters) :

a. All products in a graph cluster were shown to the students in the browser.

            b. Each group of two students had to choose 1 or 2 products that they considered unexpected, rare or suspicious in the group.
            c. For each labeled product, the two students had to fill out a form with the reasons and the attributes they considered rare.

                                                        Form

Results: All results of the experiment with the German comments of the students can be found here.

Experimental Evaluation

For the evaluation, we selected those products, where at least 50% of the students considered the products as outliers inside the group (in total 6 outliers were selected).

Evaluation with traditional techniques on relational data only

First, we evaluate the results with traditional outlier detection techniques provided by the SOREX toolkit. We used the relational dataset, without the information of the graph structure. For the evaluation of the detection quality we use the true file as outlier labels. The configurations of the algorithms used can be found in the following table:

Algorithm	Parameters
LOF	weka.outlier.LocalOutlierFactor -L 20 -U 20
SOF	weka.outlier.AggraWal -S 20 -P 10
RPLOF	weka.outlier.RPLOF -L 20 -U 20 -X 1

Evaluation with other competitors considering the graph structure

Finally, we also considered the dataset with the graph structure in order to evalute our approach and other competitors. The settings are shown in the following table:

Algorithm	Parameter
SCAN	µ= 2,ε=0.5
CODA	k=8, r = 6/124 , λ=0.1

Evaluation considering different clusterings

We evaluate our scoring functions using the results provided by graph subspace clustering techniques. The implementation of all these algorithms can be found here. The properties fles for the execution of each of the algorithms are shown in the following table:

Algorithm	Configuration File
GAMer	Config File
extension of Cocain	Config File
CoPaM	Config File

[1] J. Leskovec, L. Adamic and B. Adamic. The Dynamics of Viral Marketing. ACM Transactions on the Web (ACM TWEB), 1(1), 2007.
[2] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, Fast unfolding of communities in large networks, in Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P1000

Access to this website

After publication of this work in April 2013, we encourage researchers in this area to use the proposed algorithm for their own publications as competitor. Our implementation will then be available for anyone to use. Thus, all algorithms, data sets and parameter setting will be available for the community.