Towards More Meaningful Notions of Similarity in NLP Embedding Models
International Journal on Digital Libraries (IJDL)
This is the supplementary material for the article "Towards More Meaningful Notions of Similarity in NLP Embedding Models".
Abstract: Finding similar words with the help of word embedding models, such as Google’s Word2Vec or GloVe, computed on large-scale digital libraries has yielded meaningful results in many cases. However, the underlying notion of similarity has remained ambiguous. In this paper, we examine when exactly similarity values in word embedding models are meaningful. To do so, we analyze the statistical distribution of similarity values systematically, conducting two series of experiments. The first one examines how the distribution of similarity values depends on the different embedding model algorithms and parameters. The second one starts by showing that intuitive similarity thresholds do not exist. We then propose a method stating which similarity values and thresholds actually are meaningful for a given embedding model. Based on these results, we calculate how these thresholds, when taken into account during evaluation, change the evaluation scores of the models in similarity test sets. In more abstract terms, our insights give way to a better understanding of the notion of similarity in embedding models and to more reliable evaluations of such models.
Here we provide all the embedding models we trained for the publication. They are grouped in the same way as in the paper.
- Learning algorithm evaluation [4.4 GB]
- Dimensionality parameter evaluation [2.4 GB]
- Dictionary size parameter evaluation [1.8 GB]
- Corpus size parameter evaluation [4.5 GB]
- Optimization function parameter evaluation [0.4 GB]
- Iteration number parameter evaluation [0.7 GB]
- Advanced CBOW-like embedding model evaluation [12 GB]
- Advanced SG-like embedding model evaluation [13 GB]
We provide the Python scripts of our experiments here. They include the following:
- Example on how to load a model and compute similarities
- Similarity value distribution calculation
- Similarity threshold calculation
- Similarity threshold-aware evaluation
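As a rough illustration of what the first two scripts in the list above deal with, the following minimal sketch computes cosine similarities between word vectors and summarizes the distribution of all pairwise similarity values. It is not taken from the provided scripts: the toy vectors, the `toy_model` dictionary, and the function names are assumptions made for this example, and cosine similarity is assumed as the similarity measure. The trained models themselves would need to be loaded with an embedding library rather than defined inline.

```python
import itertools
import math
import statistics

# Hypothetical toy vectors, invented for illustration only.
# Real embedding models typically have hundreds of dimensions.
toy_model = {
    "king": [0.9, 0.1, 0.3],
    "queen": [0.8, 0.2, 0.4],
    "apple": [0.1, 0.9, 0.2],
    "pear": [0.2, 0.8, 0.1],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(model):
    """Map every unordered word pair (alphabetical order) to its similarity."""
    return {
        (w1, w2): cosine_similarity(model[w1], model[w2])
        for w1, w2 in itertools.combinations(sorted(model), 2)
    }


sims = pairwise_similarities(toy_model)
values = list(sims.values())

# Summary statistics of the similarity value distribution.
mean_sim = statistics.mean(values)
std_sim = statistics.stdev(values)

for pair, value in sorted(sims.items(), key=lambda item: -item[1]):
    print(pair, round(value, 3))
print("mean:", round(mean_sim, 3), "stdev:", round(std_sim, 3))
```

For a real model the number of word pairs grows quadratically with the dictionary size, so a practical analysis would likely sample pairs rather than enumerate all of them.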