
Commit 72bd1e3

[ML] Add suitable scoring functions for Text embedding models (elastic#2183)
Co-authored-by: István Zoltán Szabó <[email protected]>
1 parent cf34fc2 commit 72bd1e3

File tree

1 file changed: +30 −4 lines changed


docs/en/stack/ml/nlp/ml-nlp-model-ref.asciidoc

@@ -66,19 +66,45 @@ refer to <<ml-nlp-overview>>.
 [[ml-nlp-model-ref-text-embedding]]
 == Third party text embedding models
 
+Text Embedding models are designed to work with specific scoring functions
+for calculating the similarity between the embeddings they produce.
+Examples of typical scoring functions are: `cosine`, `dot product` and
+`euclidean distance` (also known as `l2_norm`).
+
+The embeddings produced by these models should be indexed in {es} using the
+{ref}/dense-vector.html[dense vector field type]
+with an appropriate {ref}/dense-vector.html#dense-vector-params[similarity function]
+chosen for the model.
+
+To find similar embeddings in {es}, use the efficient
+{ref}/knn-search.html#approximate-knn[approximate k-nearest neighbor (kNN)]
+search API with a text embedding as the query vector. Approximate
+kNN search uses the similarity function defined in
+the dense vector field mapping to calculate the relevance.
+For the best results, the function must be one of
+the suitable similarity functions for the model.
+
+
 Using `SentenceTransformerWrapper`:
 
 * https://huggingface.co/sentence-transformers/all-distilroberta-v1[All DistilRoBERTa v1]
+Suitable similarity functions: `dot_product`, `cosine`, `l2_norm`
 * https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2[All MiniLM L12 v2]
+Suitable similarity functions: `dot_product`, `cosine`, `l2_norm`
 * https://huggingface.co/sentence-transformers/all-mpnet-base-v2[All MPNet base v2]
-* https://huggingface.co/sentence-transformers/bert-base-nli-cls-token[BERT base nli cls token]
+Suitable similarity functions: `dot_product`, `cosine`, `l2_norm`
 * https://huggingface.co/sentence-transformers/facebook-dpr-ctx_encoder-multiset-base[Facebook dpr-ctx_encoder multiset base]
+Suitable similarity functions: `dot_product`
 * https://huggingface.co/sentence-transformers/facebook-dpr-question_encoder-single-nq-base[Facebook dpr-question_encoder single nq base]
+Suitable similarity functions: `dot_product`
 * https://huggingface.co/sentence-transformers/LaBSE[LaBSE]
+Suitable similarity functions: `cosine`
 * https://huggingface.co/sentence-transformers/msmarco-distilbert-base-tas-b[msmarco DistilBERT base tas b]
-* https://huggingface.co/sentence-transformers/msmarco-MiniLM-L-12-v3[msmarco MiniLM L12 v3]
-* https://huggingface.co/sentence-transformers/nli-bert-base-cls-pooling[nli BERT base cls pooling]
-* https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2[paraphrase mpnet base v2]
+Suitable similarity functions: `dot_product`
+* https://huggingface.co/sentence-transformers/msmarco-MiniLM-L12-cos-v5[msmarco MiniLM L12 v5]
+Suitable similarity functions: `dot_product`, `cosine`, `l2_norm`
+* https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2[paraphrase mpnet base v2]
+Suitable similarity functions: `cosine`
 
 Using `DPREncoderWrapper`:
 
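As a sketch of how the added guidance fits together: a dense vector mapping declares the similarity function at index time, and an approximate kNN search then scores candidates with it. The index name, field name, and `dims` value below are illustrative, not taken from the commit; the embedding dimension must match the model you deploy:

```console
PUT my-index
{
  "mappings": {
    "properties": {
      "text_embedding": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}

POST my-index/_search
{
  "knn": {
    "field": "text_embedding",
    "query_vector": [0.12, -0.34, "..."],
    "k": 10,
    "num_candidates": 100
  }
}
```

Here `similarity: dot_product` would be a valid choice for, say, the msmarco DistilBERT model above, but not for LaBSE, whose only suitable function is `cosine`.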
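The three scoring functions named in the added text (`cosine`, `dot product`, `euclidean distance`/`l2_norm`) can be sketched in plain Python to make the distinction concrete. This is a minimal illustration of the mathematics, not how {es} computes similarity internally:

```python
import math

def dot_product(a, b):
    # Sum of element-wise products; Elasticsearch's `dot_product`
    # similarity expects unit-length (normalized) vectors.
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product divided by the product of the vector lengths,
    # i.e. the dot product of the length-normalized vectors.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot_product(a, b) / (norm_a * norm_b)

def l2_norm(a, b):
    # Euclidean (L2) distance; unlike the other two scores,
    # a smaller value means the embeddings are more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Both example vectors have length 1, so cosine and dot_product agree.
a = [0.6, 0.8]
b = [0.8, 0.6]
print(dot_product(a, b), cosine(a, b), l2_norm(a, b))
```

For unit-length vectors the dot product and cosine scores coincide, which is why models trained with normalized embeddings list all three functions as suitable.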