@@ -66,19 +66,45 @@ refer to <<ml-nlp-overview>>.
[[ml-nlp-model-ref-text-embedding]]
== Third party text embedding models

+ Text Embedding models are designed to work with specific scoring functions
+ for calculating the similarity between the embeddings they produce.
+ Examples of typical scoring functions are: `cosine`, `dot product` and
+ `euclidean distance` (also known as `l2_norm`).
+
+ The embeddings produced by these models should be indexed in {es} using the
+ {ref}/dense-vector.html[dense vector field type]
+ with an appropriate {ref}/dense-vector.html#dense-vector-params[similarity function]
+ chosen for the model.
+
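The indexing setup described above can be sketched as a mapping body. This is a minimal sketch, not the documentation's own example: the index name `my-index`, the field names, the 768 dimensions, and the `cosine` similarity are all assumptions to adjust for your model.

```python
# Minimal sketch of an Elasticsearch index mapping for text embeddings.
# Hypothetical assumptions: a model producing 768-dimensional embeddings
# for which cosine is a suitable similarity function.
mapping = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "text_embedding": {
                "type": "dense_vector",
                "dims": 768,        # must match the model's output dimension
                "index": True,      # enable approximate kNN search
                "similarity": "cosine",  # one of: cosine, dot_product, l2_norm
            },
        }
    }
}
# With the official Python client this could be created via:
# es.indices.create(index="my-index", **mapping)
```

Choosing a `similarity` that the model was not trained for can still return results, but the relevance ordering may be meaningless.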
+ To find similar embeddings in {es}, use the efficient
+ {ref}/knn-search.html#approximate-knn[Approximate k-nearest neighbor (kNN)]
+ search API with a text embedding as the query vector. Approximate
+ kNN search uses the similarity function defined in
+ the dense vector field mapping to calculate the relevance.
+ For the best results, the function must be one of
+ the suitable similarity functions for the model.
+
+
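The kNN search described above can be sketched as a request body. A minimal sketch under stated assumptions: the field name `text_embedding` is hypothetical, and the `query_vector` values are placeholders standing in for an embedding produced by running the query text through the same model used at index time.

```python
# Sketch of an approximate kNN search request body.
# The query_vector values below are placeholders; a real query vector is
# produced by the same text embedding model that generated the indexed
# embeddings, and has the same number of dimensions as the field mapping.
knn_search_body = {
    "knn": {
        "field": "text_embedding",            # the dense_vector field
        "query_vector": [0.12, -0.45, 0.91],  # placeholder embedding
        "k": 10,                # number of nearest neighbors to return
        "num_candidates": 100,  # candidates examined per shard
    },
    "_source": ["text"],
}
# With the official Python client: es.search(index="my-index", **knn_search_body)
```

The similarity function itself is not specified in the query; it comes from the dense vector field mapping, which is why the mapping must be chosen to suit the model.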

Using `SentenceTransformerWrapper`:

* https://huggingface.co/sentence-transformers/all-distilroberta-v1[All DistilRoBERTa v1]
+ Suitable similarity functions: `dot_product`, `cosine`, `l2_norm`
* https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2[All MiniLM L12 v2]
+ Suitable similarity functions: `dot_product`, `cosine`, `l2_norm`
* https://huggingface.co/sentence-transformers/all-mpnet-base-v2[All MPNet base v2]
- * https://huggingface.co/sentence-transformers/bert-base-nli-cls-token[BERT base nli cls token]
+ Suitable similarity functions: `dot_product`, `cosine`, `l2_norm`
* https://huggingface.co/sentence-transformers/facebook-dpr-ctx_encoder-multiset-base[Facebook dpr-ctx_encoder multiset base]
+ Suitable similarity functions: `dot_product`
* https://huggingface.co/sentence-transformers/facebook-dpr-question_encoder-single-nq-base[Facebook dpr-question_encoder single nq base]
+ Suitable similarity functions: `dot_product`
* https://huggingface.co/sentence-transformers/LaBSE[LaBSE]
+ Suitable similarity functions: `cosine`
* https://huggingface.co/sentence-transformers/msmarco-distilbert-base-tas-b[msmarco DistilBERT base tas b]
- * https://huggingface.co/sentence-transformers/msmarco-MiniLM-L-12-v3[msmarco MiniLM L12 v3]
- * https://huggingface.co/sentence-transformers/nli-bert-base-cls-pooling[nli BERT base cls pooling]
- * https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2[paraphrase mpnet base v2]
+ Suitable similarity functions: `dot_product`
+ * https://huggingface.co/sentence-transformers/msmarco-MiniLM-L12-cos-v5[msmarco MiniLM L12 cos v5]
+ Suitable similarity functions: `dot_product`, `cosine`, `l2_norm`
+ * https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2[paraphrase multilingual mpnet base v2]
+ Suitable similarity functions: `cosine`

Using `DPREncoderWrapper`: