Adds text similarity task to the NLP docs (elastic#2223)

szabosteve · web-flow · commit fad0f34444db · 2022-09-01T09:11:55.000+02:00
* [DOCS] Adjusts line breaks.

* Adds text similarity task to the NLP docs.

* Addresses feedback.
diff --git a/docs/en/stack/ml/nlp/ml-nlp-classify-text.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-classify-text.asciidoc
@@ -21,13 +21,12 @@ include::ml-nlp-lang-ident.asciidoc[]
 [[ml-nlp-text-classification]]
 == Text classification
 
-Text classification assigns the input text to one of multiple classes
- that best describe the text. The classes used depend 
-on the model and the data set that was used to train it. Based on the
-number of classes, two main types of classification exist: binary
-classification, where the number of classes is exactly two, and
-multi-class classification, where the number of classes is more than
-two.
+Text classification assigns the input text to one of multiple classes that best 
+describe the text. The classes used depend on the model and the data set that 
+was used to train it. Based on the number of classes, two main types of 
+classification exist: binary classification, where the number of classes is 
+exactly two, and multi-class classification, where the number of classes is more 
+than two.
 
 This task can help you analyze text for markers of positive or negative 
 sentiment or classify text into various topics. For example, you might use a 
diff --git a/docs/en/stack/ml/nlp/ml-nlp-search-compare.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-search-compare.asciidoc
@@ -6,6 +6,8 @@
 The {stack-ml-features} can generate embeddings, which you can use to search in 
 unstructured text or compare different pieces of text.
 
+* <<ml-nlp-text-embedding>>
+* <<ml-nlp-text-similarity>>
 
 [discrete]
 [[ml-nlp-text-embedding]]
@@ -48,4 +50,48 @@ The task returns the following result:
 }
 ...
 ----------------------------------
+// NOTCONSOLE
+
+
+[discrete]
+[[ml-nlp-text-similarity]]
+== Text similarity
+
+The text similarity task estimates how similar two pieces of text are to each 
+other and expresses the similarity as a numeric value. This is commonly referred 
+to as cross-encoding. This task is useful for ranking document text when 
+comparing it to another provided text input.
+
+You can provide multiple strings of text to compare to another text input 
+sequence. Each string is compared to the given text sequence at inference time 
+and a prediction of similarity is calculated for every string of text.
+
+[source,js]
+----------------------------------
+{
+  "docs":[{ "text_field": "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."}, {"text_field": "New York City is famous for the Metropolitan Museum of Art."}],
+  "inference_config": {
+    "text_similarity": {
+      "text": "How many people live in Berlin?"
+    }
+  }
+}
+----------------------------------
+// NOTCONSOLE
+
+In the example above, every string in the `docs` array is compared individually 
+to the text provided in the `text_similarity`.`text` field and a predicted 
+similarity is calculated for each, as the API response shows:
+
+[source,js]
+----------------------------------
+...
+{
+    "predicted_value": 7.235751628875732
+},
+{
+    "predicted_value": -11.562295913696289
+}
+...
+----------------------------------
 // NOTCONSOLE
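
For reviewers who want to see how the documented request and response fit together, here is a minimal sketch in Python. It only builds a body shaped like the example in the diff and ranks passages by `predicted_value`; it does not call any API. The helper names `build_request` and `rank` are hypothetical, and the sketch assumes that results come back in the same order as the entries in `docs`, which the diff implies but does not state.

```python
import json


def build_request(query, passages, field="text_field"):
    """Build a request body shaped like the text similarity example (hypothetical helper)."""
    return {
        "docs": [{field: passage} for passage in passages],
        "inference_config": {"text_similarity": {"text": query}},
    }


def rank(passages, results):
    """Pair each passage with its predicted_value and sort highest first.

    Assumption: results are returned in the same order as the docs sent.
    """
    scored = [(r["predicted_value"], p) for p, r in zip(passages, results)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


passages = [
    "Berlin has a population of 3,520,031 registered inhabitants "
    "in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
body = build_request("How many people live in Berlin?", passages)

# Response fragment copied from the documented example, not from a live call.
results = [
    {"predicted_value": 7.235751628875732},
    {"predicted_value": -11.562295913696289},
]
ranked = rank(passages, results)

print(json.dumps(body, indent=2))
for score, passage in ranked:
    print(f"{score:.3f}  {passage}")
```

With the values from the documented response, the Berlin passage ranks first because its predicted value is higher.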