Commit fad0f34

Adds text similarity task to the NLP docs (elastic#2223)
* [DOCS] Adjusts line breaks.
* Adds text similarity task to the NLP docs.
* Addresses feedback.
1 parent aaab62b commit fad0f34

2 files changed: +52 -7 lines changed

docs/en/stack/ml/nlp/ml-nlp-classify-text.asciidoc

Lines changed: 6 additions & 7 deletions
@@ -21,13 +21,12 @@ include::ml-nlp-lang-ident.asciidoc[]
 [[ml-nlp-text-classification]]
 == Text classification
 
-Text classification assigns the input text to one of multiple classes
-that best describe the text. The classes used depend
-on the model and the data set that was used to train it. Based on the
-number of classes, two main types of classification exist: binary
-classification, where the number of classes is exactly two, and
-multi-class classification, where the number of classes is more than
-two.
+Text classification assigns the input text to one of multiple classes that best
+describe the text. The classes used depend on the model and the data set that
+was used to train it. Based on the number of classes, two main types of
+classification exist: binary classification, where the number of classes is
+exactly two, and multi-class classification, where the number of classes is more
+than two.
 
 This task can help you analyze text for markers of positive or negative
 sentiment or classify text into various topics. For example, you might use a

docs/en/stack/ml/nlp/ml-nlp-search-compare.asciidoc

Lines changed: 46 additions & 0 deletions
@@ -6,6 +6,8 @@
 The {stack-ml-features} can generate embeddings, which you can use to search in
 unstructured text or compare different pieces of text.
 
+* <<ml-nlp-text-embedding>>
+* <<ml-nlp-text-similarity>>
 
 [discrete]
 [[ml-nlp-text-embedding]]
@@ -48,4 +50,48 @@ The task returns the following result:
 }
 ...
 ----------------------------------
+// NOTCONSOLE
+
+
+[discrete]
+[[ml-nlp-text-similarity]]
+== Text similarity
+
+The text similarity task estimates how similar two pieces of text are to each
+other and expresses the similarity in a numeric value. This is commonly referred
+to as cross-encoding. This task is useful for ranking document text when
+comparing it to another provided text input.
+
+You can provide multiple strings of text to compare to another text input
+sequence. Each string is compared to the given text sequence at inference time
+and a prediction of similarity is calculated for every string of text.
+
+[source,js]
+----------------------------------
+{
+"docs":[{ "text_field": "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."}, {"text_field": "New York City is famous for the Metropolitan Museum of Art."}],
+"inference_config": {
+"text_similarity": {
+"text": "How many people live in Berlin?"
+}
+}
+}
+----------------------------------
+// NOTCONSOLE
+
+In the example above, every string in the `docs` array is compared individually
+to the text provided in the `text_similarity`.`text` field and a predicted
+similarity is calculated for both as the API response shows:
+
+[source,js]
+----------------------------------
+...
+{
+"predicted_value": 7.235751628875732
+},
+{
+"predicted_value": -11.562295913696289
+}
+...
+----------------------------------
 // NOTCONSOLE
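
For readers who want to see how the added text similarity example might be used for ranking, here is a minimal Python sketch. It only builds a request body shaped like the one in the new docs and sorts candidate strings by `predicted_value`; it does not call any API. The helper names `build_text_similarity_request` and `rank_by_similarity` are hypothetical, and the sketch assumes that results come back in the same order as the `docs` array and that a higher `predicted_value` means more similar, which the example response suggests but the diff does not state outright.

```python
import json


def build_text_similarity_request(query, candidates, field="text_field"):
    """Build a request body shaped like the example in the docs (hypothetical helper)."""
    return {
        "docs": [{field: text} for text in candidates],
        "inference_config": {"text_similarity": {"text": query}},
    }


def rank_by_similarity(candidates, results):
    """Pair each candidate with its predicted_value and sort highest first.

    Assumes results[i] belongs to candidates[i] (an assumption, see lead-in).
    """
    if len(candidates) != len(results):
        raise ValueError("expected exactly one result per candidate")
    scored = [
        (text, result["predicted_value"])
        for text, result in zip(candidates, results)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    "Berlin has a population of 3,520,031 registered inhabitants "
    "in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
body = build_text_similarity_request("How many people live in Berlin?", candidates)

# Values copied from the example response in the docs, not from a live call.
results = [
    {"predicted_value": 7.235751628875732},
    {"predicted_value": -11.562295913696289},
]
ranked = rank_by_similarity(candidates, results)

print(json.dumps(body, indent=2))
for text, score in ranked:
    print(f"{score:8.3f}  {text}")
```

With the values from the docs example, the Berlin sentence ranks first for the Berlin question, which matches the stated purpose of the task: ranking document text against a provided input.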
