Commit 266f906

szabosteve and lcawl authored
[DOCS] Adds item about fields containing arrays to anomaly detection limitations (elastic#1651)
Co-authored-by: Lisa Cawley <[email protected]>
1 parent cecc6d2 commit 266f906

1 file changed, with 33 additions and 4 deletions

docs/en/stack/ml/anomaly-detection/ml-limitations.asciidoc

Lines changed: 33 additions & 4 deletions
@@ -70,6 +70,31 @@ You cannot use the following field names in the `by_field_name` or
 `over_field_name` properties in a job: `by`; `count`; `over`. This limitation
 also applies to those properties when you create advanced jobs in {kib}.
 
+
+[discrete]
+[[ml-arrays-limitations]]
+=== Arrays in analyzed fields are turned into comma-separated strings
+
+If an {anomaly-job} is configured to analyze an aggregatable field (a field that
+is part of the index mapping definition), and this field contains an array, then
+the array is turned into a comma-separated concatenated string. The items in the
+array are sorted alphabetically and the duplicated items are removed. For
+example, the array `["zebra", "dog", "cat", "alligator", "cat"]` becomes
+`alligator,cat,dog,zebra`. The Anomaly Explorer charts don't display any results
+for the job as the string does not exist in the source data. The Single Metric
+Viewer displays results if the model plot is enabled.
+
+If an array field is not aggregatable and is retrieved from `_source`, the array
+is also turned into a comma-separated, concatenated list. However, the list
+items are not sorted alphabetically, nor are they deduplicated. Taking the
+example above, the comma-separated list, in this case, would be
+`zebra,dog,cat,alligator,cat`.
+
+Analyzing large arrays results in long strings which may require more system
+resources. Consider using a query in the {dfeed} that filters on the relevant
+items of the array.
+
+
 [discrete]
 [[ml-frozen-limitations]]
 === Frozen indices are not supported
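
The two array behaviors described in the added `ml-arrays-limitations` section can be mimicked with a short Python sketch. This is only an illustration of the documented outcome, not how {es} implements it: the function names are hypothetical, and Python's `sorted` orders strings by code point, which may not match the actual ordering for mixed-case or non-ASCII values.

```python
def aggregatable_array_to_string(values):
    """Mimic the documented result for an aggregatable field:
    duplicates are removed, items are sorted, then joined with commas."""
    # Hypothetical helper; sorted() compares by code point (an assumption
    # standing in for "sorted alphabetically" in the documentation).
    return ",".join(sorted(set(values)))


def source_array_to_string(values):
    """Mimic the documented result for a field retrieved from _source:
    original order and duplicates are kept, items are joined with commas."""
    return ",".join(values)


# The example array used in the documentation text.
animals = ["zebra", "dog", "cat", "alligator", "cat"]

print(aggregatable_array_to_string(animals))  # alligator,cat,dog,zebra
print(source_array_to_string(animals))        # zebra,dog,cat,alligator,cat
```

Neither resulting string exists as a value in the source documents, which is consistent with the note that the Anomaly Explorer charts cannot display results for such a job.
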
@@ -109,10 +134,11 @@ For more information about any of these functions, see <<ml-functions>>.
 [[ml-limitations-runtime]]
 === {anomaly-detect-cap} performs better on indexed fields
 
-{anomaly-jobs-cap} sort all data by a user-defined time field, which is frequently
-accessed. If the time field is a {ref}/runtime.html[runtime field], the
-performance impact of calculating field values at query time can significantly slow
-the job. Use an indexed field as a time field when running {anomaly-jobs}.
+{anomaly-jobs-cap} sort all data by a user-defined time field, which is
+frequently accessed. If the time field is a {ref}/runtime.html[runtime field],
+the performance impact of calculating field values at query time can
+significantly slow the job. Use an indexed field as a time field when running
+{anomaly-jobs}.
 
 
 [discrete]
@@ -144,6 +170,7 @@ you send to the job must use the JSON format.
 For more information about this API, see
 {ref}/ml-post-data.html[Post Data to Jobs].
 
+
 [discrete]
 === Misleading high missing field counts
 //See x-pack-elasticsearch/#684
@@ -288,6 +315,7 @@ To avoid this behavior, make sure that the aggregation interval in the {dfeed}
 configuration and the bucket span in the {anomaly-job} configuration have the
 same values.
 
+
 [discrete]
 [[ml-space-limitations]]
 === Calendars and filters are visible in all {kib} spaces
@@ -298,6 +326,7 @@ that belong to your space. However, this limited scope does not apply to
 <<ml-calendars,calendars>> and <<ml-rules,filters>>; they are visible in all
 spaces.
 
+
 [discrete]
 [[ml-rollup-limitations]]
 === Rollup indices and index patterns are not supported in {kib}
