Commit 1a6831c

[DOCS] Improves influencer documentation (elastic#2221)
1 parent d065231 commit 1a6831c

1 file changed

+36
-22
lines changed

docs/en/stack/ml/anomaly-detection/ml-influencers.asciidoc

Lines changed: 36 additions & 22 deletions
@@ -5,35 +5,49 @@ suspicions about which entities in your data set are likely causing
 irregularities, you can identify them as influencers in your {anomaly-jobs}.
 That is to say, _influencers_ are fields that you suspect contain information
 about someone or something that influences or contributes to anomalies in your
-data.
-
-Influencers can be any field in your data. If you use {dfeeds}, however, the
-field must exist in your {dfeed} query or aggregation; otherwise it is not
-included in the job analysis. If you use a query in your {dfeed}, there is an
-additional requirement: influencer fields must exist in the query results in the
-same hit as the detector fields. {dfeeds-cap} process data by paging through the
-query results; since search hits cannot span multiple indices or documents,
-{dfeeds} have the same limitation.
-
-Influencers do not need to be fields that are specified in your {anomaly-job}
-detectors, though they often are. If you use aggregations in your {dfeed}, it is
-possible to use influencers that come from different indices than the detector
-fields. However, both indices must have a date field with the same name, which you
-specify in the `data_description`.`time_field` property for the {dfeed}.
+data. Influencers can be any field in your data.
+
+You can pick influencers when you create your {anomaly-job} by using the
+**Advanced job wizard**.
+
+.**Requirements when using the {ml} APIs to pick influencers**
+[%collapsible]
+====
+* The influencer field must exist in your {dfeed} query or aggregation;
+otherwise it is not included in the job analysis.
+* If you use a query in your {dfeed}: influencer fields must exist in the query
+results in the same hit as the detector fields. {dfeeds-cap} process data by
+paging through the query results; since search hits cannot span multiple indices
+or documents, {dfeeds} have the same limitation.
+* If you use aggregations in your {dfeed}, it is possible to use influencers
+that come from different indices than the detector fields. However, both indices
+must have a date field with the same name, which you specify in the
+`data_description`.`time_field` property for the {dfeed}.
+* Influencers do not need to be fields that are specified in your {anomaly-job}
+detectors, though they often are.
+====
 
 Picking an influencer is strongly recommended for the following reasons:
 
-* It allows you to more easily assign blame for anomalies
-* It simplifies and aggregates the results
+* It allows you to more easily assign blame for anomalies.
+* It simplifies and aggregates the results.
 
 If you use {kib}, the job creation wizards can suggest which fields to use as
 influencers. The best influencer is the person or thing that you want to blame
-for the anomaly. In many cases, users or client IP addresses make excellent
-influencers.
+for the anomaly. In many cases, categorical data fields – like users or client
+IP addresses – make excellent influencers.
+
+The **Anomaly Explorer** in {kib} lists the top influencers for a job and shows
+the most influential values for every anomaly. It also enables you to filter the
+data by a given influencer.
+
+TIP: Do not pick too many influencers. For example, you generally do not need
+more than three. If you pick many influencers, the results can be overwhelming
+and there is a small overhead to the analysis.
 
-TIP: As a best practice, do not pick too many influencers. For example, you
-generally do not need more than three. If you pick many influencers, the results
-can be overwhelming and there is a small overhead to the analysis.
+Refer to
+https://www.elastic.co/blog/interpretability-in-ml-identifying-anomalies-influencers-root-causes[this blog post]
+for further details on influencers.
 
 
 end::influencers[]
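
For readers who pick influencers through the {ml} APIs rather than the wizard, the requirements listed in the changed text can be sketched roughly as follows. This is an illustrative sketch, not part of the commit. It assumes a job body with the `analysis_config.influencers` and `data_description.time_field` properties; the field names (`bytes`, `clientip`, `user`, `@timestamp`) and the helper functions are hypothetical, and the hit check only handles flat, top-level fields.

```python
# Illustrative sketch only: helper names and field names are hypothetical.
MAX_RECOMMENDED_INFLUENCERS = 3  # the TIP suggests you rarely need more than three


def build_job_config(detectors, influencers, time_field, bucket_span="15m"):
    """Build a request body shaped like an anomaly detection job definition."""
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": detectors,
            "influencers": list(influencers),
        },
        "data_description": {"time_field": time_field},
    }


def missing_influencers(job_config, hit_source):
    """Return the influencer fields that are absent from one search hit.

    With a query-based datafeed, influencer fields must appear in the same
    hit as the detector fields, so a non-empty result signals a problem.
    Assumes flat (non-nested) field names in the hit's source document.
    """
    influencers = job_config["analysis_config"]["influencers"]
    return sorted(name for name in influencers if name not in hit_source)


def exceeds_recommended(job_config):
    """Report whether the job lists more influencers than the TIP suggests."""
    count = len(job_config["analysis_config"]["influencers"])
    return count > MAX_RECOMMENDED_INFLUENCERS


job = build_job_config(
    detectors=[
        {"function": "sum", "field_name": "bytes", "partition_field_name": "clientip"}
    ],
    influencers=["clientip", "user"],
    time_field="@timestamp",
)

# A sample hit that carries the detector fields but not the "user" influencer.
sample_hit = {"@timestamp": "2021-01-01T00:00:00Z", "bytes": 512, "clientip": "10.0.0.1"}
print(missing_influencers(job, sample_hit))
print(exceeds_recommended(job))
```

Running a check like `missing_influencers` against a few sample hits before creating the job is one way to catch an influencer that would otherwise be silently left out of the analysis.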
