@@ -5,35 +5,49 @@ suspicions about which entities in your data set are likely causing
irregularities, you can identify them as influencers in your {anomaly-jobs}.
That is to say, _influencers_ are fields that you suspect contain information
about someone or something that influences or contributes to anomalies in your
- data.
-
- Influencers can be any field in your data. If you use {dfeeds}, however, the
- field must exist in your {dfeed} query or aggregation; otherwise it is not
- included in the job analysis. If you use a query in your {dfeed}, there is an
- additional requirement: influencer fields must exist in the query results in the
- same hit as the detector fields. {dfeeds-cap} process data by paging through the
- query results; since search hits cannot span multiple indices or documents,
- {dfeeds} have the same limitation.
-
- Influencers do not need to be fields that are specified in your {anomaly-job}
- detectors, though they often are. If you use aggregations in your {dfeed}, it is
- possible to use influencers that come from different indices than the detector
- fields. However, both indices must have a date field with the same name, which you
- specify in the `data_description`.`time_field` property for the {dfeed}.
+ data. Influencers can be any field in your data.
+
+ You can pick influencers when you create your {anomaly-job} by using the
+ **Advanced job wizard**.
+
+ .**Requirements when using the {ml} APIs to pick influencers**
+ [%collapsible]
+ ====
+ * The influencer field must exist in your {dfeed} query or aggregation;
+ otherwise it is not included in the job analysis.
+ * If you use a query in your {dfeed}: influencer fields must exist in the query
+ results in the same hit as the detector fields. {dfeeds-cap} process data by
+ paging through the query results; since search hits cannot span multiple indices
+ or documents, {dfeeds} have the same limitation.
+ * If you use aggregations in your {dfeed}, it is possible to use influencers
+ that come from different indices than the detector fields. However, both indices
+ must have a date field with the same name, which you specify in the
+ `data_description`.`time_field` property for the {dfeed}.
+ * Influencers do not need to be fields that are specified in your {anomaly-job}
+ detectors, though they often are.
+ ====

Picking an influencer is strongly recommended for the following reasons:

- * It allows you to more easily assign blame for anomalies
- * It simplifies and aggregates the results
+ * It allows you to more easily assign blame for anomalies.
+ * It simplifies and aggregates the results.

If you use {kib}, the job creation wizards can suggest which fields to use as
influencers. The best influencer is the person or thing that you want to blame
- for the anomaly. In many cases, users or client IP addresses make excellent
- influencers.
+ for the anomaly. In many cases, categorical data fields – like users or client
+ IP addresses – make excellent influencers.
+
+ The **Anomaly Explorer** in {kib} lists the top influencers for a job and shows
+ the most influential values for every anomaly. It also enables you to filter the
+ data by a given influencer.
+
+ TIP: Do not pick too many influencers. For example, you generally do not need
+ more than three. If you pick many influencers, the results can be overwhelming
+ and there is a small overhead to the analysis.

- TIP: As a best practice, do not pick too many influencers. For example, you
- generally do not need more than three. If you pick many influencers, the results
- can be overwhelming and there is a small overhead to the analysis.
+ Refer to
+ https://www.elastic.co/blog/interpretability-in-ml-identifying-anomalies-influencers-root-causes[this blog post]
+ for further details on influencers.


end::influencers[]
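The "same hit" requirement that this change moves into the collapsible API section can be illustrated with a short sketch. The Python below is not part of the commit. It assumes a request body shaped like the one accepted by the create anomaly detection jobs API (`PUT _ml/anomaly_detectors/<job_id>`), with detectors and `influencers` under `analysis_config` and the time field under `data_description`. The field names (`response_time`, `client_ip`, `user`, `@timestamp`) and the helpers `detector_fields` and `fields_missing_from_hit` are hypothetical. The helper checks that every detector and influencer field appears in one search hit, because a datafeed pages through hits and a hit cannot span multiple documents or indices.

```python
# Hypothetical request body for PUT _ml/anomaly_detectors/<job_id>;
# the field names are illustrative, not taken from any real index.
job_config = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [
            {
                "function": "mean",
                "field_name": "response_time",
                "by_field_name": "client_ip",
            }
        ],
        # "client_ip" is also a detector field; "user" is not, which is allowed.
        "influencers": ["client_ip", "user"],
    },
    "data_description": {"time_field": "@timestamp"},
}

# Detector properties that name a field in the source data.
DETECTOR_FIELD_KEYS = (
    "field_name",
    "by_field_name",
    "over_field_name",
    "partition_field_name",
)


def detector_fields(config):
    """Collect every field referenced by the job's detectors."""
    fields = set()
    for detector in config["analysis_config"]["detectors"]:
        for key in DETECTOR_FIELD_KEYS:
            if key in detector:
                fields.add(detector[key])
    return fields


def fields_missing_from_hit(config, hit):
    """Hypothetical helper: list detector and influencer fields absent from a
    single search hit. Only top-level _source keys are checked; dotted or
    nested field paths are not resolved in this sketch."""
    source = hit.get("_source", {})
    needed = detector_fields(config) | set(config["analysis_config"]["influencers"])
    return sorted(field for field in needed if field not in source)


complete_hit = {"_source": {"@timestamp": "2024-01-01T00:00:00Z",
                            "response_time": 120, "client_ip": "10.0.0.1",
                            "user": "alice"}}
# Here "user" is assumed to live in another document, so this hit lacks it.
partial_hit = {"_source": {"@timestamp": "2024-01-01T00:00:00Z",
                           "response_time": 120, "client_ip": "10.0.0.1"}}

print(fields_missing_from_hit(job_config, complete_hit))  # []
print(fields_missing_from_hit(job_config, partial_hit))   # ['user']
```

In a real job, `job_config` would be sent as JSON to the API; the helper only inspects a sample hit and is not part of any Elastic client.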