|
| 1 | +[role="xpack"] |
| 2 | +[[ml-restart-failed-jobs]] |
| 3 | += Restart failed {anomaly-jobs} |
| 4 | + |
| 5 | +If an {anomaly-job} fails, try to restart the job by following the procedure |
| 6 | +described below. If the restarted job runs as expected, then the problem that |
| 7 | +caused the job to fail was transient and no further investigation is needed. If |
| 8 | +the job fails again soon after the restart, then the problem is persistent and |
| 9 | +needs further investigation. In this case, find out which node the failed job |
| 10 | +was running on by checking the job stats on the **Job management** pane in |
| 11 | +{kib}. Then get the logs for that node and look for exceptions and errors whose |
| 12 | +messages contain the ID of the {anomaly-job} to get a better understanding of |
| 13 | +the issue. |
| 14 | + |
| 15 | +If an {anomaly-job} has failed, do the following to recover from the `failed` state: |
| 16 | + |
| 17 | +. _Force_ stop the corresponding {dfeed} by using the |
| 18 | +{ref}/ml-stop-datafeed.html[Stop {dfeed} API] with the `force` parameter set to |
| 19 | +`true`. For example, the following request force stops the `my_datafeed` |
| 20 | +{dfeed}: |
| 21 | ++ |
| 22 | +-- |
| 23 | +[source,console] |
| 24 | +-------------------------------------------------- |
| 25 | +POST _ml/datafeeds/my_datafeed/_stop |
| 26 | +{ |
| 27 | +  "force": true |
| 28 | +} |
| 29 | +-------------------------------------------------- |
| 30 | +// TEST[skip] |
| 31 | +-- |
| 32 | + |
| 33 | +. _Force_ close the {anomaly-job} by using the |
| 34 | +{ref}/ml-close-job.html[Close {anomaly-job} API] with the `force` parameter |
| 35 | +set to `true`. For example, the following request force closes the `my_job` |
| 36 | +{anomaly-job}: |
| 37 | ++ |
| 38 | +-- |
| 39 | +[source,console] |
| 40 | +-------------------------------------------------- |
| 41 | +POST _ml/anomaly_detectors/my_job/_close?force=true |
| 42 | +-------------------------------------------------- |
| 43 | +// TEST[skip] |
| 44 | +-- |
| 45 | + |
| 46 | +. Restart the {anomaly-job} on the **Job management** pane in {kib}. |
| 47 | + |
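| 48 | +If you prefer the APIs to {kib} for the last step, the restart might look like |
| 49 | +the following sketch. It assumes the same illustrative `my_job` and |
| 50 | +`my_datafeed` IDs as the previous examples and uses the |
| 51 | +{ref}/ml-open-job.html[Open {anomaly-job} API], the |
| 52 | +{ref}/ml-start-datafeed.html[Start {dfeed} API], and the |
| 53 | +{ref}/ml-get-job-stats.html[Get {anomaly-job} statistics API]. Adjust the IDs, |
| 54 | +and add `start` or `end` times to the {dfeed} request if your use case needs them. |
| 55 | + |
| 56 | +[source,console] |
| 57 | +-------------------------------------------------- |
| 58 | +POST _ml/anomaly_detectors/my_job/_open |
| 59 | + |
| 60 | +POST _ml/datafeeds/my_datafeed/_start |
| 61 | + |
| 62 | +GET _ml/anomaly_detectors/my_job/_stats |
| 63 | +-------------------------------------------------- |
| 64 | +// TEST[skip] |
| 65 | + |
| 66 | +In the statistics response, a `state` of `opened` suggests that the restart |
| 67 | +worked, whereas `failed` points to a persistent problem. While the job is open, |
| 68 | +the response also includes a `node` object that identifies where the job runs, |
| 69 | +which can help you find the relevant logs. |
| 70 | + |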