[[ml-nlp-ner-example]]
= How to deploy named entity recognition

++++
<titleabbrev>Named entity recognition</titleabbrev>
++++
:keywords: {ml-init}, {stack}, {nlp}

You can use these instructions to deploy a
<<ml-nlp-ner,named entity recognition (NER)>> model in {es}, test the model, and
add it to an {infer} ingest pipeline. The model that is used in the example is
publicly available on https://huggingface.co/[HuggingFace].


[discrete]
[[ex-ner-requirements]]
== Requirements

include::ml-nlp-shared.asciidoc[tag=nlp-requirements]


[discrete]
[[ex-ner-deploy]]
== Deploy a NER model

include::ml-nlp-shared.asciidoc[tag=nlp-eland-clone-docker-build]

Select a NER model from the
{ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-ner[third-party model reference list].
This example uses an
https://huggingface.co/elastic/distilbert-base-uncased-finetuned-conll03-english[uncased NER model].

Install the model by running the `eland_import_hub_model` command in the Docker
image:

[source,shell]
--------------------------------------------------
docker run -it --rm elastic/eland \
  eland_import_hub_model \
    --cloud-id $CLOUD_ID \
    -u <username> -p <password> \
    --hub-model-id elastic/distilbert-base-uncased-finetuned-conll03-english \
    --task-type ner \
    --start
--------------------------------------------------

You need to provide an administrator username and password, and replace
`$CLOUD_ID` with the ID of your Cloud deployment. You can copy the Cloud ID from
the deployments page in the Elastic Cloud console.
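
If you are loading the model into a self-managed cluster rather than an Elastic
Cloud deployment, you can point Eland at the cluster URL instead of a Cloud ID.
The following is only a sketch of that variant; the endpoint and credentials are
placeholders:

[source,shell]
--------------------------------------------------
docker run -it --rm elastic/eland \
  eland_import_hub_model \
    --url https://<es-host>:<port> \
    -u <username> -p <password> \
    --hub-model-id elastic/distilbert-base-uncased-finetuned-conll03-english \
    --task-type ner \
    --start
--------------------------------------------------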

include::ml-nlp-shared.asciidoc[tag=nlp-start]

include::ml-nlp-shared.asciidoc[tag=nlp-sync]


[discrete]
[[ex-ner-test]]
== Test the NER model

Deployed models can be evaluated in {kib} under **{ml-app}** >
**Trained Models** by selecting the **Test model** action for the respective
model.

[role="screenshot"]
image::images/ml-nlp-ner-test.png[Test trained model UI]

.**Test the model by using the _infer API**
[%collapsible]
====
You can also evaluate your models by using the
{ref}/infer-trained-model-deployment.html[_infer API]. In the following request,
`text_field` is the field name where the model expects to find the input, as
defined in the model configuration. By default, if the model was uploaded via
Eland, the input field is `text_field`.

[source,js]
--------------------------------------------------
POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/_infer
{
  "docs": [
    {
      "text_field": "Elastic is headquartered in Mountain View, California."
    }
  ]
}
--------------------------------------------------

The API returns a response similar to the following:

[source,js]
--------------------------------------------------
{
  "inference_results": [
    {
      "predicted_value": "[Elastic](ORG&Elastic) is headquartered in [Mountain View](LOC&Mountain+View), [California](LOC&California).",
      "entities": [
        {
          "entity": "elastic",
          "class_name": "ORG",
          "class_probability": 0.9958921231805256,
          "start_pos": 0,
          "end_pos": 7
        },
        {
          "entity": "mountain view",
          "class_name": "LOC",
          "class_probability": 0.9844731508992688,
          "start_pos": 28,
          "end_pos": 41
        },
        {
          "entity": "california",
          "class_name": "LOC",
          "class_probability": 0.9972361009811214,
          "start_pos": 43,
          "end_pos": 53
        }
      ]
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE
====

Using the example text "Elastic is headquartered in Mountain View, California.",
the model finds three entities: an organization "Elastic", and two locations
"Mountain View" and "California".
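
If you are unsure which input field a model expects, you can retrieve its
configuration with the get trained models API; the `input.field_names` value in
the response shows the field the model reads from. For example:

[source,js]
--------------------------------------------------
GET _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english
--------------------------------------------------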


[discrete]
[[ex-ner-ingest]]
== Add the NER model to an {infer} ingest pipeline

You can perform bulk {infer} on documents as they are ingested by using an
{ref}/inference-processor.html[{infer} processor] in your ingest pipeline. The
novel _Les Misérables_ by Victor Hugo is used as the example text in the
following steps.
https://github.com/elastic/stack-docs/blob/8.5/docs/en/stack/ml/nlp/data/les-miserables-nd.json[Download]
the novel text split by paragraph as a JSON file, then upload it by using the
{kibana-ref}/connect-to-elasticsearch.html#upload-data-kibana[Data Visualizer].
Give the new index the name `les-miserables` when you upload the file.
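
To confirm that the upload worked, you can check the document count of the new
index (this assumes you kept the suggested `les-miserables` index name):

[source,js]
--------------------------------------------------
GET les-miserables/_count
--------------------------------------------------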

Now create an ingest pipeline either in the
{ml-docs}/ml-nlp-inference.html#ml-nlp-inference-processor[Stack management UI]
or by using the API:

[source,js]
--------------------------------------------------
PUT _ingest/pipeline/ner
{
  "description": "NER pipeline",
  "processors": [
    {
      "inference": {
        "model_id": "elastic__distilbert-base-uncased-finetuned-conll03-english",
        "target_field": "ml.ner",
        "field_map": {
          "paragraph": "text_field"
        }
      }
    },
    {
      "script": {
        "lang": "painless",
        "if": "return ctx['ml']['ner'].containsKey('entities')",
        "source": "Map tags = new HashMap(); for (item in ctx['ml']['ner']['entities']) { if (!tags.containsKey(item.class_name)) tags[item.class_name] = new HashSet(); tags[item.class_name].add(item.entity);} ctx['tags'] = tags;"
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "description": "Index document to 'failed-<index>'",
        "field": "_index",
        "value": "failed-{{{ _index }}}"
      }
    },
    {
      "set": {
        "description": "Set error message",
        "field": "ingest.failure",
        "value": "{{_ingest.on_failure_message}}"
      }
    }
  ]
}
--------------------------------------------------

The `field_map` object of the `inference` processor maps the `paragraph` field
in the _Les Misérables_ documents to `text_field` (the name of the
field the model is configured to use). The `target_field` is the name of the
field to write the inference results to.
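
Before reindexing the whole corpus, you can check that the field mapping and the
processors behave as expected by sending one test document through the pipeline
with the simulate pipeline API. This is only a quick sketch; the sample
`paragraph` text is arbitrary:

[source,js]
--------------------------------------------------
POST _ingest/pipeline/ner/_simulate
{
  "docs": [
    {
      "_source": {
        "paragraph": "Cosette walked with Jean Valjean through the streets of Paris."
      }
    }
  ]
}
--------------------------------------------------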

The `script` processor pulls out the entities and groups them by type. The end
result is lists of people, locations, and organizations detected in the input
text. This Painless script enables you to build visualizations from the fields
that it creates.
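
For readability, the one-line script in the pipeline above is equivalent to the
following formatted version (shown only as an expanded view of the same code):

[source,painless]
--------------------------------------------------
// Group the detected entities by class name, for example PER, LOC, or ORG.
Map tags = new HashMap();
for (item in ctx['ml']['ner']['entities']) {
  if (!tags.containsKey(item.class_name))
    tags[item.class_name] = new HashSet();
  // A set keeps each entity listed only once per class.
  tags[item.class_name].add(item.entity);
}
ctx['tags'] = tags;
--------------------------------------------------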

The purpose of the `on_failure` clause is to record errors. It sets the `_index`
meta field to a new value so that a failed document is stored in a different
index. It also writes the error message to a new field, `ingest.failure`.
{infer-cap} can fail for a number of easily fixable reasons: perhaps the model
has not been deployed, or the input field is missing in some of the source
documents. By redirecting the failed documents to another index and recording
the error message, those failed inferences are not lost and can be reviewed
later. When the errors are fixed, reindex from the failed index to recover the
unsuccessful requests.
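
For example, if documents fail during the reindex step later in this example,
the `on_failure` handler above would store them in `failed-les-miserables-infer`
(the destination index name with the `failed-` prefix). Assuming that index
name, a sketch of the recovery request could look like this:

[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "index": "failed-les-miserables-infer"
  },
  "dest": {
    "index": "les-miserables-infer",
    "pipeline": "ner"
  }
}
--------------------------------------------------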

Ingest the text of the novel (the `les-miserables` index) through the pipeline
you created:

[source,js]
--------------------------------------------------
POST _reindex
{
  "source": {
    "index": "les-miserables"
  },
  "dest": {
    "index": "les-miserables-infer",
    "pipeline": "ner"
  }
}
--------------------------------------------------

Take a random paragraph from the source document as an example:

[source,js]
--------------------------------------------------
{
    "paragraph": "Father Gillenormand did not do it intentionally, but inattention to proper names was an aristocratic habit of his.",
    "line": 12700
}
--------------------------------------------------

After the text is ingested through the NER pipeline, find the resulting document
stored in {es}:

[source,js]
--------------------------------------------------
GET /les-miserables-infer/_search
{
  "query": {
    "term": {
      "line": 12700
    }
  }
}
--------------------------------------------------

The request returns the document marked up with one identified person:

[source,js]
--------------------------------------------------
(...)
  "paragraph": "Father Gillenormand did not do it intentionally, but inattention to proper names was an aristocratic habit of his.",
  "@timestamp": "2020-01-01T17:38:25.000+01:00",
  "line": 12700,
  "ml": {
    "ner": {
      "predicted_value": "Father [Gillenormand](PER&Gillenormand) did not do it intentionally, but inattention to proper names was an aristocratic habit of his.",
      "entities": [
        {
          "entity": "gillenormand",
          "class_name": "PER",
          "class_probability": 0.9452480789333386,
          "start_pos": 7,
          "end_pos": 19
        }
      ],
      "model_id": "elastic__distilbert-base-uncased-finetuned-conll03-english"
    }
  },
  "tags": {
    "PER": [
      "gillenormand"
    ]
  }
(...)
--------------------------------------------------


[discrete]
[[ex-ner-visual]]
== Visualize results

You can create a tag cloud to visualize your data processed by the {infer}
pipeline. A tag cloud is a visualization that scales words by the frequency at
which they occur. It is a handy tool for viewing the entities found in the data.

In {kib}, open **Stack management** > **{data-sources-cap}**, and create a new
{data-source} from the `les-miserables-infer` index pattern.

Open **Dashboard** and create a new dashboard. Select the
**Aggregation based** > **Tag cloud** visualization type. Choose the new
{data-source} as the source.

Add a new bucket with a term aggregation, select the `tags.PER.keyword` field,
and increase the size to 20.
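
You can also inspect the same data without a visualization by running an
equivalent terms aggregation directly against the index. This sketch assumes the
`tags.PER.keyword` sub-field used by the visualization exists in your mapping:

[source,js]
--------------------------------------------------
GET les-miserables-infer/_search
{
  "size": 0,
  "aggs": {
    "people": {
      "terms": {
        "field": "tags.PER.keyword",
        "size": 20
      }
    }
  }
}
--------------------------------------------------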

Optionally, adjust the time selector to cover the data points in the
{data-source} if you selected a time field when creating it.

Update and save the visualization.

[role="screenshot"]
image::images/ml-nlp-tag-cloud.png[alt="Tag cloud created from Les Misérables",align="center"]