[ML] ingest node crashes when running semantic_text inference during ml node shutdown

### Version:
9.0.0

### Build:

```
  "build": {
    "hash": "8ccfb227c2131c859033f409ee37a87023fada62",
    "date": "2024-10-16T00:43:30.449150814Z"
  },
```

### Error:
The ingest node crashed with the error: `java.lang.IllegalStateException: index [0] has already been set`




### Steps to reproduce:
1. Deploy a multi-node environment locally. It must contain one ML node and one ingest node. My environment (`GET _cat/nodes`):

```
192.168.68.51 57 100 21 4.52   it - node-5
192.168.68.51 61 100 24 4.52   l  - node-4
192.168.68.51 59 100 21 4.52   m  - node-1
192.168.68.51 12 100 21 4.52   m  * node-0
192.168.68.51 51 100 21 4.52   d  - node-3
192.168.68.51 33 100 21 4.52   m  - node-2
```

node-4 is the ML node and node-5 is the ingest node.
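
For anyone checking their own cluster layout before reproducing, here is a minimal sketch that reads the role letters from the output above. It assumes the last three whitespace-separated fields of each line are the role letters, the elected-master marker, and the node name, as in the listing shown. `parse_cat_nodes` and `ROLE_LETTERS` are hypothetical helper names, not part of any client library, and the letter map covers only the roles that appear in this report.

```python
# Role letters seen in this report's listing (subset of the node.role abbreviations).
ROLE_LETTERS = {"i": "ingest", "t": "transform", "l": "ml", "m": "master", "d": "data"}


def parse_cat_nodes(text):
    """Return {node_name: {"roles": [...], "elected_master": bool}}."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        role_letters, master_marker, name = fields[-3:]
        nodes[name] = {
            "roles": [ROLE_LETTERS.get(ch, ch) for ch in role_letters],
            "elected_master": master_marker == "*",
        }
    return nodes


# The listing from this report, copied verbatim.
CAT_OUTPUT = """\
192.168.68.51 57 100 21 4.52   it - node-5
192.168.68.51 61 100 24 4.52   l  - node-4
192.168.68.51 59 100 21 4.52   m  - node-1
192.168.68.51 12 100 21 4.52   m  * node-0
192.168.68.51 51 100 21 4.52   d  - node-3
192.168.68.51 33 100 21 4.52   m  - node-2
"""

nodes = parse_cat_nodes(CAT_OUTPUT)
ml_nodes = [name for name, info in nodes.items() if "ml" in info["roles"]]
ingest_nodes = [name for name, info in nodes.items() if "ingest" in info["roles"]]
print("ML nodes:", ml_nodes)
print("Ingest nodes:", ingest_nodes)
```
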

2. Create an inference endpoint:

```
PUT _inference/sparse_embedding/elser-endpoint
{
  "service": "elser", 
  "service_settings": {"num_threads": 4, "adaptive_allocations": {"enabled": true}}
}
```

3. Create an index with a `semantic_text` field in the mapping:
```
...
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "content": { 
          "type": "semantic_text", 
          "inference_id": "elser-endpoint" 
        },
        ...
      }
    }
```

4. Start ingesting data into the above index. I was using the Python client: `es.index(index=index_name, document=document)`
5. While indexing is in progress, manually stop the ML node (node-4).
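
Steps 4 and 5 can be driven by a small loop like the sketch below. It is not the script I used, only an illustration: `index_until_failure` is a hypothetical helper, and `index_one` stands for any callable that indexes one document, for example `lambda doc: es.index(index=index_name, document=doc)`. The stand-in `_fake_index` simulates the client error so the sketch runs without a cluster.

```python
import time


def index_until_failure(index_one, documents, pause_seconds=0.0):
    """Index documents one at a time and stop at the first exception.

    Returns (number_of_documents_indexed, first_exception_or_None).
    """
    indexed = 0
    for doc in documents:
        try:
            index_one(doc)
        except Exception as exc:  # report whatever error the client raises
            return indexed, exc
        indexed += 1
        if pause_seconds:
            time.sleep(pause_seconds)  # leave time to stop the ML node by hand
    return indexed, None


def _fake_index(doc):
    # Stand-in for es.index(...): fails on the fourth document.
    if doc["n"] == 3:
        raise ConnectionError("simulated node_disconnected_exception")


count, error = index_until_failure(_fake_index, [{"n": i} for i in range(5)])
print(count, repr(error))
```

With a real client, a non-zero `pause_seconds` gives enough time to stop node-4 while the loop is still running.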

**Observed:**
After a few seconds, the Python client logs:
```
2024-10-16 00:02:18,557 DEBUG : Starting new HTTPS connection (2): localhost:9200
2024-10-16 00:02:46,400 DEBUG : https://localhost:9200 "POST /news-rss-feeds-espn-2024-10/_doc HTTP/11" 500 0
2024-10-16 00:02:46,401 ERROR : Exception type: ApiError
2024-10-16 00:02:46,401 ERROR : Error message: ApiError(500, 'node_disconnected_exception', '[node-5][192.168.68.51:9304][indices:data/write/bulk] disconnected')
2024-10-16 00:02:46,405 ERROR : Traceback (most recent call last):
```

Then the ingest node (node-5) crashed.

The Elasticsearch log shows:
```
[2024-10-16T00:00:23,419][ERROR][o.e.a.s.SubscribableListener] [node-5] exception thrown while handling another exception in listener [org.elasticsearch.action.ActionListenerImplementations$MappedActionListener/org.elasticsearch.action.support.ContextPreservingActionListener/org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.tasks.TaskManager$1{org.elasticsearch.action.support.ContextPreservingActionListener/org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction$1@b59a5ea}{CancellableTask{Task{id=1373, type='transport', action='cluster:monitor/xpack/ml/trained_models/deployment/infer', description='infer_trained_model_deployment[elser-endpoint]', parentTask=ubKEYCjuSpC_vtLNjywQMw:1372, startTime=1729051218557, headers={}, startTimeNanos=804107876203041}, reason='null', isCancelled=false}}/org.elasticsearch.action.support.TransportAction$$Lambda/0x00001e000249ca88@7bf152ac/org.elasticsearch.xpack.security.action.filter.SecurityActionFilter$$Lambda/0x00001e000249a110@3b622e21]
java.lang.IllegalStateException: index [0] has already been set
	at org.elasticsearch.common.util.concurrent.AtomicArray.setOnce(AtomicArray.java:70) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.xpack.inference.chunking.EmbeddingRequestChunker$DebatchingListener.onFailure(EmbeddingRequestChunker.java:327) ~[?:?]
	at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:64) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:75) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:32) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
```

Full Elasticsearch log:
[8-nodes-9.0.0-main-884.log](https://github.com/user-attachments/files/17396386/8-nodes-9.0.0-main-884.log)
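
To illustrate the error message in the stack trace, here is a toy Python model of a write-once result slot. It is only a guess at the failure shape, not the Elasticsearch code: `SetOnceArray` and `ToyBatchListener` are hypothetical names, and the assumption that the failure callback ran more than once for the same slot is mine and is not confirmed by the log.

```python
class SetOnceArray:
    """Toy model of an array whose slots may each be written only once."""

    def __init__(self, size):
        self._slots = [None] * size

    def set_once(self, index, value):
        if self._slots[index] is not None:
            raise RuntimeError(f"index [{index}] has already been set")
        self._slots[index] = value


class ToyBatchListener:
    """Records the outcome of one batch in a shared write-once array."""

    def __init__(self, results, index):
        self._results = results
        self._index = index

    def on_failure(self, error):
        # Assumption: a failure is stored in the same slot a result would use.
        self._results.set_once(self._index, error)


results = SetOnceArray(1)
listener = ToyBatchListener(results, 0)
listener.on_failure(ConnectionError("ml node left the cluster"))

second_call_error = None
try:
    # A second notification for the same batch hits the already-filled slot.
    listener.on_failure(ConnectionError("request failed again"))
except RuntimeError as exc:
    second_call_error = str(exc)
print(second_call_error)
```

In this model the second callback raises instead of being ignored, which matches the wording of the exception in the log but says nothing about why the listener would be notified twice.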




