[ML] ClearInferenceEndpointCacheAction breaks rolling upgrade tests #134809

@tlrx

The ClearInferenceEndpointCacheAction added in #133860 is always sent to the master node, even if that node is running a previous version that does not support the action.

When it receives an unknown action, the InboundAggregator trips an assertion, which causes the node to exit the JVM and the test to fail:

[2025-09-16T12:44:09,677][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [test-cluster-1] fatal error in thread [elasticsearch-error-rethrower], exiting
java.lang.AssertionError: cluster:internal/xpack/inference/clear_inference_endpoint_cache
	at org.elasticsearch.transport.InboundAggregator.lambda$new$0(InboundAggregator.java:47) ~[elasticsearch-8.19.0.jar:?]
	at org.elasticsearch.transport.InboundAggregator.initializeRequestState(InboundAggregator.java:198) ~[elasticsearch-8.19.0.jar:?]
	at org.elasticsearch.transport.InboundAggregator.headerReceived(InboundAggregator.java:67) ~[elasticsearch-8.19.0.jar:?]
	at org.elasticsearch.transport.InboundPipeline.headerReceived(InboundPipeline.java:139) ~[elasticsearch-8.19.0.jar:?]
	at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:113) ~[elasticsearch-8.19.0.jar:?]
	at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:97) ~[elasticsearch-8.19.0.jar:?]
	at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:62) ~[elasticsearch-8.19.0.jar:?]
	at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:55) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:107) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868) ~[?:?]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:796) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:697) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:660) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998) ~[?:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
	at java.lang.Thread.run(Thread.java:1447) ~[?:?]

This was caught when re-enabling BWC tests in #134784, which includes a quick workaround: the action is not sent when xpack.inference.endpoint.cache.enabled is false.
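To illustrate the kind of guard a longer-term fix might use, here is a minimal, self-contained sketch of gating the send on both the cache setting and the master's version. It is not the Elasticsearch implementation: `Node`, `ACTION_ADDED_IN`, `shouldSend`, and `sendIfSupported` are hypothetical names, and the version number is made up. A real fix would presumably rely on Elasticsearch's own mechanisms for this (for example transport versions or cluster features) rather than a plain integer comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class VersionGatedSend {
    /** Hypothetical stand-in for a cluster node and the wire version it speaks. */
    static final class Node {
        final String name;
        final int transportVersion;

        Node(String name, int transportVersion) {
            this.name = name;
            this.transportVersion = transportVersion;
        }
    }

    // Action name taken from the stack trace in this issue.
    static final String ACTION = "cluster:internal/xpack/inference/clear_inference_endpoint_cache";

    // Made-up version number marking when the action was introduced (assumption).
    static final int ACTION_ADDED_IN = 9_002_000;

    /** Send only when the cache is enabled AND the master is new enough to know the action. */
    static boolean shouldSend(Node master, boolean cacheEnabled) {
        return cacheEnabled && master.transportVersion >= ACTION_ADDED_IN;
    }

    /** Returns true if the action was handed to the transport, false if it was skipped. */
    static boolean sendIfSupported(Node master, boolean cacheEnabled, Consumer<String> transport) {
        if (!shouldSend(master, cacheEnabled)) {
            return false; // an old master would not recognise the action, so skip it
        }
        transport.accept(ACTION);
        return true;
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>(); // records what the fake transport "sent"
        Node oldMaster = new Node("old-master", ACTION_ADDED_IN - 1);
        Node newMaster = new Node("new-master", ACTION_ADDED_IN);

        System.out.println(sendIfSupported(oldMaster, true, sent::add)); // false
        System.out.println(sendIfSupported(newMaster, true, sent::add)); // true
        System.out.println(sent.size());                                 // 1
    }
}
```

The workaround from #134784 corresponds to the `cacheEnabled` half of the check; the version comparison is the part that would keep a mixed-version cluster safe even when the cache is enabled.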

Labels

:ml (Machine learning), >bug, Team:ML (Meta label for the ML team), v9.2.0
