Issues with GraphQL subscriptions #1094

@yassenb

We've been seeing a number of errors every day for months now. I postponed logging this until we had upgraded to Spring Boot 3.4, but the errors persist. I haven't been able to pinpoint a cause. Subscriptions over WebSockets work for the most part, and we use them heavily, but the errors still show up in the logs. Here are the two most common ones, with stack traces:

reactor.core.Exceptions$OverflowException: Queue is full: Reactive Streams source doesn't respect backpressure
at reactor.core.Exceptions.failWithOverflow ( reactor/core/Exceptions.java:251 )
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.onNext ( reactor/core.publisher/FluxPublishOn.java:233 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onNext ( reactor/core.publisher/MonoFlatMapMany.java:251 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext ( reactor/core.publisher/FluxOnErrorResume.java:79 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber.onNext ( reactor/core.publisher/FluxConcatArray.java:180 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxMap$MapSubscriber.onNext ( reactor/core.publisher/FluxMap.java:122 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxPeek$PeekSubscriber.onNext ( reactor/core.publisher/FluxPeek.java:200 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxMap$MapSubscriber.onNext ( reactor/core.publisher/FluxMap.java:122 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at graphql.execution.reactive.CompletionStageSubscriber.whenNextFinished ( graphql/execution.reactive/CompletionStageSubscriber.java:95 )
at graphql.execution.reactive.CompletionStageSubscriber.lambda$whenComplete$0 ( graphql/execution.reactive/CompletionStageSubscriber.java:78 )
at java.util.concurrent.CompletableFuture.uniWhenComplete
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire
at java.util.concurrent.CompletableFuture.postComplete
at java.util.concurrent.CompletableFuture.complete
at graphql.execution.ExecutionStrategy.lambda$buildFieldValueMap$2 ( graphql/execution/ExecutionStrategy.java:283 )
at java.util.concurrent.CompletableFuture.uniWhenComplete
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire
at java.util.concurrent.CompletableFuture.postComplete
at java.util.concurrent.CompletableFuture.complete
at graphql.execution.Async$Many.lambda$await$0 ( graphql/execution/Async.java:226 )
at java.util.concurrent.CompletableFuture.uniWhenComplete
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire
at java.util.concurrent.CompletableFuture.postComplete
at java.util.concurrent.CompletableFuture.complete
at graphql.execution.ExecutionStrategy.lambda$buildFieldValueMap$2 ( graphql/execution/ExecutionStrategy.java:283 )
at java.util.concurrent.CompletableFuture.uniWhenComplete
at java.util.concurrent.CompletableFuture.uniWhenCompleteStage
at java.util.concurrent.CompletableFuture.whenComplete
at graphql.execution.ExecutionStrategy.lambda$executeObject$0 ( graphql/execution/ExecutionStrategy.java:234 )
at java.util.concurrent.CompletableFuture.uniWhenComplete
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire
at java.util.concurrent.CompletableFuture.postComplete
at java.util.concurrent.CompletableFuture.complete
at graphql.execution.Async$Many.lambda$await$0 ( graphql/execution/Async.java:226 )
at java.util.concurrent.CompletableFuture.uniWhenComplete
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire
at java.util.concurrent.CompletableFuture.postComplete
at java.util.concurrent.CompletableFuture.complete
at reactor.core.publisher.MonoToCompletableFuture.onNext ( reactor/core.publisher/MonoToCompletableFuture.java:64 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext ( reactor/core.publisher/FluxContextWrite.java:107 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.MonoCompletionStage$MonoCompletionStageSubscription.apply ( reactor/core.publisher/MonoCompletionStage.java:121 )
at reactor.core.publisher.MonoCompletionStage$MonoCompletionStageSubscription.apply ( reactor/core.publisher/MonoCompletionStage.java:67 )
at java.util.concurrent.CompletableFuture.uniHandle
at java.util.concurrent.CompletableFuture$UniHandle.tryFire
at java.util.concurrent.CompletableFuture.postComplete
at java.util.concurrent.CompletableFuture.complete
at org.dataloader.DataLoaderHelper.lambda$dispatchQueueBatch$2 ( org/dataloader/DataLoaderHelper.java:267 )
at java.util.concurrent.CompletableFuture$UniApply.tryFire
at java.util.concurrent.CompletableFuture.postComplete
at java.util.concurrent.CompletableFuture.complete
at reactor.core.publisher.MonoToCompletableFuture.onNext ( reactor/core.publisher/MonoToCompletableFuture.java:64 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext ( reactor/core.publisher/FluxContextWrite.java:107 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.MonoCompletionStage$MonoCompletionStageSubscription.apply ( reactor/core.publisher/MonoCompletionStage.java:121 )
at reactor.core.publisher.MonoCompletionStage$MonoCompletionStageSubscription.apply ( reactor/core.publisher/MonoCompletionStage.java:67 )
at java.util.concurrent.CompletableFuture.uniHandle
at java.util.concurrent.CompletableFuture$UniHandle.tryFire
at java.util.concurrent.CompletableFuture.postComplete
at java.util.concurrent.CompletableFuture.complete
at org.springframework.graphql.data.method.InvocableHandlerMethodSupport.lambda$adaptCallable$1 ( org/springframework.graphql.data.method/InvocableHandlerMethodSupport.java:158 )
at java.util.concurrent.ThreadPerTaskExecutor$TaskRunner.run
at java.lang.VirtualThread.run

and

reactor.core.Exceptions$ReactorRejectedExecutionException: Scheduler unavailable
at reactor.core.Exceptions.failWithRejected ( reactor/core/Exceptions.java:285 )
at reactor.core.publisher.Operators.onRejectedExecution ( reactor/core.publisher/Operators.java:1075 )
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.trySchedule ( reactor/core.publisher/FluxPublishOn.java:333 )
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.onNext ( reactor/core.publisher/FluxPublishOn.java:237 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onNext ( reactor/core.publisher/MonoFlatMapMany.java:251 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext ( reactor/core.publisher/FluxOnErrorResume.java:79 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber.onNext ( reactor/core.publisher/FluxConcatArray.java:180 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxMap$MapSubscriber.onNext ( reactor/core.publisher/FluxMap.java:122 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxPeek$PeekSubscriber.onNext ( reactor/core.publisher/FluxPeek.java:200 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxMap$MapSubscriber.onNext ( reactor/core.publisher/FluxMap.java:122 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at graphql.execution.reactive.CompletionStageSubscriber.whenNextFinished ( graphql/execution.reactive/CompletionStageSubscriber.java:95 )
at graphql.execution.reactive.CompletionStageSubscriber.lambda$whenComplete$0 ( graphql/execution.reactive/CompletionStageSubscriber.java:78 )
at java.util.concurrent.CompletableFuture.uniWhenComplete
at java.util.concurrent.CompletableFuture.uniWhenCompleteStage
at java.util.concurrent.CompletableFuture.whenComplete
at java.util.concurrent.CompletableFuture.whenComplete
at graphql.execution.reactive.CompletionStageSubscriber.onNext ( graphql/execution.reactive/CompletionStageSubscriber.java:66 )
at reactor.core.publisher.StrictSubscriber.onNext ( reactor/core.publisher/StrictSubscriber.java:89 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext ( reactor/core.publisher/FluxContextWrite.java:107 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext ( reactor/core.publisher/FluxOnErrorResume.java:79 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/TracingSubscriber.java:68 )
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext ( reactor/core.publisher/FluxOnErrorResume.java:79 )
at kotlinx.coroutines.reactive.FlowSubscription$consumeFlow$2.emit ( kotlinx/coroutines.reactive/ReactiveFlow.kt:234 )
at kotlinx.coroutines.flow.internal.UndispatchedContextCollector$emitRef$1.invokeSuspend ( kotlinx/coroutines.flow.internal/ChannelFlow.kt:208 )
at kotlinx.coroutines.flow.internal.UndispatchedContextCollector$emitRef$1.invoke
at kotlinx.coroutines.flow.internal.UndispatchedContextCollector$emitRef$1.invoke
at kotlinx.coroutines.flow.internal.ChannelFlowKt.withContextUndispatched ( kotlinx/coroutines.flow.internal/ChannelFlow.kt:223 )
at kotlinx.coroutines.flow.internal.UndispatchedContextCollector.emit ( kotlinx/coroutines.flow.internal/ChannelFlow.kt:211 )
at havelock.gateway.subscription.SubscriptionController$counter$$inlined$map$1$2.emit ( havelock/gateway.subscription/Emitters.kt:219 )
at kotlinx.coroutines.flow.FlowKt__ChannelsKt.emitAllImpl$FlowKt__ChannelsKt ( kotlinx/coroutines.flow/Channels.kt:33 )
at kotlinx.coroutines.flow.FlowKt__ChannelsKt.access$emitAllImpl$FlowKt__ChannelsKt ( kotlinx/coroutines.flow/Channels.kt:1 )
at kotlinx.coroutines.flow.FlowKt__ChannelsKt$emitAllImpl$1.invokeSuspend
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith ( kotlin/coroutines.jvm.internal/ContinuationImpl.kt:33 )
at kotlinx.coroutines.DispatchedTaskKt.resume ( kotlinx/coroutines/DispatchedTask.kt:231 )
at kotlinx.coroutines.DispatchedTaskKt.resumeUnconfined ( kotlinx/coroutines/DispatchedTask.kt:187 )
at kotlinx.coroutines.DispatchedTaskKt.dispatch ( kotlinx/coroutines/DispatchedTask.kt:159 )
at kotlinx.coroutines.CancellableContinuationImpl.dispatchResume ( kotlinx/coroutines/CancellableContinuationImpl.kt:466 )
at kotlinx.coroutines.CancellableContinuationImpl.completeResume ( kotlinx/coroutines/CancellableContinuationImpl.kt:582 )
at kotlinx.coroutines.channels.BufferedChannelKt.tryResume0 ( kotlinx/coroutines.channels/BufferedChannel.kt:2927 )
at kotlinx.coroutines.channels.BufferedChannelKt.access$tryResume0 ( kotlinx/coroutines.channels/BufferedChannel.kt:1 )
at kotlinx.coroutines.channels.BufferedChannel$BufferedChannelIterator.tryResumeHasNext ( kotlinx/coroutines.channels/BufferedChannel.kt:1717 )
at kotlinx.coroutines.channels.BufferedChannel.tryResumeReceiver ( kotlinx/coroutines.channels/BufferedChannel.kt:665 )
at kotlinx.coroutines.channels.BufferedChannel.updateCellSend ( kotlinx/coroutines.channels/BufferedChannel.kt:481 )
at kotlinx.coroutines.channels.BufferedChannel.access$updateCellSend ( kotlinx/coroutines.channels/BufferedChannel.kt:36 )
at kotlinx.coroutines.channels.BufferedChannel.send$suspendImpl ( kotlinx/coroutines.channels/BufferedChannel.kt:3120 )
at kotlinx.coroutines.channels.BufferedChannel.send
at havelock.gateway.subscription.events.Event$emit$1$1$1.invokeSuspend ( havelock/gateway.subscription.events/Event.kt:41 )
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith ( kotlin/coroutines.jvm.internal/ContinuationImpl.kt:33 )
at kotlinx.coroutines.DispatchedTask.run ( kotlinx/coroutines/DispatchedTask.kt:104 )
at io.opentelemetry.javaagent.instrumentation.kotlinxcoroutines.RunnableWrapper.lambda$stopPropagation$0 ( io/opentelemetry.javaagent.instrumentation.kotlinxcoroutines/RunnableWrapper.java:16 )
at java.util.concurrent.ThreadPerTaskExecutor$TaskRunner.run
at java.lang.VirtualThread.run

Caused by: java.util.concurrent.RejectedExecutionException
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution
at java.util.concurrent.ThreadPoolExecutor.reject
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule
at java.util.concurrent.ScheduledThreadPoolExecutor.submit
at reactor.core.scheduler.Schedulers.workerSchedule ( Schedulers.java:1410 )
at reactor.core.scheduler.ExecutorServiceWorker.schedule ( ExecutorServiceWorker.java:50 )
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.trySchedule ( FluxPublishOn.java:312 )

I see nothing in these stack traces that helps me diagnose this further, and I can't reliably reproduce the issue.
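
For reference, the `Caused by` frames of the second trace show the rejection coming from `ThreadPoolExecutor$AbortPolicy` via `ScheduledThreadPoolExecutor.delayedExecute`. In the JDK, that path rejects a task once the executor has been shut down. One possible reading, which is an assumption and not confirmed by anything in this report, is that the `publishOn` worker had already been disposed when a late element arrived. The JDK-only sketch below reproduces just the rejection itself; the class and method names are hypothetical and nothing here is Reactor or Spring code.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;

class RejectedAfterShutdownDemo {

    // Returns true if submitting to an already shut-down executor is rejected.
    static boolean submitIsRejectedAfterShutdown() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // Assumption: this stands in for a scheduler worker that was disposed
        // before the late task was scheduled.
        executor.shutdown();
        try {
            executor.submit(() -> { });
            return false;
        } catch (RejectedExecutionException e) {
            // The default AbortPolicy throws, as in the "Caused by" frames.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + submitIsRejectedAfterShutdown());
    }
}
```

If this reading applies, the timing of the error relative to subscription cancellation or client disconnects would be worth checking in the logs.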

A less frequent one is

java.io.IOException: The current thread was interrupted while waiting for a blocking send to complete
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlockInternal ( org/apache.tomcat.websocket/WsRemoteEndpointImplBase.java:308 )
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock ( org/apache.tomcat.websocket/WsRemoteEndpointImplBase.java:266 )
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock ( org/apache.tomcat.websocket/WsRemoteEndpointImplBase.java:250 )
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString ( org/apache.tomcat.websocket/WsRemoteEndpointImplBase.java:223 )
at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText ( org/apache.tomcat.websocket/WsRemoteEndpointBasic.java:48 )
at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage ( org/springframework.web.socket.adapter.standard/StandardWebSocketSession.java:217 )
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage ( org/springframework.web.socket.adapter/AbstractWebSocketSession.java:108 )
at org.springframework.graphql.server.webmvc.GraphQlWebSocketHandler.lambda$handleInternal$2 ( org/springframework.graphql.server.webmvc/GraphQlWebSocketHandler.java:245 )
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.ContextPropagationOperator$RunnableWrapper.run ( io/opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1/ContextPropagationOperator.java:373 )
at reactor.core.scheduler.SchedulerTask.call ( reactor/core.scheduler/SchedulerTask.java:68 )
at reactor.core.scheduler.SchedulerTask.call ( reactor/core.scheduler/SchedulerTask.java:28 )
at java.util.concurrent.FutureTask.run
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run
at java.util.concurrent.ThreadPoolExecutor.runWorker
at java.util.concurrent.ThreadPoolExecutor$Worker.run
at java.lang.Thread.run

Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos
at java.util.concurrent.Semaphore.tryAcquire
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.acquireMessagePartInProgressSemaphore ( WsRemoteEndpointImplBase.java:355 )
at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.acquireMessagePartInProgressSemaphore ( WsRemoteEndpointImplServer.java:146 )
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlockInternal ( WsRemoteEndpointImplBase.java:298 ) 

which I think happens upon application shutdown.
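
The `Caused by` frames are consistent with that guess: the sending thread is waiting in `Semaphore.tryAcquire` with a timeout, and an interrupt during that wait surfaces as `InterruptedException`. The JDK-only sketch below shows that mechanism in isolation. It assumes the interrupt comes from a thread pool being stopped at shutdown, which is not confirmed here; the names `InterruptedSendDemo` and `sendInProgress` are hypothetical and this is not Tomcat's code.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptedSendDemo {

    // Returns true if the waiting "sender" thread observed an InterruptedException.
    static boolean interruptedWhileWaiting() {
        // Zero permits models "another message part is still in progress".
        Semaphore sendInProgress = new Semaphore(0);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        Thread sender = new Thread(() -> {
            try {
                // Timed wait for the permit, like the blocking send in the trace.
                sendInProgress.tryAcquire(30, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                interrupted.set(true);
            }
        });
        sender.start();
        // Assumption: an executor stopping its workers at shutdown does this.
        sender.interrupt();
        try {
            sender.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException("demo thread itself was interrupted", e);
        }
        return interrupted.get();
    }

    public static void main(String[] args) {
        System.out.println("interrupted while waiting: " + interruptedWhileWaiting());
    }
}
```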

In case it helps: our controller subscription methods return Kotlin Flows backed by Kotlin Channels. We create a Channel, send to it asynchronously, and convert the Channel to a Flow via consumeAsFlow.
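
For context on the first error: the top frames show `FluxPublishOn` failing in `onNext` with "Queue is full", which is what a bounded hand-off queue reports when elements arrive faster than the downstream has requested them. The sketch below is a JDK-only model of that failure mode, not Reactor's implementation and not a diagnosis of this setup. `BoundedHandoff`, its capacity, and the emission counts are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;

class BoundedHandoffDemo {

    // A tiny stand-in for an operator that buffers elements for another thread.
    static final class BoundedHandoff<T> {
        private final ArrayBlockingQueue<T> queue;

        BoundedHandoff(int capacity) {
            this.queue = new ArrayBlockingQueue<>(capacity);
        }

        // Called by the producer; fails instead of blocking when the buffer is full.
        void onNext(T item) {
            if (!queue.offer(item)) {
                throw new IllegalStateException("Queue is full");
            }
        }
    }

    // Emits `emitted` items without ever draining and returns how many
    // were accepted before the overflow (or all of them if none occurred).
    static int acceptedBeforeOverflow(int capacity, int emitted) {
        BoundedHandoff<Integer> handoff = new BoundedHandoff<>(capacity);
        int accepted = 0;
        try {
            for (int i = 0; i < emitted; i++) {
                handoff.onNext(i);
                accepted++;
            }
        } catch (IllegalStateException overflow) {
            // The producer ignored demand; the buffer could not absorb the burst.
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println("accepted: " + acceptedBeforeOverflow(2, 5));
    }
}
```

In this model the overflow depends only on how many items are emitted between drains, which may explain why such errors appear sporadically under bursts of events.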

Can you look into those or let me know how I can further diagnose what's going on?

Also, are there any plans for another implementation of GraphQL subscriptions that doesn't use the Reactive stack? We would much prefer Kotlin coroutines or virtual threads. Subscriptions are the only part of Spring that still forces us to deal with the Reactive stack, and hard-to-diagnose issues like these are exactly what we'd like to avoid.

Labels: status: invalid (An issue that we don't feel is valid)