Agent keeps retrying large messages when it is failing #748

@gnunn1

Describe the bug

The Agent is spamming the log message below about a message being too large, and it stops processing other events such as application updates. I'm not sure what triggers this, but I suspect the resource proxy, since it occurs whenever I click on a resource to bring up the sidebar with the live and desired manifests. Note that this seems to happen even with a small resource.

I started my home lab this morning and, after viewing a resource in the UI, the agent logs are constantly showing this message:

time="2026-02-10T16:41:11Z" level=error msg="Error while receiving: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (28602193 vs. 4194304)" clientAddr="192.168.1.5:443" direction=Recv module=StreamEvent
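
For scale, the two numbers in that error can be pulled out and compared. The snippet below is a small sketch that parses the error text exactly as it appears in the log line above; the helper name `parse_grpc_size_error` is hypothetical and not part of the agent. 4194304 bytes is 4 MiB, which matches gRPC's default maximum receive message size, so the rejected message is well over the limit rather than marginally so.

```python
import re

# Matches the size pair in the "received message larger than max" error text.
SIZE_RE = re.compile(r"received message larger than max \((\d+) vs\. (\d+)\)")


def parse_grpc_size_error(line):
    """Return (received_bytes, limit_bytes) from a log line, or None if absent."""
    match = SIZE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Error text copied from the agent log in this report.
LOG_LINE = (
    'level=error msg="Error while receiving: rpc error: code = ResourceExhausted '
    'desc = grpc: received message larger than max (28602193 vs. 4194304)"'
)

received, limit = parse_grpc_size_error(LOG_LINE)
print(f"received: {received / 2**20:.1f} MiB, limit: {limit / 2**20:.1f} MiB")
print(f"over the limit by a factor of {received / limit:.1f}")
```

This prints a received size of about 27.3 MiB against a 4.0 MiB limit, a factor of roughly 6.8.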

On the Principal side I see the following messages:

time="2026-02-10T22:34:17Z" level=error msg="Timeout communicating to the agent: context deadline exceeded SUBSCRIBE {app|managed-resources|argocd-agent-hub-prod-local_cert-manager|1.8.3}" agentName=argocd-agent-hub-prod-local connectionUUID=b5ce3f8a-9bbb-4d3e-91f4-2778a35297ce function=sendSynchronousRedisMessageToAgent module=server sendEventUUID=1487b066-1ba8-4dc4-826c-80954f14ce7f
time="2026-02-10T22:34:17Z" level=error msg="unexpected nil in get response" connUUID=b5ce3f8a-9bbb-4d3e-91f4-2778a35297ce function=redisFxn module=redisProxy
time="2026-02-10T22:34:17Z" level=error msg="exit due to unable to handle agent subscribe" connUUID=b5ce3f8a-9bbb-4d3e-91f4-2778a35297ce error="unexpected nil in get response" function=redisFxn module=redisProxy
time="2026-02-10T22:34:17Z" level=error msg="Timeout communicating to the agent: context deadline exceeded SUBSCRIBE {app|managed-resources|argocd-agent-hub-prod-local_cert-manager|1.8.3}" agentName=argocd-agent-hub-prod-local connectionUUID=0bf3b4cc-f537-42b9-87c5-b57f30843a0a function=sendSynchronousRedisMessageToAgent module=server sendEventUUID=ec7ee005-9612-4013-9c11-072cd4607098
time="2026-02-10T22:34:17Z" level=error msg="unexpected nil in get response" connUUID=0bf3b4cc-f537-42b9-87c5-b57f30843a0a function=redisFxn module=redisProxy
time="2026-02-10T22:34:17Z" level=error msg="exit due to unable to handle agent subscribe" connUUID=0bf3b4cc-f537-42b9-87c5-b57f30843a0a error="unexpected nil in get response" function=redisFxn module=redisProxy

Steps to reproduce the behaviour

  1. Click on a resource to view the popout with the manifests for the resource

Expected behavior

The Agent should fail cleanly on a message that is too large rather than constantly retrying.
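
To illustrate what "fail cleanly" could look like, here is a minimal sketch of a receive loop that separates transient errors from permanent ones. It is not the agent's code: `RpcError`, `is_permanent` and `receive_with_retry` are hypothetical names, and the rule that an oversized-message error is permanent is an assumption based on the fact that receiving the same message again cannot succeed while the size limit is unchanged.

```python
import time


class RpcError(Exception):
    """Hypothetical stand-in for a gRPC error carrying a status code name."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def is_permanent(err):
    """Assumption: an oversized-message error will not go away on retry."""
    return err.code == "ResourceExhausted" and "larger than max" in err.message


def receive_with_retry(receive, max_attempts=3, base_delay=0.5):
    """Call receive(); retry transient errors with backoff, fail fast otherwise."""
    for attempt in range(1, max_attempts + 1):
        try:
            return receive()
        except RpcError as err:
            if is_permanent(err) or attempt == max_attempts:
                raise
            # Exponential backoff between attempts for transient errors.
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage: an oversized message is reported once instead of being retried.
calls = []


def oversized():
    calls.append(1)
    raise RpcError(
        "ResourceExhausted",
        "grpc: received message larger than max (28602193 vs. 4194304)",
    )


try:
    receive_with_retry(oversized)
except RpcError as err:
    print(f"gave up after {len(calls)} attempt(s): {err}")
```

In this sketch other `ResourceExhausted` errors (for example rate limiting) are still retried; only the oversized-message form is treated as final.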

Additional context

I'm using OpenShift GitOps 1.19 and version 0.5.4 of the supported Red Hat Agent chart.

Labels: bug (Something isn't working)
