NoCondensationAvailableException: 100% failure rate with Converse Nemotron on main branch #2703

@juanmichelini

Description

The conversation condenser fails immediately on every instance when using the litellm_proxy/converse-nemotron-super-3-120b model on SWE-bench evaluations, resulting in a 100% failure rate.

Error

NoCondensationAvailableException: Unable to compute forgotten events

Reproduction

Confirmed on main branch (commit 4ad68fda427b886fac0fdfcd3ce8c0e99f8f008b):

  • Evaluation ID: 23964349600
  • Model: litellm_proxy/converse-nemotron-super-3-120b
  • Benchmark: SWE-bench Verified (20 instances)
  • Result: All 20/20 instances failed with the same error

Also reproduces on PR #2685 (commit 423991a).

Failure Pattern

Timeline

  1. Conversation starts successfully
  2. ~7 events are logged (SystemPromptEvent, MessageEvent, ConversationStateUpdateEvents)
  3. CondensationRequest event is triggered
  4. ~20-30 seconds later: NoCondensationAvailableException is raised
  5. Conversation terminates with error status

Event Sequence (from conversation logs)

[
  {"kind": "SystemPromptEvent"},
  {"kind": "MessageEvent"},
  {"kind": "ConversationStateUpdateEvent"},
  {"kind": "ConversationStateUpdateEvent"},
  {"kind": "CondensationRequest"},
  {"kind": "ConversationStateUpdateEvent"},
  {
    "kind": "ConversationErrorEvent",
    "code": "NoCondensationAvailableException",
    "detail": "Unable to compute forgotten events"
  }
]
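To check other conversation logs for the same pattern, a small script along these lines could help. It is a minimal sketch that assumes the log is a JSON array of objects shaped like the excerpt above; the function name `summarize_events` and the summary keys are made up for illustration and are not part of the SDK.

```python
import json


def summarize_events(events):
    """Report where condensation was requested and which error was logged."""
    kinds = [event.get("kind") for event in events]
    # Index of the first CondensationRequest equals the number of events before it.
    request_idx = (
        kinds.index("CondensationRequest") if "CondensationRequest" in kinds else None
    )
    # First error event in the log, if any.
    error = next(
        (e for e in events if e.get("kind") == "ConversationErrorEvent"), None
    )
    return {
        "total_events": len(events),
        "events_before_request": request_idx,
        "error_code": error.get("code") if error else None,
        "error_detail": error.get("detail") if error else None,
    }


# The event sequence from this report, reduced to the fields shown above.
SAMPLE = json.loads("""
[
  {"kind": "SystemPromptEvent"},
  {"kind": "MessageEvent"},
  {"kind": "ConversationStateUpdateEvent"},
  {"kind": "ConversationStateUpdateEvent"},
  {"kind": "CondensationRequest"},
  {"kind": "ConversationStateUpdateEvent"},
  {"kind": "ConversationErrorEvent",
   "code": "NoCondensationAvailableException",
   "detail": "Unable to compute forgotten events"}
]
""")

summary = summarize_events(SAMPLE)
print(summary)
```

Running this over the logs of all 20 instances would show whether the request always arrives after the same number of events.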

Retry Behavior

  • The system classifies the failure as failure_category=non_resource
  • It does NOT escalate resources
  • Retries fail with the identical error (all 3 retries are exhausted)

Affected Instances (Sample)

All 20 instances in the evaluation failed, including:

  • scikit-learn__scikit-learn-14983
  • django__django-13279
  • astropy__astropy-14309
  • sphinx-doc__sphinx-7757
  • scikit-learn__scikit-learn-25232
  • (and 15 more...)

Log Evidence

Instance django__django-12155 - 2026-04-03 22:33:59,334 - WARNING - [worker] runtime init failure instance=django__django-12155 attempt=1 retry=1 runtime_id=uhgibzfrmlyepcuv session_id=agent-server-bbe28e65-7e4d-46b3-b206-776e9880b0cf error=Conversation run failed for id=a567f2c4-d792-4192-9bc7-8c722945ece2: Remote conversation ended with error
Instance django__django-12155 - 2026-04-03 22:33:59,334 - WARNING - [worker] Instance django__django-12155: failure_category=non_resource, escalate_resources=False, runtime_failure_count=0
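For tallying failure categories across many worker logs, the `key=value` fields on the classification line can be pulled out with a regular expression. This is a rough sketch that assumes the line format shown above; `parse_worker_fields` is a hypothetical helper, and it does not handle free-text values such as the `error=` field on the first line.

```python
import re

# Matches key=value pairs; values end at whitespace or a comma.
KEY_VALUE = re.compile(r"(\w+)=([^\s,]+)")


def parse_worker_fields(line):
    """Return the key=value fields of a worker log line as a dict of strings."""
    return dict(KEY_VALUE.findall(line))


line = (
    "Instance django__django-12155 - 2026-04-03 22:33:59,334 - WARNING - "
    "[worker] Instance django__django-12155: failure_category=non_resource, "
    "escalate_resources=False, runtime_failure_count=0"
)

fields = parse_worker_fields(line)
print(fields)
```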

Environment

  • SDK Commit (main): 4ad68fda427b886fac0fdfcd3ce8c0e99f8f008b
  • Model: litellm_proxy/converse-nemotron-super-3-120b
  • Benchmark: SWE-bench Verified
  • Eval Limit: 20 instances
  • Configuration:
    • enable_conversation_event_logging: true
    • enable_condenser: null (default)
    • condenser_max_size: null (default)
    • condenser_keep_first: null (default)

Questions

  1. Is this specific to Converse Nemotron? (Other models like Claude, Gemini, Qwen, MiniMax work fine)
  2. Why does condensation trigger so early? (after only 7 events)
  3. What causes "unable to compute forgotten events"? (Is it a model response format issue?)
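One hypothesis for questions 2 and 3, not verified against the SDK source: if a condenser protects a fixed number of events at the head and tail of the history, a very short history leaves nothing in between to forget. The toy model below illustrates only that arithmetic. The function `forgettable_window` and the `keep_last` parameter are invented for this sketch, and the `keep_first` value is arbitrary rather than the SDK default.

```python
def forgettable_window(num_events, keep_first, keep_last):
    """Indices a hypothetical condenser could drop: everything between
    the protected head and the protected tail of the history."""
    start = min(keep_first, num_events)
    # Never let the window end before it starts.
    stop = max(start, num_events - keep_last)
    return range(start, stop)


# A short history, comparable in size to the one in this report.
short = forgettable_window(num_events=5, keep_first=4, keep_last=2)
# A long history, where condensation would normally be expected.
long = forgettable_window(num_events=100, keep_first=4, keep_last=2)

print(len(short), len(long))
```

If the real condenser behaves similarly, a CondensationRequest that fires after only a handful of events would have an empty window, which would match the "Unable to compute forgotten events" message. That still leaves open why the request fires so early for this model.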

Impact

  • Severity: Critical - 100% failure rate for this model
  • Scope: Blocks all Converse Nemotron evaluations on SWE-bench
  • Workaround: Unknown (disabling condenser via config might help?)
