
Fix Anthropic Claude integration via LiteLLM model routing#25

Open
Neelkanthsahu02 wants to merge 11 commits into VRSEN:main from Neelkanthsahu02:claude/fix-claude-api-integration-1SL0O

Conversation

@Neelkanthsahu02

The previous onboarding default DEFAULT_MODEL=litellm/claude-sonnet-4-6 was stripped to a bare 'claude-sonnet-4-6' before being passed to LiteLLM, leaving no provider prefix to route on, so every request 404'd.

  • config._resolve now strips the optional 'litellm/' prefix and auto-prepends 'anthropic/' (or 'gemini/') when the resulting name is a bare model id, matching the convention already used in slides_agent/tools/InsertNewSlides.py.
  • is_openai_provider now correctly classifies bare claude-* / gemini-* names as non-OpenAI, so reasoning settings aren't applied to them.
  • onboard.py writes the explicit 'litellm/anthropic/claude-sonnet-4-6' default for new Anthropic setups; legacy .env files keep working.
  • .env.example documents that ANTHROPIC_API_KEY accepts both standard API keys (sk-ant-api...) and Claude Pro/Max/Code subscription OAuth tokens (sk-ant-oat01-...) generated via 'claude setup-token'.

claude added 11 commits May 9, 2026 09:13
The previous onboarding default DEFAULT_MODEL=litellm/claude-sonnet-4-6
was stripped to a bare 'claude-sonnet-4-6' before being passed to
LiteLLM, leaving no provider prefix to route on, so every request 404'd.

- config._resolve now strips the optional 'litellm/' prefix and
  auto-prepends 'anthropic/' (or 'gemini/') when the resulting name
  is a bare model id, matching the convention already used in
  slides_agent/tools/InsertNewSlides.py.
- is_openai_provider now correctly classifies bare claude-* / gemini-*
  names as non-OpenAI, so reasoning settings aren't applied to them.
- onboard.py writes the explicit 'litellm/anthropic/claude-sonnet-4-6'
  default for new Anthropic setups; legacy .env files keep working.
- .env.example documents that ANTHROPIC_API_KEY accepts both standard
  API keys (sk-ant-api...) and Claude Pro/Max/Code subscription OAuth
  tokens (sk-ant-oat01-...) generated via 'claude setup-token'.
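
A minimal sketch of that normalization, assuming config._resolve returns
the provider-qualified string handed to LiteLLM (the function body is
illustrative, not the PR's exact code):

    def _resolve(model: str) -> str:
        name = model.removeprefix("litellm/")  # tolerate the onboarding prefix
        if name.startswith("claude-"):         # bare Anthropic id: add provider
            name = f"anthropic/{name}"
        elif name.startswith("gemini-"):       # bare Google id: add provider
            name = f"gemini/{name}"
        return name                            # e.g. 'anthropic/claude-sonnet-4-6'
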
Adds first-class support for routing every model call through a local
OpenAI-compatible proxy (claude-code-router, 9router, ccproxy, etc.)
that authenticates against a Claude Pro/Max/Claude Code subscription.
Set OPENAI_BASE_URL to the router URL and pick whatever model name
the router maps to Claude.

- config._resolve: when OPENAI_BASE_URL is set, return the model string
  as-is so the OpenAI client hits the router; do not auto-route bare
  claude-* names through LiteLLM (LiteLLM ignores OPENAI_BASE_URL).
- config.is_openai_provider: return False whenever OPENAI_BASE_URL is
  set, so OpenAI-only ModelSettings (reasoning summaries) are not sent
  to the router/Claude.
- swarm.py: disable tracing in router mode — the dummy OPENAI_API_KEY
  the router accepts would be rejected by OpenAI's real tracing
  endpoint.
- .env.example: document the router workflow as the supported way to
  use a Claude subscription, and remove the previous claim that
  ANTHROPIC_API_KEY accepts sk-ant-oat01-... OAuth tokens directly
  (LiteLLM does not implement the OAuth header protocol, so that path
  silently fails).
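
Sketched together, assuming the env-var checks described above (function
bodies are illustrative):

    import os

    def _resolve(model: str) -> str:
        if os.getenv("OPENAI_BASE_URL"):
            return model  # router mode: forward the name verbatim
        ...               # otherwise: LiteLLM auto-routing as before

    def is_openai_provider(model: str) -> bool:
        if os.getenv("OPENAI_BASE_URL"):
            return False  # never send OpenAI-only ModelSettings to a router
        return not model.startswith(("litellm/", "claude-", "gemini-"))
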
InsertNewSlides and ModifySlide hardcoded gpt-5.3-codex (OpenAI) and
'anthropic/claude-sonnet-4-6' via direct LiteLLM, bypassing the local
router. Bare 'gpt-5.3-codex' isn't exposed by routers like 9router
(it's served as 'cx/gpt-5.3-codex'), and direct LiteLLM/Anthropic
calls don't go through subscription auth.

When OPENAI_BASE_URL is set:
- Use the agency's DEFAULT_MODEL via OpenAIResponsesModel against the
  router-bound AsyncOpenAI() client. Skip the ANTHROPIC_API_KEY
  branch and the 'gpt-5.3-codex' fallback entirely.
- Disable reasoning summary / verbosity ModelSettings — these are
  OpenAI-Responses-only options that the router/Claude reject.

Non-router behavior is unchanged.
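
Roughly, the router branch looks like this (helper location and names
assumed; AsyncOpenAI() reads OPENAI_BASE_URL and OPENAI_API_KEY from the
environment):

    import os
    from openai import AsyncOpenAI
    from agents import OpenAIResponsesModel

    if os.getenv("OPENAI_BASE_URL"):
        model = OpenAIResponsesModel(
            model=os.environ["DEFAULT_MODEL"],  # router decides what this maps to
            openai_client=AsyncOpenAI(),
        )
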
Reported: agency-swarm rejected DEFAULT_MODEL=cc/claude-opus-4-7 with
'Unknown prefix: cc' because anything-with-a-slash gets parsed as
'<provider>/<model>' and 'cc' is not a registered provider.

Routers like 9router namespace models with their own prefixes
(cc/..., cx/...). To prevent agency-swarm from interpreting that
prefix at all, hand it a pre-built OpenAIResponsesModel instance
bound to the router-aware AsyncOpenAI() client. The SDK then uses
the instance directly and forwards 'cc/claude-opus-4-7' verbatim
to the router.
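
A sketch of the instance-passing fix (agent construction simplified):

    from openai import AsyncOpenAI
    from agents import Agent, OpenAIResponsesModel

    client = AsyncOpenAI()  # base_url/api_key come from OPENAI_* env vars
    agent = Agent(
        name="example",
        # a Model instance is used as-is; 'cc/' is never parsed as a provider
        model=OpenAIResponsesModel(model="cc/claude-opus-4-7",
                                   openai_client=client),
    )
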
Reported: 'validation error for ModelResponse output: Input should be a
valid list, input_value=None'. 9router's /v1/responses endpoint does
not return Responses-API-shaped output items, so the SDK gets None
where it expects a list and the run fails after the user-visible
message has already streamed in.

Chat Completions has been a stable, universally implemented OpenAI
API for years; routers and proxies that wrap subscription auth
implement it correctly. Switching to OpenAIChatCompletionsModel for
router mode in config.py — and reusing the resolver in the slides
agent helpers — fixes the validation error without changing
behavior for direct OpenAI/LiteLLM paths.
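
The router-mode resolver after this commit, sketched (helper name
assumed):

    from openai import AsyncOpenAI
    from agents import OpenAIChatCompletionsModel

    def make_router_model(model_id: str) -> OpenAIChatCompletionsModel:
        # Chat Completions instead of Responses: routers implement it reliably
        return OpenAIChatCompletionsModel(model=model_id,
                                          openai_client=AsyncOpenAI())
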
Reported: 'Hosted tools are not supported with the ChatCompletions API.
Got tool type: WebSearchTool'.

Hosted tools (WebSearchTool, FileSearchTool, HostedMCPTool,
ImageGenerationTool, CodeInterpreterTool, ComputerTool) are
server-side OpenAI Responses-API features that only run against
api.openai.com. With router mode using Chat Completions the SDK
rejects them at agent construction.

Add config.filter_hosted_tools() that no-ops when not in router mode
and strips known hosted tool instances when it is. Apply it to the
five agents that include WebSearchTool: deep_research, docs_agent,
data_analyst_agent, virtual_assistant, slides_agent.

Web search is therefore unavailable in router mode for now. Users
who need search can either run with direct OpenAI auth, or wire a
custom search tool (the repo already includes ScholarSearch /
ProductSearch backed by SEARCH_API_KEY).
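
A hedged sketch of the filter (the hosted-tool classes are real
openai-agents exports; _router_mode is an assumed helper):

    import os
    from agents import ComputerTool, FileSearchTool, WebSearchTool

    _HOSTED = (WebSearchTool, FileSearchTool, ComputerTool)  # subset for brevity

    def _router_mode() -> bool:          # assumed helper
        return bool(os.getenv("OPENAI_BASE_URL"))

    def filter_hosted_tools(tools: list) -> list:
        if not _router_mode():
            return tools                 # no-op outside router mode
        return [t for t in tools if not isinstance(t, _HOSTED)]
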
New ClaudeOAuthModel — a custom OpenAI Agents SDK Model implementation
that calls api.anthropic.com via the official anthropic Python SDK
with a Claude Code subscription OAuth token (sk-ant-oat01-...). This
gives subscription billing without depending on a local router proxy.

Auth:
- auth_token=<token> on AsyncAnthropic → Authorization: Bearer header.
- default_headers={'anthropic-beta': 'oauth-2025-04-20'}.
- System prompt prefixed with 'You are Claude Code, Anthropic's
  official CLI for Claude.' as required by the subscription path.
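
In anthropic-SDK terms, the auth setup above is roughly (env-var name per
this PR):

    import os
    from anthropic import AsyncAnthropic

    client = AsyncAnthropic(
        auth_token=os.environ["CLAUDE_CODE_OAUTH_TOKEN"],  # Authorization: Bearer
        default_headers={"anthropic-beta": "oauth-2025-04-20"},
    )
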

Wiring:
- config.is_oauth_mode(): true when CLAUDE_CODE_OAUTH_TOKEN is set
  or ANTHROPIC_API_KEY starts with sk-ant-oat.
- config._resolve takes the OAuth branch ahead of router/LiteLLM,
  strips any 'litellm/'/'cc/'/'cx/' prefix from DEFAULT_MODEL, and
  hands a ClaudeOAuthModel instance to each agent.
- Hosted tools (WebSearchTool etc.) are also stripped in OAuth mode
  via filter_hosted_tools, since Anthropic does not run OpenAI's
  server-side tools.
- requirements.txt picks up anthropic>=0.40.0.
- .env.example documents the OAuth path with model ids and token
  generation instructions (claude setup-token).

Translation layer (in claude_oauth_model.py):
- OpenAI Responses-API input items → Anthropic messages with
  text/tool_use/tool_result blocks.
- OpenAI Tool/Handoff schemas → Anthropic tools (name, description,
  input_schema).
- Anthropic content blocks → OpenAI Responses-style output items
  (message + function_call) so the rest of agency-swarm consumes
  the response unchanged.
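
The tool-schema direction of that mapping is small enough to sketch
(attribute names from the openai-agents FunctionTool type; dict shape per
Anthropic's tools parameter):

    def to_anthropic_tool(t) -> dict:
        return {
            "name": t.name,
            "description": t.description,
            "input_schema": t.params_json_schema,
        }
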

Streaming is best-effort: stream_response delegates to get_response.
Real per-token streaming requires emitting the right
TResponseStreamEvent variants and is left for a future iteration —
the TUI still renders the final assistant message correctly.

Untested in CI; needs iteration on first run against a real OAuth
token. Existing router and OpenAI paths are unchanged.

… crash

Reported: "'async for' requires an object with __aiter__ method, got
coroutine" when the TUI consumed a ClaudeOAuthModel response.

Cause: stream_response was an 'async def' that returned a generator
object, so calling it produced a coroutine, not an async iterator. The
SDK's `async for event in stream_response(...)` then failed.

Fix:
- stream_response is now a regular def that returns the async
  generator method `_stream_iter` un-awaited. The SDK iterates it
  directly.
- _stream_iter drives anthropic.messages.stream() and emits
  ResponseTextDeltaEvent for each Anthropic text_delta plus a final
  ResponseCompletedEvent carrying the assembled ModelResponse. We use
  Pydantic's model_construct() so we don't have to populate every
  field of Response/ResponseCompletedEvent.

This restores live text rendering in the TUI for OAuth mode.
Tool-call streaming is intentionally minimal — the final completed
event carries the full output items list, which the SDK uses for
function-call dispatch.
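
The pattern, reduced to a runnable minimum:

    import asyncio

    class Model:
        async def _stream_iter(self):
            for chunk in ("hel", "lo"):
                yield chunk            # stands in for ResponseTextDeltaEvent

        def stream_response(self):     # plain def: hands back the async generator
            return self._stream_iter()

    async def main():
        async for event in Model().stream_response():
            print(event)

    asyncio.run(main())
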
Investigated openswarm-ai/openswarm to see how they let users run
on a Claude subscription without dealing with OAuth tokens. Their
backend (apps/settings/credentials.py:get_anthropic_client) just
constructs the Anthropic Python SDK with:

    anthropic.AsyncAnthropic(
        api_key="9router",
        base_url="http://localhost:20128",
    )

The router does the OAuth dance internally; the SDK sees a normal
Anthropic Messages endpoint. No Bearer header, no oauth-2025-04-20
beta, no claude setup-token needed.

Refactor:
- ClaudeOAuthModel → ClaudeRouterModel (alias kept for compat).
  Constructor takes ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY (default
  literal "9router") and points the anthropic SDK at the router.
  Drops auth_token, drops the anthropic-beta header, drops the
  Claude-Code system-prompt preamble.
- config.is_oauth_mode now triggers on ANTHROPIC_BASE_URL (the
  canonical anthropic SDK env var) — keeps CLAUDE_CODE_OAUTH_TOKEN
  and sk-ant-oat... detection as legacy aliases.
- config._resolve passes the model id through unchanged so the router
  can route by its own prefix (cc/, cx/, gc/).
- .env.example documents the simpler setup:
      ANTHROPIC_BASE_URL=http://localhost:20128
      ANTHROPIC_API_KEY=9router
      DEFAULT_MODEL=cc/claude-opus-4-7

This should resolve the auth failures the previous OAuth path hit.
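
Constructor sketch (defaults as described above; class structure
illustrative):

    import os
    from anthropic import AsyncAnthropic

    class ClaudeRouterModel:
        def __init__(self, model: str):
            self.model = model  # forwarded verbatim, e.g. 'cc/claude-opus-4-7'
            self.client = AsyncAnthropic(
                base_url=os.environ["ANTHROPIC_BASE_URL"],  # http://localhost:20128
                api_key=os.getenv("ANTHROPIC_API_KEY", "9router"),
            )
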
Replaces the Anthropic-SDK-with-OAuth-tokens approach (which kept
hitting auth issues) with the same pattern openswarm-ai uses:
spawn the local `claude` CLI via claude_agent_sdk and inherit its
stored subscription credentials.

claude_oauth_model.py rewritten as ClaudeAgentSDKModel:
- get_response/stream_response build an in-process SDK MCP server
  exposing the agent's tools as MCP tools, then call
  claude_agent_sdk.query() with that server registered.
- Tool calls execute INSIDE one query() invocation — Claude calls
  our wrapped MCP tool, the SDK runs our wrapper which dispatches
  back to the original agency-swarm Tool.on_invoke_tool, the result
  is returned to Claude, the loop continues until Claude emits a
  final assistant message.
- Built-in Claude Code tools (Bash, WebSearch, Edit, ...) are
  disabled by default for behavior parity. CLAUDE_USE_BUILTIN_TOOLS=1
  re-enables them.
- Streaming yields ResponseTextDeltaEvent per text chunk and a final
  ResponseCompletedEvent — same wire shape used elsewhere in
  agency-swarm.
- Past tool calls in the input history are flattened to text;
  Claude can't replay tool_use blocks it didn't generate this run.

config.is_oauth_mode now also auto-detects: if claude-agent-sdk is
installed AND DEFAULT_MODEL looks like a Claude id AND no other
backend env is set, route through the SDK. Existing
ANTHROPIC_BASE_URL / sk-ant-oat / CLAUDE_CODE_OAUTH_TOKEN paths
still trigger it. CLAUDE_USE_AGENT_SDK=1 is an explicit override.

requirements.txt picks up claude-agent-sdk>=0.1.0.
.env.example documents the SDK-CLI workflow as the recommended
subscription path; remote ANTHROPIC_BASE_URL stays as an alternative.

Backwards-compat: ClaudeOAuthModel and ClaudeRouterModel are kept as
aliases for ClaudeAgentSDKModel.
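
The query()-with-in-process-MCP-server pattern, sketched against the
claude-agent-sdk API as documented (the real ClaudeAgentSDKModel wraps
agency-swarm tools dynamically; this example tool is hypothetical):

    from claude_agent_sdk import (
        ClaudeAgentOptions, create_sdk_mcp_server, query, tool,
    )

    @tool("get_time", "Return the current time", {})
    async def get_time(args):
        return {"content": [{"type": "text", "text": "12:00"}]}

    server = create_sdk_mcp_server(name="agency", version="1.0.0",
                                   tools=[get_time])
    options = ClaudeAgentOptions(
        mcp_servers={"agency": server},
        allowed_tools=["mcp__agency__get_time"],
    )

    async def run(prompt: str):
        async for message in query(prompt=prompt, options=options):
            print(message)
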
A focused agency-swarm setup tailored for YouTube content workflows,
backed by ClaudeAgentSDKModel for Claude subscription auth. v1 scope
is intentionally narrow: idea generation and scripting only — no
thumbnails, video production, or scheduling yet.

Agents (each ~30 lines):
- Orchestrator: routes the request, holds the user-facing voice,
  assembles deliverables. Never produces them itself.
- Researcher: gets Claude Code's built-in WebSearch + WebFetch
  (use_builtin_tools=True passed to its model adapter). Returns a
  structured research brief — top performers, trending angles,
  audience pain points, sources.
- Ideator: turns the brief into 8 ranked ideas with hooks, target
  viewers, and brief-grounded "why this works" citations. Toolless.
- Scripter: takes one idea and writes a complete ready-to-record
  script (hook, intro, body with B-roll cues, CTA, outro). One
  script per call. Toolless.

Layout:
- youtube_swarm/instructions/*.md  prompts (orchestrator/researcher/
  ideator/scripter), each with tight format/voice rules so v1 output
  is deterministic.
- youtube_swarm/agents.py          agent factories.
- youtube_swarm/__init__.py        re-exports.
- youtube_swarm.py                 entry point: `python youtube_swarm.py`
  builds the Agency with orchestrator-as-entry communication flows
  and launches the agency-swarm TUI.

Per-agent toolset switching:
- ClaudeAgentSDKModel.__init__ now accepts use_builtin_tools so the
  Researcher can enable built-in tools (WebSearch/WebFetch) while
  the others stay toolless. Default behaviour for the rest of the
  codebase is unchanged.
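
Per-agent wiring then looks like (constructor kwargs assumed; model id
illustrative):

    from claude_oauth_model import ClaudeAgentSDKModel  # module name per this PR

    researcher_model = ClaudeAgentSDKModel(model="claude-opus-4-7",
                                           use_builtin_tools=True)
    ideator_model = ClaudeAgentSDKModel(model="claude-opus-4-7")  # toolless default
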

Doesn't touch the existing openswarm orchestrator/swarm.py — both
entry points coexist.
