
Add multi-provider LLM support and two-tier model configuration #280

Open

goodb wants to merge 20 commits into snap-stanford:main from goodb:goodb-wobd

Conversation


@goodb goodb commented Feb 11, 2026

Summary

This PR adds support for multiple LLM providers (OpenAI, Anthropic, Gemini, Groq, Bedrock) with intelligent defaults and a two-tier model configuration for cost optimization.

Key Changes

  • Multi-provider LLM support: Automatically detects available API keys and selects appropriate models
  • Two-tier model configuration:
    • llm: Primary model for agent reasoning (e.g., gpt-4o, claude-sonnet-4-5)
    • llm_lite: Lightweight model for simple tasks like parsing and classification (e.g., gpt-4o-mini, claude-haiku-3-5)
  • GPT compatibility improvements: Enhanced prompting and parsing for OpenAI models
  • Bug fixes: JSON brace escaping in GEO queries, UTF-8 decode errors in bash output
  • Test organization: Moved test files to tests/ directory with proper structure

Configuration

Users can configure via environment variables in .env:

BIOMNI_LLM=gpt-4o              # Primary reasoning model
BIOMNI_LLM_LITE=gpt-4o-mini    # Lightweight model for simple tasks
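The two variables might be resolved along these lines (a sketch; the fallback defaults shown are just the example models from this description, not necessarily Biomni's actual defaults):

```python
import os

def resolve_models(env=os.environ):
    """Read the two-tier model config from the environment.

    Falls back to the example models above; the real defaults in
    biomni/config.py are provider-dependent.
    """
    llm = env.get("BIOMNI_LLM", "gpt-4o")                  # primary reasoning model
    llm_lite = env.get("BIOMNI_LLM_LITE", "gpt-4o-mini")   # lightweight tier
    return llm, llm_lite
```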

Files Changed

  • biomni/config.py - Added llm_lite configuration
  • biomni/llm.py - Added DEFAULT_MODELS, DEFAULT_MODELS_LITE, smart model selection
  • biomni/agent/a1.py - GPT compatibility improvements, MCP server support
  • biomni/tool/database.py - Use llm_lite for query parsing, fixed brace escaping
  • biomni/tool/genomics.py - Use llm_lite for cell type annotation
  • biomni/tool/literature.py - Added provider-agnostic advanced_web_search() fallback
  • biomni/utils.py - Use configurable LLM, fixed UTF-8 errors
  • tests/ - Organized test files with proper structure

Test plan

  • Verified multi-provider detection works correctly
  • Tested GEO query integration with OpenAI models
  • Verified environment variable overrides work
  • All existing functionality preserved

🤖 Generated with Claude Code

goodb and others added 20 commits February 6, 2026 14:36
- Add configurable LLM selection based on available API keys
  - New _get_default_model() detects ANTHROPIC/OPENAI/GEMINI/GROQ/BEDROCK keys
  - DEFAULT_MODELS dict defines preferred model for each provider
  - Fix config.llm_model -> config.llm bug
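A minimal sketch of the key-detection idea (the env-var names and lookup order mirror the commit message, but the specific model choices and the real logic in `biomni/llm.py` may differ):

```python
import os

# First provider whose API key is present wins; model names are illustrative.
DEFAULT_MODELS = {
    "ANTHROPIC_API_KEY": "claude-sonnet-4-5",
    "OPENAI_API_KEY": "gpt-4o",
    "GEMINI_API_KEY": "gemini-1.5-pro",
    "GROQ_API_KEY": "llama-3.1-70b-versatile",
}

def get_default_model(env=os.environ):
    for key, model in DEFAULT_MODELS.items():
        if env.get(key):
            return model
    raise RuntimeError("No supported LLM API key found in the environment")
```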

- Improve GPT model compatibility in agent
  - Add explicit XML tag format instructions for GPT models
  - Add task completion verification to prevent premature conclusions
  - Add fallback parsing for responses without proper tags
  - Add persistence indicators to keep agent working on multi-step tasks

- Fix query_geo JSON brace escaping bug
  - Escape braces in format string examples ({} -> {{}})
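The failure mode is easy to reproduce: a literal JSON example inside a `str.format` template is parsed as a replacement field unless its braces are doubled (the `"gse"` key here is just an illustration):

```python
# Broken: format() treats {"gse": ...} as a placeholder and raises KeyError.
broken = 'Return JSON like {"gse": "GSE123"} for this query: {query}'

# Fixed: doubled braces are emitted literally by format().
fixed = 'Return JSON like {{"gse": "GSE123"}} for this query: {query}'

prompt = fixed.format(query="breast cancer microarray")
```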

- Add provider-agnostic web search fallback
  - New advanced_web_search() works with any LLM provider
  - advanced_web_search_claude() falls back gracefully when Claude unavailable

- Add API request timeouts (300s) for stability

- Fix UTF-8 decode errors in bash script output
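The usual shape of this fix (a sketch; the actual call site in `biomni/utils.py` may differ) is to decode captured bytes with `errors="replace"` so a stray non-UTF-8 byte in tool output substitutes U+FFFD instead of crashing the run:

```python
raw = b"analysis ok \xff done"  # \xff is not valid UTF-8

# Without errors="replace", raw.decode("utf-8") raises UnicodeDecodeError.
text = raw.decode("utf-8", errors="replace")
```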

- Add GEO query integration tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add llm_lite config for lightweight models used in simple tasks
  - config.llm: Primary model for agent reasoning (claude-sonnet-4-5, gpt-4o)
  - config.llm_lite: Lightweight model for parsing/classification (claude-haiku-3-5, gpt-4o-mini)

- Add BIOMNI_LLM_LITE env var override support

- Add DEFAULT_MODELS_LITE dict and _get_default_model_lite() function
  - Anthropic: claude-haiku-3-5
  - OpenAI: gpt-4o-mini
  - Gemini: gemini-1.5-flash
  - Groq: llama-3.1-8b-instant
  - Bedrock: anthropic.claude-3-haiku-20240307-v1:0

- Update helper functions to use llm_lite:
  - _query_llm_for_api() in database.py (query parsing)
  - annotate_celltype_scRNA() in genomics.py (classification)
  - write_python_code() in utils.py (code generation)

- Update tests to configure both llm and llm_lite

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move test_geo_query.py, test_agent_geo.py, test_meta_analysis.py to tests/
- Add tests/__init__.py to make it a proper Python package
- Update test files to:
  - Use config from .env instead of hardcoding models
  - Add sys.path setup for proper imports
  - Display both llm and llm_lite config values

Test files:
- test_geo_query.py: Direct GEO query function tests
- test_agent_geo.py: Agent with simple GEO task
- test_meta_analysis.py: Agent with complex multi-step task
- test_geo_query_integration.py: Pytest integration tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add download_geo function to download and parse GEO data (GSE/GSM)
- Use HTTPS instead of FTP for reliable downloads from NCBI
- Add GEOparse to library dictionaries (env_desc.py, env_desc_cm.py)
- Add GEOparse as optional dependency in pyproject.toml
- Add workspace config setting for output directory (BIOMNI_WORKSPACE)
- Tool automatically extracts expression matrix and sample metadata

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add download_gpl_annotation function to download platform annotation files
  with probe-to-gene mappings from GEO
- Add map_expression_to_genes function to map probe-level expression data
  to gene-level using GPL annotations (supports mean/median/max/sum aggregation)
- Add normalize_llm_content helper in utils.py to handle OpenAI Responses API
  content format (list of content blocks vs string)
- Update tool descriptions to make new GPL tools discoverable by agent
- Update retriever, genomics, and env_collection to use normalize_llm_content

These tools enable cross-platform GEO meta-analysis by mapping different
platform probe IDs to common gene identifiers.
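The normalization helper might look roughly like this (the content-block shape is an assumption based on the OpenAI format; the real `normalize_llm_content` in `utils.py` may handle more cases):

```python
def normalize_llm_content(content):
    # OpenAI's Responses API can return a list of content blocks rather than
    # a plain string; join the text blocks back into one string.
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        elif isinstance(block, str):
            parts.append(block)
    return "".join(parts)
```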

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Anthropic: claude-haiku-3-5 → claude-haiku-4-5 (3-5 EOL Feb 19 2026)
- OpenAI: gpt-4o-mini → gpt-5-mini (4o deprecated)
- Bedrock: claude-3-haiku → claude-3-5-haiku

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When llm_lite is not explicitly set, automatically select a lite model
that matches the main model's provider (e.g., gpt-5 → gpt-5-mini,
claude-sonnet → claude-haiku). Previously llm_lite always defaulted
to claude-haiku regardless of the main model, causing cross-provider
API calls when using OpenAI/Gemini models.
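The inference step can be sketched as a prefix table (the gpt-5 and claude pairs come from this commit message; the rest of the table and the fallback are assumptions):

```python
# Prefix of the main model -> matching lite-tier model from the same provider.
LITE_BY_PREFIX = {
    "gpt-5": "gpt-5-mini",
    "gpt-4o": "gpt-4o-mini",
    "claude": "claude-haiku-4-5",
    "gemini": "gemini-1.5-flash",
}

def infer_llm_lite(llm, fallback="claude-haiku-4-5"):
    # Match the main model's provider so lite calls never cross providers.
    for prefix, lite in LITE_BY_PREFIX.items():
        if llm.startswith(prefix):
            return lite
    return fallback
```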

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenAI models (gpt-5, gpt-4o, etc.) now use with_structured_output()
with a Pydantic schema (AgentResponse) instead of fragile XML tag
parsing. Claude models continue using XML tags unchanged.

Key changes in a1.py:
- Add AgentResponse Pydantic model (reasoning/action/content fields)
- Add _is_openai_model property for provider detection
- Split generate() into structured-output path (OpenAI) and XML path (Claude)
- Split system prompt format instructions by provider
- Strip markdown code fences from execute content (defense-in-depth in
  both generate() and execute())
- Detect duplicate code execution to prevent gpt-5 re-run loops
- Sync default_config.llm_lite when constructor overrides to different provider
- Sanitize MCP server names to valid Python identifiers (okn-wobd → okn_wobd)
- Remove ~75 lines of fragile OpenAI heuristics (still_working_indicators,
  final_answer_indicators, code block detection fallbacks)
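The structured-output path might be sketched as follows; the `AgentResponse` field names come from the commit message, while the `Literal` values and the `with_structured_output` binding (a real LangChain API) are assumptions about how `a1.py` wires it up:

```python
from typing import Literal

from pydantic import BaseModel

class AgentResponse(BaseModel):
    reasoning: str                          # the model's stated reasoning
    action: Literal["execute", "solution"]  # what the agent wants to do next
    content: str                            # code to run, or the final answer

# In a1.py this schema would be bound to the OpenAI model, e.g.:
#   structured_llm = llm.with_structured_output(AgentResponse)
#   resp = structured_llm.invoke(messages)
resp = AgentResponse(
    reasoning="need GEO data before answering",
    action="execute",
    content="print('fetching GSE record')",
)
```

Because the provider enforces the schema, there is no tag parsing left to go wrong, which is what makes the ~75 lines of heuristics removable.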

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_openai_structured_output.py: 28 unit tests + 6 live API tests
  covering llm_lite inference, AgentResponse schema, structured output
  generate/fallback paths, system prompt branching
- test_openai_e2e.py: 6 live end-to-end tests exercising the full
  ReAct loop with gpt-5-mini, gpt-5, retriever, and MCP integration
- conftest.py: adds --live flag for tests requiring API keys
- mcp_config.yaml: OKN-WOBD MCP server config for local use
- launch_biomni.py: convenience launcher with MCP + Gradio

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, each MCP tool call spawned a new server process via
stdio_client. This broke stateful patterns like background jobs
(submit job → poll for result) because the polling call hit a new
process with no knowledge of the job.

Now each MCP server gets a single long-lived process started during
add_mcp(), with a persistent ClientSession maintained in a background
thread. All tool calls for that server reuse the same session, so
in-memory state (job queues, caches) persists across calls.

This matches how Claude Code handles MCP servers and fixes the
"No job found with id" error when polling OKN-WOBD background jobs.

Sessions are cleaned up via atexit handler on process exit.
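The pattern is roughly the following, as a deliberately simplified MCP-free sketch: a dict stands in for the server's in-memory job table, and the queue-fed background thread stands in for the persistent ClientSession.

```python
import atexit
import queue
import threading

class PersistentWorker:
    # One long-lived background thread owns the stateful "session"; all calls
    # are funneled through a queue, so in-memory state survives across calls.
    def __init__(self):
        self._requests = queue.Queue()
        self._state = {}  # stands in for the MCP server's job table
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        atexit.register(self.close)  # mirrors the PR's atexit cleanup

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                return
            fn, args, reply = item
            reply.put(fn(self._state, *args))

    def call(self, fn, *args):
        reply = queue.Queue()
        self._requests.put((fn, args, reply))
        return reply.get()

    def close(self):
        self._requests.put(None)

worker = PersistentWorker()
worker.call(lambda state, k, v: state.__setitem__(k, v), "job-1", "done")
result = worker.call(lambda state, k: state.get(k), "job-1")
```

The submit call and the poll call hit the same `_state`, which is exactly what breaks when each call spawns a fresh process.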

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Return content.text instead of content.json() from MCP wrapper to
  avoid passing the full Pydantic envelope as tool output. Auto-parse
  JSON responses so agent code gets dicts directly.
- Add guard rejecting 'solution' action when no code has been executed,
  preventing gpt-5 from hallucinating results without running any tools.
- Update tests for premature solution guard behavior.
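The auto-parse step from the first bullet might look like this (a sketch; the real wrapper also extracts `content.text` from the MCP result object):

```python
import json

def unwrap_tool_result(text):
    # If the tool's text payload parses as JSON, hand the agent a dict/list
    # directly instead of the raw string; otherwise pass the text through.
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return text
```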

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Moves heavy optional dependencies (esm, torch) from top-level imports
to local imports inside the functions that need them, so importing
other functions like gene_set_enrichment_analysis doesn't fail when
these packages aren't installed.
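The pattern, with a stdlib module standing in for torch/esm:

```python
def light_function(xs):
    # No heavy dependency needed: always importable.
    return sum(xs)

def heavy_function(xs):
    # Deferred import: the cost (and the ImportError, if the package is
    # missing) is only paid when this particular function is called.
    import statistics
    return statistics.mean(xs)
```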

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

goodb commented Feb 17, 2026

I worked on these changes for two reasons: 1) with a brand-new Anthropic API account, I was blocked almost immediately by tier 1 rate limits; 2) switching to an existing (warmed-up) OpenAI API account exposed a variety of failures in code that expected Anthropic-specific behaviors, most notably consistent use of XML tags within text responses.

There is also work here to support a new MCP server that requires polling for long-running jobs.
