
Add multi-provider LLM support and two-tier model configuration #280

Open

goodb wants to merge 20 commits into snap-stanford:main from goodb:goodb-wobd

Conversation


@goodb goodb commented Feb 11, 2026

Summary

This PR adds support for multiple LLM providers (OpenAI, Anthropic, Gemini, Groq, Bedrock) with intelligent defaults and a two-tier model configuration for cost optimization.

Key Changes

  • Multi-provider LLM support: Automatically detects available API keys and selects appropriate models
  • Two-tier model configuration:
    • llm: Primary model for agent reasoning (e.g., gpt-4o, claude-sonnet-4-5)
    • llm_lite: Lightweight model for simple tasks like parsing and classification (e.g., gpt-4o-mini, claude-haiku-3-5)
  • GPT compatibility improvements: Enhanced prompting and parsing for OpenAI models
  • Bug fixes: JSON brace escaping in GEO queries, UTF-8 decode errors in bash output
  • Test organization: Moved test files to tests/ directory with proper structure

Configuration

Users can configure via environment variables in .env:

BIOMNI_LLM=gpt-4o              # Primary reasoning model
BIOMNI_LLM_LITE=gpt-4o-mini    # Lightweight model for simple tasks
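The two variables might be resolved along these lines (a sketch; the fallback defaults shown are just the example models from this description, not necessarily Biomni's actual defaults):

```python
import os

def resolve_models(env=os.environ):
    """Read the two-tier model config from the environment.

    Falls back to the example models above; the real defaults in
    biomni/config.py are provider-dependent.
    """
    llm = env.get("BIOMNI_LLM", "gpt-4o")                  # primary reasoning model
    llm_lite = env.get("BIOMNI_LLM_LITE", "gpt-4o-mini")   # lightweight tier
    return llm, llm_lite
```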

Files Changed

  • biomni/config.py - Added llm_lite configuration
  • biomni/llm.py - Added DEFAULT_MODELS, DEFAULT_MODELS_LITE, smart model selection
  • biomni/agent/a1.py - GPT compatibility improvements, MCP server support
  • biomni/tool/database.py - Use llm_lite for query parsing, fixed brace escaping
  • biomni/tool/genomics.py - Use llm_lite for cell type annotation
  • biomni/tool/literature.py - Added provider-agnostic advanced_web_search() fallback
  • biomni/utils.py - Use configurable LLM, fixed UTF-8 errors
  • tests/ - Organized test files with proper structure

Test plan

  • Verified multi-provider detection works correctly
  • Tested GEO query integration with OpenAI models
  • Verified environment variable overrides work
  • All existing functionality preserved

🤖 Generated with Claude Code

goodb and others added 20 commits February 6, 2026 14:36
- Add configurable LLM selection based on available API keys
  - New _get_default_model() detects ANTHROPIC/OPENAI/GEMINI/GROQ/BEDROCK keys
  - DEFAULT_MODELS dict defines preferred model for each provider
  - Fix config.llm_model -> config.llm bug
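A minimal sketch of the key-detection idea (the env-var names and lookup order mirror the commit message, but the specific model choices and the real logic in `biomni/llm.py` may differ):

```python
import os

# First provider whose API key is present wins; model names are illustrative.
DEFAULT_MODELS = {
    "ANTHROPIC_API_KEY": "claude-sonnet-4-5",
    "OPENAI_API_KEY": "gpt-4o",
    "GEMINI_API_KEY": "gemini-1.5-pro",
    "GROQ_API_KEY": "llama-3.1-70b-versatile",
}

def get_default_model(env=os.environ):
    for key, model in DEFAULT_MODELS.items():
        if env.get(key):
            return model
    raise RuntimeError("No supported LLM API key found in the environment")
```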

- Improve GPT model compatibility in agent
  - Add explicit XML tag format instructions for GPT models
  - Add task completion verification to prevent premature conclusions
  - Add fallback parsing for responses without proper tags
  - Add persistence indicators to keep agent working on multi-step tasks

- Fix query_geo JSON brace escaping bug
  - Escape braces in format string examples ({} -> {{}})
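The failure mode is easy to reproduce: a literal JSON example inside a `str.format` template is parsed as a replacement field unless its braces are doubled (the `"gse"` key here is just an illustration):

```python
# Broken: format() treats {"gse": ...} as a placeholder and raises KeyError.
broken = 'Return JSON like {"gse": "GSE123"} for this query: {query}'

# Fixed: doubled braces are emitted literally by format().
fixed = 'Return JSON like {{"gse": "GSE123"}} for this query: {query}'

prompt = fixed.format(query="breast cancer microarray")
```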

- Add provider-agnostic web search fallback
  - New advanced_web_search() works with any LLM provider
  - advanced_web_search_claude() falls back gracefully when Claude unavailable

- Add API request timeouts (300s) for stability

- Fix UTF-8 decode errors in bash script output
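The usual shape of this fix (a sketch; the actual call site in `biomni/utils.py` may differ) is to decode captured bytes with `errors="replace"` so a stray non-UTF-8 byte in tool output substitutes U+FFFD instead of crashing the run:

```python
raw = b"analysis ok \xff done"  # \xff is not valid UTF-8

# Without errors="replace", raw.decode("utf-8") raises UnicodeDecodeError.
text = raw.decode("utf-8", errors="replace")
```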

- Add GEO query integration tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add llm_lite config for lightweight models used in simple tasks
  - config.llm: Primary model for agent reasoning (claude-sonnet-4-5, gpt-4o)
  - config.llm_lite: Lightweight model for parsing/classification (claude-haiku-3-5, gpt-4o-mini)

- Add BIOMNI_LLM_LITE env var override support

- Add DEFAULT_MODELS_LITE dict and _get_default_model_lite() function
  - Anthropic: claude-haiku-3-5
  - OpenAI: gpt-4o-mini
  - Gemini: gemini-1.5-flash
  - Groq: llama-3.1-8b-instant
  - Bedrock: anthropic.claude-3-haiku-20240307-v1:0

- Update helper functions to use llm_lite:
  - _query_llm_for_api() in database.py (query parsing)
  - annotate_celltype_scRNA() in genomics.py (classification)
  - write_python_code() in utils.py (code generation)

- Update tests to configure both llm and llm_lite

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move test_geo_query.py, test_agent_geo.py, test_meta_analysis.py to tests/
- Add tests/__init__.py to make it a proper Python package
- Update test files to:
  - Use config from .env instead of hardcoding models
  - Add sys.path setup for proper imports
  - Display both llm and llm_lite config values

Test files:
- test_geo_query.py: Direct GEO query function tests
- test_agent_geo.py: Agent with simple GEO task
- test_meta_analysis.py: Agent with complex multi-step task
- test_geo_query_integration.py: Pytest integration tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add download_geo function to download and parse GEO data (GSE/GSM)
- Use HTTPS instead of FTP for reliable downloads from NCBI
- Add GEOparse to library dictionaries (env_desc.py, env_desc_cm.py)
- Add GEOparse as optional dependency in pyproject.toml
- Add workspace config setting for output directory (BIOMNI_WORKSPACE)
- Tool automatically extracts expression matrix and sample metadata

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add download_gpl_annotation function to download platform annotation files
  with probe-to-gene mappings from GEO
- Add map_expression_to_genes function to map probe-level expression data
  to gene-level using GPL annotations (supports mean/median/max/sum aggregation)
- Add normalize_llm_content helper in utils.py to handle OpenAI Responses API
  content format (list of content blocks vs string)
- Update tool descriptions to make new GPL tools discoverable by agent
- Update retriever, genomics, and env_collection to use normalize_llm_content

These tools enable cross-platform GEO meta-analysis by mapping different
platform probe IDs to common gene identifiers.
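The normalization helper might look roughly like this (the content-block shape is an assumption based on the OpenAI format; the real `normalize_llm_content` in `utils.py` may handle more cases):

```python
def normalize_llm_content(content):
    # OpenAI's Responses API can return a list of content blocks rather than
    # a plain string; join the text blocks back into one string.
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        elif isinstance(block, str):
            parts.append(block)
    return "".join(parts)
```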

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Anthropic: claude-haiku-3-5 → claude-haiku-4-5 (3-5 EOL Feb 19 2026)
- OpenAI: gpt-4o-mini → gpt-5-mini (4o deprecated)
- Bedrock: claude-3-haiku → claude-3-5-haiku

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When llm_lite is not explicitly set, automatically select a lite model
that matches the main model's provider (e.g., gpt-5 → gpt-5-mini,
claude-sonnet → claude-haiku). Previously llm_lite always defaulted
to claude-haiku regardless of the main model, causing cross-provider
API calls when using OpenAI/Gemini models.
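The inference step can be sketched as a prefix table (the gpt-5 and claude pairs come from this commit message; the rest of the table and the fallback are assumptions):

```python
# Prefix of the main model -> matching lite-tier model from the same provider.
LITE_BY_PREFIX = {
    "gpt-5": "gpt-5-mini",
    "gpt-4o": "gpt-4o-mini",
    "claude": "claude-haiku-4-5",
    "gemini": "gemini-1.5-flash",
}

def infer_llm_lite(llm, fallback="claude-haiku-4-5"):
    # Match the main model's provider so lite calls never cross providers.
    for prefix, lite in LITE_BY_PREFIX.items():
        if llm.startswith(prefix):
            return lite
    return fallback
```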

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenAI models (gpt-5, gpt-4o, etc.) now use with_structured_output()
with a Pydantic schema (AgentResponse) instead of fragile XML tag
parsing. Claude models continue using XML tags unchanged.

Key changes in a1.py:
- Add AgentResponse Pydantic model (reasoning/action/content fields)
- Add _is_openai_model property for provider detection
- Split generate() into structured-output path (OpenAI) and XML path (Claude)
- Split system prompt format instructions by provider
- Strip markdown code fences from execute content (defense-in-depth in
  both generate() and execute())
- Detect duplicate code execution to prevent gpt-5 re-run loops
- Sync default_config.llm_lite when constructor overrides to different provider
- Sanitize MCP server names to valid Python identifiers (okn-wobd → okn_wobd)
- Remove ~75 lines of fragile OpenAI heuristics (still_working_indicators,
  final_answer_indicators, code block detection fallbacks)
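The structured-output path might be sketched as follows; the `AgentResponse` field names come from the commit message, while the `Literal` values and the `with_structured_output` binding (a real LangChain API) are assumptions about how `a1.py` wires it up:

```python
from typing import Literal

from pydantic import BaseModel

class AgentResponse(BaseModel):
    reasoning: str                          # the model's stated reasoning
    action: Literal["execute", "solution"]  # what the agent wants to do next
    content: str                            # code to run, or the final answer

# In a1.py this schema would be bound to the OpenAI model, e.g.:
#   structured_llm = llm.with_structured_output(AgentResponse)
#   resp = structured_llm.invoke(messages)
resp = AgentResponse(
    reasoning="need GEO data before answering",
    action="execute",
    content="print('fetching GSE record')",
)
```

Because the provider enforces the schema, there is no tag parsing left to go wrong, which is what makes the ~75 lines of heuristics removable.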

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_openai_structured_output.py: 28 unit tests + 6 live API tests
  covering llm_lite inference, AgentResponse schema, structured output
  generate/fallback paths, system prompt branching
- test_openai_e2e.py: 6 live end-to-end tests exercising the full
  ReAct loop with gpt-5-mini, gpt-5, retriever, and MCP integration
- conftest.py: adds --live flag for tests requiring API keys
- mcp_config.yaml: OKN-WOBD MCP server config for local use
- launch_biomni.py: convenience launcher with MCP + Gradio

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, each MCP tool call spawned a new server process via
stdio_client. This broke stateful patterns like background jobs
(submit job → poll for result) because the polling call hit a new
process with no knowledge of the job.

Now each MCP server gets a single long-lived process started during
add_mcp(), with a persistent ClientSession maintained in a background
thread. All tool calls for that server reuse the same session, so
in-memory state (job queues, caches) persists across calls.

This matches how Claude Code handles MCP servers and fixes the
"No job found with id" error when polling OKN-WOBD background jobs.

Sessions are cleaned up via atexit handler on process exit.
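The pattern is roughly the following, as a deliberately simplified MCP-free sketch: a dict stands in for the server's in-memory job table, and the queue-fed background thread stands in for the persistent ClientSession.

```python
import atexit
import queue
import threading

class PersistentWorker:
    # One long-lived background thread owns the stateful "session"; all calls
    # are funneled through a queue, so in-memory state survives across calls.
    def __init__(self):
        self._requests = queue.Queue()
        self._state = {}  # stands in for the MCP server's job table
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        atexit.register(self.close)  # mirrors the PR's atexit cleanup

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                return
            fn, args, reply = item
            reply.put(fn(self._state, *args))

    def call(self, fn, *args):
        reply = queue.Queue()
        self._requests.put((fn, args, reply))
        return reply.get()

    def close(self):
        self._requests.put(None)

worker = PersistentWorker()
worker.call(lambda state, k, v: state.__setitem__(k, v), "job-1", "done")
result = worker.call(lambda state, k: state.get(k), "job-1")
```

The submit call and the poll call hit the same `_state`, which is exactly what breaks when each call spawns a fresh process.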

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Return content.text instead of content.json() from MCP wrapper to
  avoid passing the full Pydantic envelope as tool output. Auto-parse
  JSON responses so agent code gets dicts directly.
- Add guard rejecting 'solution' action when no code has been executed,
  preventing gpt-5 from hallucinating results without running any tools.
- Update tests for premature solution guard behavior.
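The auto-parse step from the first bullet might look like this (a sketch; the real wrapper also extracts `content.text` from the MCP result object):

```python
import json

def unwrap_tool_result(text):
    # If the tool's text payload parses as JSON, hand the agent a dict/list
    # directly instead of the raw string; otherwise pass the text through.
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return text
```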

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Moves heavy optional dependencies (esm, torch) from top-level imports
to local imports inside the functions that need them, so importing
other functions like gene_set_enrichment_analysis doesn't fail when
these packages aren't installed.
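The pattern, with a stdlib module standing in for torch/esm:

```python
def light_function(xs):
    # No heavy dependency needed: always importable.
    return sum(xs)

def heavy_function(xs):
    # Deferred import: the cost (and the ImportError, if the package is
    # missing) is only paid when this particular function is called.
    import statistics
    return statistics.mean(xs)
```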

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

goodb commented Feb 17, 2026

I worked on these changes for two reasons: 1) with a brand-new Anthropic API account, I was blocked almost immediately by tier 1 rate limits; 2) switching to an existing (warmed-up) OpenAI API account exposed a variety of failures in code that expected Anthropic-specific behaviors, most notably consistent use of XML tags within text responses.

There is also work here to support a new MCP server that requires polling for long-running jobs.
