Add multi-provider LLM support and two-tier model configuration #280
Open
goodb wants to merge 20 commits into snap-stanford:main
Conversation
- Add configurable LLM selection based on available API keys
  - New _get_default_model() detects ANTHROPIC/OPENAI/GEMINI/GROQ/BEDROCK keys
  - DEFAULT_MODELS dict defines preferred model for each provider
  - Fix config.llm_model -> config.llm bug
- Improve GPT model compatibility in agent
  - Add explicit XML tag format instructions for GPT models
  - Add task completion verification to prevent premature conclusions
  - Add fallback parsing for responses without proper tags
  - Add persistence indicators to keep agent working on multi-step tasks
- Fix query_geo JSON brace escaping bug
  - Escape braces in format string examples ({} -> {{}})
- Add provider-agnostic web search fallback
  - New advanced_web_search() works with any LLM provider
  - advanced_web_search_claude() falls back gracefully when Claude unavailable
- Add API request timeouts (300s) for stability
- Fix UTF-8 decode errors in bash script output
- Add GEO query integration tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add llm_lite config for lightweight models used in simple tasks
  - config.llm: Primary model for agent reasoning (claude-sonnet-4-5, gpt-4o)
  - config.llm_lite: Lightweight model for parsing/classification (claude-haiku-3-5, gpt-4o-mini)
- Add BIOMNI_LLM_LITE env var override support
- Add DEFAULT_MODELS_LITE dict and _get_default_model_lite() function
  - Anthropic: claude-haiku-3-5
  - OpenAI: gpt-4o-mini
  - Gemini: gemini-1.5-flash
  - Groq: llama-3.1-8b-instant
  - Bedrock: anthropic.claude-3-haiku-20240307-v1:0
- Update helper functions to use llm_lite:
  - _query_llm_for_api() in database.py (query parsing)
  - annotate_celltype_scRNA() in genomics.py (classification)
  - write_python_code() in utils.py (code generation)
- Update tests to configure both llm and llm_lite

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
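A minimal .env along these lines would exercise the two-tier configuration. BIOMNI_LLM_LITE is named in the commit above; the primary-model variable name (BIOMNI_LLM here) is an assumption.

```bash
# One provider key is enough; the default model is inferred from it
ANTHROPIC_API_KEY=sk-ant-...

# Optional overrides (BIOMNI_LLM is a hypothetical name for the primary model)
BIOMNI_LLM=claude-sonnet-4-5
BIOMNI_LLM_LITE=claude-haiku-3-5
```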
- Move test_geo_query.py, test_agent_geo.py, test_meta_analysis.py to tests/
- Add tests/__init__.py to make it a proper Python package
- Update test files to:
  - Use config from .env instead of hardcoding models
  - Add sys.path setup for proper imports
  - Display both llm and llm_lite config values

Test files:
- test_geo_query.py: Direct GEO query function tests
- test_agent_geo.py: Agent with simple GEO task
- test_meta_analysis.py: Agent with complex multi-step task
- test_geo_query_integration.py: Pytest integration tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
- Add download_geo function to download and parse GEO data (GSE/GSM)
- Use HTTPS instead of FTP for reliable downloads from NCBI
- Add GEOparse to library dictionaries (env_desc.py, env_desc_cm.py)
- Add GEOparse as optional dependency in pyproject.toml
- Add workspace config setting for output directory (BIOMNI_WORKSPACE)
- Tool automatically extracts expression matrix and sample metadata

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
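The core of a GEOparse-based download tool can be sketched as below. This is not the PR's download_geo implementation (which also handles GSM accessions, HTTPS retries, and the BIOMNI_WORKSPACE setting); it only shows the GEOparse calls involved, with the import deferred since GEOparse is an optional dependency.

```python
def download_geo_sketch(accession: str, destdir: str = "./workspace"):
    """Fetch a GEO series and return (expression_df, metadata_df).

    Illustrative sketch only; assumes `accession` is a GSE ID.
    """
    import GEOparse  # optional dependency, imported lazily

    gse = GEOparse.get_GEO(geo=accession, destdir=destdir)
    expression = gse.pivot_samples("VALUE")  # probes x samples matrix
    metadata = gse.phenotype_data            # per-sample annotations
    return expression, metadata
```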
- Add download_gpl_annotation function to download platform annotation files with probe-to-gene mappings from GEO
- Add map_expression_to_genes function to map probe-level expression data to gene-level using GPL annotations (supports mean/median/max/sum aggregation)
- Add normalize_llm_content helper in utils.py to handle OpenAI Responses API content format (list of content blocks vs string)
- Update tool descriptions to make new GPL tools discoverable by agent
- Update retriever, genomics, and env_collection to use normalize_llm_content

These tools enable cross-platform GEO meta-analysis by mapping different platform probe IDs to common gene identifiers.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
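The probe-to-gene collapse is essentially a join plus a grouped aggregation. A minimal pandas sketch, assuming the annotation table exposes 'ID' and 'GENE_SYMBOL' columns (the real map_expression_to_genes reads these from the downloaded GPL file and may use different column names):

```python
import pandas as pd

def map_probes_to_genes(expr: pd.DataFrame, annot: pd.DataFrame, how: str = "mean") -> pd.DataFrame:
    """Collapse probe-level expression (probes x samples) to gene level.

    `expr` is indexed by probe ID; `annot` maps probe ID -> gene symbol.
    `how` is any of mean/median/max/sum, as in the commit above.
    """
    # Attach the gene symbol to each probe row, dropping unannotated probes
    merged = expr.join(annot.set_index("ID")["GENE_SYMBOL"], how="inner")
    merged = merged.dropna(subset=["GENE_SYMBOL"])
    # Aggregate all probes mapping to the same gene
    return merged.groupby("GENE_SYMBOL").agg(how)
```

Multiple probes for one gene (common on older arrays) collapse to a single row, which is what makes cross-platform meta-analysis possible.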
- Anthropic: claude-haiku-3-5 → claude-haiku-4-5 (3-5 EOL Feb 19 2026)
- OpenAI: gpt-4o-mini → gpt-5-mini (4o deprecated)
- Bedrock: claude-3-haiku → claude-3-5-haiku

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When llm_lite is not explicitly set, automatically select a lite model that matches the main model's provider (e.g., gpt-5 → gpt-5-mini, claude-sonnet → claude-haiku).

Previously llm_lite always defaulted to claude-haiku regardless of the main model, causing cross-provider API calls when using OpenAI/Gemini models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenAI models (gpt-5, gpt-4o, etc.) now use with_structured_output() with a Pydantic schema (AgentResponse) instead of fragile XML tag parsing. Claude models continue using XML tags unchanged.

Key changes in a1.py:
- Add AgentResponse Pydantic model (reasoning/action/content fields)
- Add _is_openai_model property for provider detection
- Split generate() into structured-output path (OpenAI) and XML path (Claude)
- Split system prompt format instructions by provider
- Strip markdown code fences from execute content (defense-in-depth in both generate() and execute())
- Detect duplicate code execution to prevent gpt-5 re-run loops
- Sync default_config.llm_lite when constructor overrides to different provider
- Sanitize MCP server names to valid Python identifiers (okn-wobd → okn_wobd)
- Remove ~75 lines of fragile OpenAI heuristics (still_working_indicators, final_answer_indicators, code block detection fallbacks)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
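The schema and the fence-stripping defense can be illustrated with a stdlib-only sketch. The real a1.py uses a pydantic.BaseModel bound via LangChain's with_structured_output(); the dataclass below is just a stand-in showing the three fields, and the stripping logic here is a simplified version of whatever the PR actually does.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class AgentResponse:
    """Stand-in for the Pydantic AgentResponse schema described above."""
    reasoning: str                           # the model's stated reasoning
    action: Literal["execute", "solution"]   # run code, or give final answer
    content: str                             # code to execute, or the answer

def strip_code_fences(text: str) -> str:
    """Defense-in-depth: drop markdown fences a model may wrap code in."""
    text = text.strip()
    if text.startswith("```") and "\n" in text:
        text = text[text.index("\n") + 1 :]  # drop the ```lang opener line
    if text.endswith("```"):
        text = text[:-3].rstrip()            # drop the closing fence
    return text
```

Structured output sidesteps the tag-parsing problem entirely: the provider guarantees the response matches the schema, so there is nothing left to regex out of free text.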
- test_openai_structured_output.py: 28 unit tests + 6 live API tests covering llm_lite inference, AgentResponse schema, structured output generate/fallback paths, system prompt branching
- test_openai_e2e.py: 6 live end-to-end tests exercising the full ReAct loop with gpt-5-mini, gpt-5, retriever, and MCP integration
- conftest.py: adds --live flag for tests requiring API keys
- mcp_config.yaml: OKN-WOBD MCP server config for local use
- launch_biomni.py: convenience launcher with MCP + Gradio

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, each MCP tool call spawned a new server process via stdio_client. This broke stateful patterns like background jobs (submit job → poll for result) because the polling call hit a new process with no knowledge of the job.

Now each MCP server gets a single long-lived process started during add_mcp(), with a persistent ClientSession maintained in a background thread. All tool calls for that server reuse the same session, so in-memory state (job queues, caches) persists across calls. This matches how Claude Code handles MCP servers and fixes the "No job found with id" error when polling OKN-WOBD background jobs.

Sessions are cleaned up via atexit handler on process exit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
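The long-lived-session pattern can be demonstrated without the mcp package: one background thread owns an event loop, and every call for a given server is routed onto that loop against the same per-server state. The real code keeps mcp ClientSession objects over stdio where this sketch keeps plain dicts; class and method names here are hypothetical.

```python
import asyncio
import atexit
import threading

class PersistentSessionRegistry:
    """Illustrative stand-in for the persistent MCP ClientSession pattern."""

    def __init__(self):
        # One event loop for all servers, running in a daemon thread
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()
        self._sessions = {}  # server name -> in-memory state (stands in for ClientSession)
        atexit.register(self.shutdown)  # mirrors the atexit cleanup in the commit

    def call(self, server, coro_factory):
        """Run a coroutine against `server`'s persistent state on the shared loop."""
        state = self._sessions.setdefault(server, {})
        future = asyncio.run_coroutine_threadsafe(coro_factory(state), self._loop)
        return future.result(timeout=30)

    def shutdown(self):
        if self._loop.is_running():
            self._loop.call_soon_threadsafe(self._loop.stop)
```

Because both the submit and the poll hit the same `state` dict, a job queued by one call is visible to the next, which is exactly what per-call subprocesses could not provide.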
- Return content.text instead of content.json() from MCP wrapper to avoid passing the full Pydantic envelope as tool output. Auto-parse JSON responses so agent code gets dicts directly.
- Add guard rejecting 'solution' action when no code has been executed, preventing gpt-5 from hallucinating results without running any tools.
- Update tests for premature solution guard behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
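Both behaviors are small and worth pinning down. A sketch under assumed names (the real functions in the PR are likely named and placed differently):

```python
import json

def normalize_tool_output(text: str):
    """Return parsed JSON when the MCP tool replied with JSON, else raw text."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return text

def guard_solution(action: str, code_has_run: bool) -> str:
    """Reject a 'solution' before any code has executed (premature-solution guard)."""
    if action == "solution" and not code_has_run:
        return "execute"  # force the agent to actually run a tool first
    return action
```

The guard is a one-line state check, but it closes off a whole class of hallucinated final answers.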
Moves heavy optional dependencies (esm, torch) from top-level imports to local imports inside the functions that need them, so importing other functions like gene_set_enrichment_analysis doesn't fail when these packages aren't installed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
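The lazy-import pattern looks like this in miniature. The function name below is hypothetical; only the pattern (defer the heavy import to call time, and fail with a clear message) reflects the commit.

```python
def fold_protein_sketch(sequence: str):
    """Functions needing heavy deps import them at call time, not module load.

    The rest of the module stays importable when esm/torch are absent.
    """
    try:
        import esm  # heavy optional dependency, deferred until actually used
    except ImportError as err:
        raise ImportError("Install 'esm' to use this function") from err
    ...  # real work with esm/torch would go here
```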
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
I worked on these changes because: (1) using a brand-new Anthropic API account, I was blocked almost immediately by tier 1 rate limits; (2) switching to an existing (warmed-up) OpenAI API account resulted in a variety of failures from code that expected certain Anthropic behaviors, most notably consistent use of XML tags in the midst of text responses. There is also work in here to support a new MCP server that requires polling for long-running jobs.
Summary
This PR adds support for multiple LLM providers (OpenAI, Anthropic, Gemini, Groq, Bedrock) with intelligent defaults and a two-tier model configuration for cost optimization.
Key Changes
Key Changes
- llm: Primary model for agent reasoning (e.g., gpt-4o, claude-sonnet-4-5)
- llm_lite: Lightweight model for simple tasks like parsing and classification (e.g., gpt-4o-mini, claude-haiku-3-5)
- tests/ directory with proper structure

Configuration
Users can configure via environment variables in .env:

Files Changed
- biomni/config.py - Added llm_lite configuration
- biomni/llm.py - Added DEFAULT_MODELS, DEFAULT_MODELS_LITE, smart model selection
- biomni/agent/a1.py - GPT compatibility improvements, MCP server support
- biomni/tool/database.py - Use llm_lite for query parsing, fixed brace escaping
- biomni/tool/genomics.py - Use llm_lite for cell type annotation
- biomni/tool/literature.py - Added provider-agnostic advanced_web_search() fallback
- biomni/utils.py - Use configurable LLM, fixed UTF-8 errors
- tests/ - Organized test files with proper structure

Test plan
🤖 Generated with Claude Code