fix(agent-server/sdk): subagents missing in remote workspace #2275
VascoSch92 wants to merge 13 commits into main
Conversation
API breakage checks (Griffe) - Result: Failed. Log excerpt (first 1000 characters)
Agent server REST API breakage checks (OpenAPI) - Result: Passed
all-hands-bot
left a comment
Taste Rating: 🟢 Good taste - Clean solution that mirrors existing patterns and simplifies the API.
openhands-agent-server/openhands/agent_server/conversation_service.py
KEY INSIGHT: Storing the full AgentDefinition instead of just description is better data structure design - gives you access to all agent metadata without additional lookups. The API simplification (register_agent_if_absent now takes just factory + definition) eliminates parameter duplication since the definition already contains name and description.
VERDICT: ✅ Core logic is solid, solves a real problem pragmatically by reusing the tool_module_qualnames pattern for subagents.
Can we add a new test to tests/cross/test_remote_conversation_live_server.py that spins up a REAL agent server with real sub-agent requests, and makes sure it works?
I found that register_agent() was producing hollow AgentDefinition stubs (just name + description, no tools / system_prompt). These would leak to the remote server via get_registered_agent_definitions() and get recreated as useless agents (no tools and/or no system prompt attached).
I didn't see this before because the machinery that loads subagents uses register_agent_if_absent(), which requires an AgentDefinition (so from that side everything is fine).
I solved this by adding an AgentDefinition.from_agent() classmethod that introspects a live Agent instance and extracts tools, system_prompt, and model into a complete definition. register_agent() now calls the factory with a placeholder LLM and delegates to AgentDefinition.from_agent().
Having this as a classmethod on AgentDefinition is much better from a maintainability perspective: the logic for building a definition from an agent lives right next to the schema it populates (alongside load() for file-based definitions), making it easier to keep in sync when fields change.
Why this solution? This keeps register_agent(name, factory_func, description) user-friendly, with no signature change needed, while producing complete definitions that survive the HTTP roundtrip to the server.
Everything is tested, including cross tests.
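A minimal, self-contained sketch of the from_agent() idea described above, using stand-in dataclasses rather than the SDK's real Agent/AgentDefinition (field names follow the discussion; this is not the actual implementation):

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Stand-in for the SDK's Agent: exposes tools, system prompt, and model."""
    tools: list
    system_prompt: str
    model: str


@dataclass
class AgentDefinition:
    """Stand-in for the schema in openhands/sdk/subagent/schema.py."""
    name: str
    description: str
    tools: list = field(default_factory=list)
    system_prompt: str = ""
    model: str = "inherit"

    @classmethod
    def from_agent(cls, agent: Agent, *, name: str, description: str) -> "AgentDefinition":
        # Introspect a live agent and capture its full configuration so the
        # definition survives the HTTP roundtrip to the remote server.
        return cls(
            name=name,
            description=description,
            tools=list(agent.tools),
            system_prompt=agent.system_prompt,
            model=agent.model,
        )


agent = Agent(tools=["bash"], system_prompt="You explore repositories.", model="gpt-4o")
definition = AgentDefinition.from_agent(agent, name="explorer", description="Explores code")
```

The point is only where the extraction logic lives: next to the schema it populates, so adding a field means touching one place.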
```python
definition = AgentDefinition.from_agent(
    agent, name=name, description=description
)
# If the model was our placeholder, the factory didn't set one explicitly
```
Why is introspection necessary? Could this be done differently?
I was thinking about that. The point is, it could, but I would have to change the signature of register_agent, i.e., add the AgentDefinition as an argument.
The cons of that choice are:
- we change the API
- the API for an SDK user to register an agent becomes very complex
With introspection we can reconstruct the AgentDefinition from the factory.
This time I went for giving the user a better experience :-)
@OpenHands understand the question and answer, investigate all you need, including PR description and discussion and PR code. Why exactly do we need two different behaviors, split via introspect, and what are 2-3 alternatives? Note that you should also take a look at LLM Profiles in the SDK codebase
I'm on it! enyst can track my progress at all-hands.dev
(HUMAN: I'll just replace here the answers)
Alternatives (2-3):
- Make declarative registration explicit: add a register_agent_definition(...) / require passing an AgentDefinition for remote-forwardable agents (no introspection).
- Split local-only vs remote-forwardable APIs: keep register_agent() local, add a separate remote-safe registration function that only accepts declarative definitions.
- Extend AgentDefinition to support LLM Profiles / full LLM config: e.g., llm_profile or embedded LLM config. More faithful than introspection, but introduces distribution/security concerns.
LLM Profiles note
I confirmed the SDK’s LLMProfileStore persists and loads full LLM configurations, but AgentDefinition currently only carries a model string (“inherit” vs override) and does not represent “use profile X”. That makes introspection inherently limited for profile-driven factories, and helps explain why a fallback path is necessary.
HUMAN:
isn't the model actually a profile name?
No—in the subagent system, AgentDefinition.model is treated as an LLM model identifier, not an LLM profile name.
You can see that in two places:

- In the schema itself (openhands-sdk/openhands/sdk/subagent/schema.py):

  ```python
  model: str = Field(
      default="inherit",
      description="Model to use ('inherit' uses parent model)",
  )
  ```

- And, more importantly, in how it is applied when building a subagent (agent_definition_to_factory in subagent/registry.py):

  ```python
  if agent_def.model and agent_def.model != "inherit":
      llm = llm.model_copy(update={"model": agent_def.model})
  ```

That code only changes LLM.model (a string passed through to LiteLLM). It does not load anything from disk or resolve a "profile".
Where profiles are used in the SDK
LLM Profiles are a separate mechanism:
- Profiles are persisted by LLMProfileStore (openhands.sdk.llm.llm_profile_store.LLMProfileStore)
- They are referenced by name in FallbackStrategy.fallback_llms, and loaded like:

  ```python
  fb = self._profile_store.load(name)  # name is a profile name like "gpt4.json"
  ```
So a “profile name” is something like "gpt4" / "gpt4.json" passed to LLMProfileStore.load(...), not something that LLM.model_copy(update={"model": ...}) would understand.
Implication for the introspection logic
If a user factory internally loads a profile (e.g., LLMProfileStore().load("foo")), introspection may extract the resulting .model string, but remote instantiation still won’t reproduce the full profile config (base_url, provider-specific auth, etc.) because the forwarded AgentDefinition only carries model, not “profile”.
If we wanted model: in agent markdown / AgentDefinition to mean “profile”, we’d need a deliberate design change (e.g., add llm_profile field, or support a profile:xyz syntax and resolve it on the server).
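To make the gap concrete, here is a toy sketch of how a hypothetical profile:xyz syntax would differ from the current model-string override. The PROFILES dict and resolve_llm are invented for illustration only; the SDK's real mechanism is LLMProfileStore, and no such resolution currently exists:

```python
# Hypothetical in-memory profile store; the real SDK persists full LLM
# configs (model, base_url, auth, ...) via LLMProfileStore on disk.
PROFILES = {"gpt4": {"model": "gpt-4o", "base_url": "https://example.invalid/v1"}}


def resolve_llm(definition_model: str, parent_llm: dict) -> dict:
    """Toy resolution: 'inherit' keeps the parent config, 'profile:X' loads a
    full config, anything else overrides only the model string (the current
    behavior of the subagent registry)."""
    if definition_model == "inherit":
        return parent_llm
    if definition_model.startswith("profile:"):
        name = definition_model.removeprefix("profile:")
        return {**parent_llm, **PROFILES[name]}  # full config, not just model
    return {**parent_llm, "model": definition_model}  # current behavior


parent = {"model": "gpt-4o-mini", "base_url": None}
```

The difference is exactly the point above: the model-string path cannot reproduce base_url or provider-specific auth, while a profile reference could.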
I’m not sure, this one makes sense to me 🤔
Extend AgentDefinition to support LLM Profiles / full LLM config: e.g., llm_profile or embedded LLM config
Remote has to send full LLM profile today for main agent. We could
- send full profile for sub-agents too
- send profile references for sub-agents / agents
Seems like the same issue touched on in this discussion? #2186 (comment)
The latter implies a mechanism to send/retrieve profiles, which we need anyway outside sub-agents, simply for profile management from a remote client I think, no?
This approach carries the problem I mentioned above, and because of that I opted for introspection.
Precisely: Dropping introspection means register_agent(name, factory_func, description) would produce definitions without tools and system_prompt, which we need on the remote side to build subagents.
If we really want to go for this direction (i.e. giving the LLM profile to the AgentDefinition) there are two options:
Option A - extend the signature with optional fields:

```python
def register_agent(
    name, factory_func, description,
    tools: list[str] | None = None,
    system_prompt: str | None = None,
)
```

Backwards-compatible. Users already know these values since they wrote the factory.
Option B - accept an optional AgentDefinition:

```python
def register_agent(
    name, factory_func, description,
    definition: AgentDefinition | None = None,
)
```

More future-proof (new fields automatically supported), but heavier UX.
@enyst I was looking at other features for file-based agents, and I think having an introspection method built from the factory is actually a very cool solution for the user.
The function register_agent stays super simple and user-friendly, and we take care of the rest (anyway, I don't think we will add many features beyond the one discussed in issue #2186).
What about

Option C:

```python
def register_agent(
    name,
    factory_func,
    definition: AgentDefinition | str,
)
```

And then we construct the AgentDefinition from the string if necessary. No breaking changes, but support for AgentDefinition.
I deeply apologize but today I don't have much bandwidth. I just want to note quickly, IMHO introspection is simply a smell, it smells like maybe we can dig into to see if something's not quite right. I'm not at war with it 🙏 😅
I won't lie, I am kinda at war with this: "the forwarded AgentDefinition carries model, not 'profile'". I don't see much good in carrying model, except in special cases when we want to, e.g., use the same profile id as the model name, to satisfy Anthropic.
I don't really understand why, after the other PR too, we couldn't just say .model is a profile id; we add a small API to CRUD profiles for remote-local sync; maybe in the future deprecate sending the full LLM config any other way except through the profiles API. But maybe I'm dense, sorry, I might be missing details.
Hey @enyst, I've made some changes. I apologize for grouping these structural changes with a bug fix, but they are tightly coupled and this was the right time to address them.

My motivation is that since the subagent module hasn't been released yet, we have the freedom to reshape the API before it stabilizes. These changes make AgentDefinition the single source of truth: it now holds the full agent configuration and can produce its own factory. This makes it straightforward to add new features to file-based agents (like working_dir) by simply adding a field and extraction logic in one place, rather than threading values through multiple functions.

@neubig, the solution we discussed previously wasn't entirely complete, as it left us with potential data leakage. The current approach offers the best balance of a clean API and maintainability.

Summary of the changes
Breaking changes
Test plan
@OpenHands Use Then do a /codereview-roasted on it, and publish the review via the GitHub API (you do not comment, you review, which is allowed). Introduce yourself in the review so that readers know who is speaking. Before you publish the review, take a deep breath and look at it all with fresh eyes: we want a very smooth experience for remote conversations, where it feels almost like local conversations; so maybe that implies compatibility to some high extent. Also, I'd like an answer to how difficult it will be, on this PR's code design, to use LLM Profiles by preference (see comment #2275 (comment)). Then include it all in your review.
I'm on it! enyst can track my progress at all-hands.dev
Hi, I'm OpenHands (automated reviewer) taking a fresh look.
Taste Rating: 🔴 Needs improvement
[CRITICAL ISSUES]
- schema.py, lines 215-260. Model override regression: from_factory_func() creates a placeholder LLM(model="__introspect__", api_key="n/a") and then persists agent.llm.model as the definition model. In to_factory() (lines 258-260), any model that is not "inherit" overrides the parent LLM. Net result: any register_agent() factory that doesn't explicitly override the model now forces subagents to run on the fake "introspect" model (local and remote). That's a silent behavior break and destroys local/remote parity. Treat the sentinel as "inherit" or avoid setting a fake model in the first place.
- schema.py, lines 217-314. Lossy introspection breaks user-defined tools/skills and remote parity: introspection collapses Tool/Skill objects into names, then to_factory() re-resolves them from registries and the filesystem. This means: (1) custom programmatic tools/skills that aren't registered or on disk now fail, (2) tool/skill registration order becomes a new constraint because resolution is eager. That's a compatibility regression vs. the previous "factory is source of truth" behavior and risks a less smooth remote experience. Either preserve tool/skill objects, or offer an explicit AgentDefinition registration path (and keep the original factory for local execution) to avoid lossy reconstruction.
- sdk/__init__.py lines 50-125 and subagent/__init__.py lines 17-31. Public API removal without deprecation: agent_definition_to_factory disappears from the public surface and the function itself is gone. This is a breaking change (already flagged by the API breakage check) and needs a deprecation path or an alias, plus a version bump if removal is intended.
[IMPROVEMENT OPPORTUNITIES]
- schema.py lines 98-132 and schema.py lines 238-280. LLM Profiles by preference: this PR bakes in model: str, and to_factory() only calls llm.model_copy(update={"model": ...}). There is no representation of a profile name or a profile payload, and introspection cannot recover profile names from a factory. So supporting "profile preference" with this design is moderately hard: you'll need a new schema field (e.g., llm_profile), plus resolution logic via LLMProfileStore (or a profile-sync API) on the server. That also implies updating remote payloads and probably the registration API to accept explicit AgentDefinition/profile info, since the factory-only path is fundamentally lossy.
VERDICT: ❌ Needs rework
KEY INSIGHT: Introspection-based registration is inherently lossy; it breaks parity between local and remote execution and makes profile-first LLM selection significantly harder to support.
Final summary: I used
Re-read with fresh eyes: this is more major than minor.
Section-by-section reassessment:
- Model override regression: This is a major behavior break, not a nit. It silently forces subagents onto the fake "introspect" model unless factories explicitly override. That changes both local and remote runs and undermines the "remote feels like local" goal.
- Lossy introspection + eager resolution: Also major. It changes the contract from "factory defines tools/skills" to "registry/filesystem must already contain them", and makes registration order a new failure mode. That is a compatibility regression for any programmatic tool/skill use.
- (HUMAN: we know --- Public API removal: Breaking change, but relatively smaller than the runtime regressions. Still requires a deprecation path or version bump.)
- (HUMAN: the agent kinda misunderstood, it wasn't in scope --- LLM profile preference: Not a bug, but a structural limitation. With this design, adding profile-first behavior is moderately hard and likely requires schema/API work, so it’s a significant design headwind.)
Overall: the issues are major and need rework before merge to preserve local/remote parity and avoid silent behavior changes.
Hi, I'm OpenHands (automated reviewer) following up after a deeper verification.
Answering the question "custom tools can be registered remotely; is that broken?":
Short answer: remote custom tools themselves are not broken, as long as they are registered on the client before conversation creation. The break is about when resolution happens and what information survives the new introspection path.
Evidence & reasoning:

- Remote tool registration still works via module qualnames.
  - Client sends tool module qualnames in RemoteConversation: remote_conversation.py lines 636-660
  - Server imports those modules to register tools: conversation_service.py lines 220-244

  So if your custom tool module is imported on the client (and thus in tool_module_qualnames), it will be registered on the server.

- The new failure mode is eager resolution + lossy introspection.
  - register_agent() now builds an AgentDefinition via introspection and immediately calls definition.to_factory(): registry.py lines 70-99
  - to_factory() calls _resolve_tools() right away and errors if the tool name isn't already in the registry: schema.py lines 238-314

  Implication: tool registration order is now a hard constraint. If someone calls register_agent() before importing/registering their custom tool module, it will now fail. Previously, the factory could succeed and the tool would resolve later at agent init time.
(HUMAN: this sounds OK to me)
- Introspection drops tool params and programmatic tool objects. from_factory_func() only records tool names (t.name) and discards params: schema.py lines 215-227. If a factory used Tool(name="X", params={...}), those params are lost in the definition and won't be replayed. That's a behavior change, even if the tool is registered.
- The same lossy behavior applies to skills: programmatic Skill objects are reduced to names and reloaded from disk. _resolve_skills() uses load_available_skills and fails if the name isn't on disk: schema.py lines 282-301. So if a factory created a Skill instance directly (not from the filesystem), it won't survive the introspection roundtrip.
Conclusion:

- Custom tools registered on both client and server are still fine, assuming they are registered before register_agent() is called.
- What is broken is support for programmatic tools/skills (including tool params) and the previous flexibility around registration order. That is a real compatibility regression even if remote tool registration is still possible.
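The registration-order constraint can be illustrated with a toy registry. TOOL_REGISTRY, resolve_tools, and lazy_factory here are stand-ins invented for illustration, not the SDK's internals:

```python
# Toy registry showing why eager resolution makes registration order matter.
TOOL_REGISTRY: dict[str, object] = {}


def resolve_tools(names: list[str]) -> list[object]:
    """Fail fast if any named tool is not registered yet."""
    missing = [n for n in names if n not in TOOL_REGISTRY]
    if missing:
        raise ValueError(f"Tools not registered: {', '.join(missing)}")
    return [TOOL_REGISTRY[n] for n in names]


def lazy_factory(names: list[str]):
    # Old behavior: resolution is deferred until the factory runs (agent init),
    # so the tool may be registered after the factory is created.
    return lambda: resolve_tools(names)


# New behavior (eager): fails immediately if the tool isn't registered yet.
try:
    resolve_tools(["MyCustomTool"])
    eager_ok = True
except ValueError:
    eager_ok = False

# Old behavior (lazy): succeeds because registration happens before the call.
factory = lazy_factory(["MyCustomTool"])
TOOL_REGISTRY["MyCustomTool"] = object()  # registered later, before agent init
lazy_result = factory()
```

Same registry contents, same lookup; only the moment of resolution differs, and that moment is what turns registration order into a hard constraint.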
enyst
left a comment
Hi, I'm OpenHands (automated reviewer) — I ran example-derived checks on this PR.
What I ran:

- examples/05_skills_and_plugins/01_loading_agentskills/main.py (non-LLM portion). It runs fine and exits early because LLM_API_KEY isn't set.
- Targeted reproductions based on examples/01_standalone_sdk/25_agent_delegation.py (the user-defined subagent section uses inline Skill objects).
Reproduction A: inline Skill objects (same pattern as create_lodging_planner)
register_agent: failed -> ValueError: Skills not found but given to agent 'lodging_planner': lodging_planning
This is the new _resolve_skills() behavior in AgentDefinition.to_factory(). It requires skills to exist on disk, so in-memory Skill objects from the example no longer work. That means the user-defined subagent section of example 25 will now fail at registration, even before any LLM calls.
Reproduction B: unregistered custom tool referenced in a subagent factory
register_agent: failed -> ValueError: Tools not registered but given to agent 'custom_tool_agent': MyCustomTool
This shows the registration order constraint: tools must be registered before register_agent() because to_factory() resolves tools eagerly. Previously, the factory could succeed and tools resolved later at agent init.
Answer to your question (“custom tools can be registered remotely; is that broken?”):
- Remote custom tools themselves are not broken when used directly in a main Agent (e.g., the remote custom tool example). The tool module qualname still gets sent and imported on the server.
- But subagent registration now breaks if the factory uses programmatic Skill objects or custom tools that aren't registered before register_agent() is called. That is a real regression, and example 25 demonstrates it.
So yes: the breakage is real and reproducible on this PR. It’s tied to the new eager resolution + lossy introspection in AgentDefinition.to_factory().
enyst
left a comment
Hi, I'm OpenHands (automated reviewer) — I updated from main and ran the full example as requested.
Run command:
```shell
LLM_API_KEY=$OPENAI_API_KEY LLM_MODEL=gpt-5-nano LLM_BASE_URL= \
  ./.venv/bin/python examples/01_standalone_sdk/25_agent_delegation.py
```
Result (fails before any LLM calls):
ValueError: Skills not found but given to agent 'lodging_planner': lodging_planning
The failure happens at the user-defined subagent registration in examples/01_standalone_sdk/25_agent_delegation.py, lines 176-185, which uses inline Skill objects. With the current PR, register_agent() calls AgentDefinition.to_factory() and _resolve_skills() immediately, so non-file-backed skills now fail at registration.
So yes — this example is currently broken on the PR branch even after updating from main.
HUMAN: Sorry 🙏 I tried to keep its verbosity in check... that doesn't look like the most successful attempt ever 😅
Problem
When using a remote workspace (agent-server), the delegate tool description shown to the LLM always says "No user-registered agents yet", even though builtin subagents (bash, explore, default) are registered on the client side.
Root cause: register_builtins_agents() runs on the client process (eval harness / SDK consumer), populating the client's in-memory _agent_factories registry. But with remote workspaces, the agent loop runs on a separate agent-server process, which has its own empty registry. The existing tool_module_qualnames mechanism forwards tool registrations to the server, but there was no equivalent for subagent registrations. So when DelegateTool.create() (or TaskSetTool.create()) calls get_factory_info() on the server, it sees an empty registry.

Fix
Mirror the existing tool_module_qualnames pattern for subagents:
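A rough, self-contained sketch of the mirroring idea, assuming a payload with a tool_module_qualnames list plus a parallel list of serialized agent definitions. The field names and dict-based definitions are illustrative; the real implementation lives in remote_conversation.py and conversation_service.py:

```python
# Hypothetical sketch: the client forwards serialized agent definitions, and
# the server re-registers them (register-if-absent) before the agent loop
# starts, mirroring how tool module qualnames are already forwarded.

def collect_client_payload(tool_modules: list[str], definitions: list[dict]) -> dict:
    """Client side: bundle tool module qualnames and agent definitions."""
    return {
        "tool_module_qualnames": sorted(tool_modules),
        "agent_definitions": definitions,
    }


def apply_on_server(payload: dict, registry: dict) -> dict:
    """Server side: re-register forwarded definitions if absent.

    (The real server also imports each tool module so its import side effects
    register tools; that step is stubbed out in this sketch.)
    """
    for definition in payload["agent_definitions"]:
        registry.setdefault(definition["name"], definition)
    return registry


payload = collect_client_payload(
    ["my_pkg.tools"], [{"name": "explorer", "description": "Explores code"}]
)
server_registry = apply_on_server(payload, {})
```

With this shape, the server's registry is populated before DelegateTool.create() asks for factory info, which is exactly the gap the fix closes.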
Changes
DelegateTool/TaskSetTool on the remote side can see registered agents instead of "No user-registered agents yet".

Note
I also ran an eval with this branch to be sure it was working, and it seems to work fine.
Eval: https://openhands-ai.slack.com/archives/C09QGUDQVTL/p1772531782817999
Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)
```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a046ed2-python
```
All tags pushed for this build
About Multi-Architecture Support
- The versioned tag (a046ed2-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., a046ed2-python-amd64) are also available if needed