Skip to content

fix(agent-server/sdk): subagents missing in remote workspace#2275

Closed
VascoSch92 wants to merge 13 commits intomainfrom
vasco/fix-subagent-registration-on-server
Closed

fix(agent-server/sdk): subagents missing in remote workspace#2275
VascoSch92 wants to merge 13 commits intomainfrom
vasco/fix-subagent-registration-on-server

Conversation

@VascoSch92
Copy link
Copy Markdown
Contributor

@VascoSch92 VascoSch92 commented Mar 3, 2026

Problem

When using a remote workspace (agent-server), the delegate tool description shown to the LLM always says "No user-registered agents yet", even though builtin subagents (bash, explore, default) are registered on the client side.

Root cause: register_builtins_agents() runs on the client process (eval harness / SDK consumer), populating the client's in-memory _agent_factories registry. But with remote workspaces, the agent loop runs on a separate agent-server process, which has its own empty registry. The existing tool_module_qualnames mechanism forwards tool registrations to the server, but there was no equivalent for subagent registrations. So when DelegateTool.create() (or TaskSetTool.create()) calls get_factory_info() on the server, it sees an empty registry.

Fix

Mirror the existing tool_module_qualnames pattern for subagents:

Changes

  • Forward subagent definitions to the server so DelegateTool/TaskSetTool on the remote side can see registered agents instead of "No user-registered agents yet".
  • Simplify register_agent_if_absent signature from (name, factory_func, description, definition) to just (factory_func, definition) — all callers already had a definition in hand. Moreover, this is just used in the internal API.

Note

I also run an eval with this branch to be sure it was working. And it seems to work fine.
Eval: https://openhands-ai.slack.com/archives/C09QGUDQVTL/p1772531782817999

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a046ed2-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a046ed2-python \
  ghcr.io/openhands/agent-server:a046ed2-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a046ed2-golang-amd64
ghcr.io/openhands/agent-server:a046ed2-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a046ed2-golang-arm64
ghcr.io/openhands/agent-server:a046ed2-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a046ed2-java-amd64
ghcr.io/openhands/agent-server:a046ed2-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a046ed2-java-arm64
ghcr.io/openhands/agent-server:a046ed2-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a046ed2-python-amd64
ghcr.io/openhands/agent-server:a046ed2-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:a046ed2-python-arm64
ghcr.io/openhands/agent-server:a046ed2-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:a046ed2-golang
ghcr.io/openhands/agent-server:a046ed2-java
ghcr.io/openhands/agent-server:a046ed2-python

About Multi-Architecture Support

  • Each variant tag (e.g., a046ed2-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a046ed2-python-amd64) are also available if needed

@VascoSch92 VascoSch92 requested a review from all-hands-bot March 3, 2026 11:16
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 3, 2026

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)

============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 3, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟢 Good taste - Clean solution that mirrors existing patterns and simplifies the API.

⚠️ Eval Risk: This changes agent behavior (DelegateTool now shows builtin subagents in remote workspaces instead of "No user-registered agents yet"). Needs lightweight eval testing before merge.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 3, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-agent-server/openhands/agent_server
   conversation_service.py3697579%72, 75, 164, 170, 178–179, 188–191, 200, 209, 231–232, 251, 278, 281, 292–296, 298–301, 304–309, 389, 396–400, 403–404, 408–412, 415–416, 420–424, 427–428, 434–439, 446–447, 451, 453–454, 459–460, 466–467, 473–475, 493, 517, 747
openhands-sdk/openhands/sdk
   __init__.py21290%74–75
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py3412193%282, 287, 315, 358, 376, 389, 451, 600–601, 604, 756, 764, 766, 777, 779–781, 806, 968, 975–976
   remote_conversation.py59211081%132, 159, 172, 174–177, 187, 209–210, 215–218, 294, 304–306, 312, 353, 485–488, 490, 510–514, 519–522, 525, 666–667, 671–672, 683, 702–703, 722, 733–734, 754–757, 759–760, 778, 784–785, 789, 794–795, 801–803, 806–810, 812–813, 817, 819–827, 829, 866, 993, 1061–1062, 1066, 1071–1075, 1081–1087, 1100–1101, 1177, 1184, 1190–1191, 1241, 1247–1248, 1252, 1255–1257, 1262–1263
openhands-sdk/openhands/sdk/subagent
   registry.py85594%93, 157–158, 164, 198
   schema.py115496%52, 65, 80, 309
openhands-tools/openhands/tools/delegate
   impl.py13711615%31, 33–34, 43–44, 48, 54–55, 58–61, 63, 75–76, 80–82, 86–87, 94–96, 105–106, 116–120, 122, 126, 130–131, 133–134, 139–141, 144–148, 150, 153, 159–160, 164, 166, 169, 174, 179, 183, 188–190, 207–208, 215–217, 226, 228–230, 233–238, 240, 247–248, 250–251, 254–257, 259–260, 264–267, 270–272, 277–278, 281–282, 287–291, 296, 298–302, 304, 307–309, 311–312, 315, 317, 322–324
openhands-tools/openhands/tools/preset
   default.py531669%104, 109–110, 114–115, 119, 123–127, 130–133, 137
openhands-tools/openhands/tools/task
   manager.py1298434%64–66, 70–72, 79, 81, 84, 87–88, 92–93, 97, 101–104, 107–109, 111, 135–136, 138–139, 144, 150, 157–158, 163–165, 173, 180, 195, 197–198, 204–207, 209, 211, 219, 225, 235–236, 238–241, 243, 259–260, 264–265, 267–268, 271, 273, 277–278, 280–283, 285–293, 295–296, 298, 305–306, 312–313, 315
TOTAL20352548273% 

@VascoSch92 VascoSch92 marked this pull request as ready for review March 3, 2026 11:26
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟢 Good taste - Clean solution that mirrors existing patterns and simplifies the API.

⚠️ Eval Risk: This changes agent behavior (DelegateTool now shows builtin subagents in remote workspaces instead of "No user-registered agents yet"). Needs lightweight eval testing before merge.

KEY INSIGHT: Storing the full AgentDefinition instead of just description is better data structure design - gives you access to all agent metadata without additional lookups. The API simplification (register_agent_if_absent now takes just factory + definition) eliminates parameter duplication since the definition already contains name and description.

VERDICT: ✅ Core logic is solid, solves a real problem pragmatically by reusing the tool_module_qualnames pattern for subagents.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a new test to tests/cross/test_remote_conversation_live_server.py that spawns up a REAL agent server with the real sub-agent requests, and make sure it works?

Copy link
Copy Markdown
Contributor Author

@VascoSch92 VascoSch92 Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xingyaoww

I found that register_agent() was producing hollow AgentDefinition stubs (just name + description, no tools / system_prompt). These would leak to the remote server via get_registered_agent_definitions() and get recreated as useless agents (no tools or/and attached system prompt).

I didn't see that before because the machinery to load subagents is using register_agent_if_absent() which needs an AgentDefinition (so from this side everything is good).

I solved this by adding a AgentDefinition.from_agent() classmethod that introspects a live Agent instance and extracts tools, system_prompt, and model into a complete definition. register_agent() now calls the factory with a placeholder LLM and delegates to AgentDefinition.from_agent().

Having this as a classmethod on AgentDefinition is much better from a maintainability perspective — the logic for building a definition from an agent lives right next to the schema it populates (alongside
load() for file-based definitions), making it easier to keep in sync when fields change.

Why this solution? This keeps register_agent(name, factory_func, description) user-friendly — no signature change needed — while producing complete definitions that survive the HTTP roundtrip to the
server.

Everything is tested, and also cross tested.

@VascoSch92 VascoSch92 requested a review from xingyaoww March 3, 2026 15:02
definition = AgentDefinition.from_agent(
agent, name=name, description=description
)
# If the model was our placeholder, the factory didn't set one explicitly
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is introspect necessary? Could it happen differently?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that. The point is, it could but I will have to change the signature of register_agent, i.e., adding as an arg the AgentDefinition.

The contro on this choiche are:

  • we change the API
  • the API for an sdk-user to register an agent become very complex

In this way we can reconstruct the AgentDefinition from the factory.

This time I went for giving the user a better experience :-)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OpenHands understand the question and answer, investigate all you need, including PR description and discussion and PR code. Why exactly do we need two different behaviors, split via introspect, and what are 2-3 alternatives? Note that you should also take a look at LLM Profiles in the SDK codebase

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! enyst can track my progress at all-hands.dev

Copy link
Copy Markdown

@openhands-ai openhands-ai bot Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(HUMAN: I'll just replace here the answers)

Alternatives (2–3):

  1. Make declarative registration explicit: add a register_agent_definition(...) / require passing an AgentDefinition for remote-forwardable agents (no introspection).
  2. Split local-only vs remote-forwardable APIs: keep register_agent() local, add a separate remote-safe registration function that only accepts declarative definitions.
  3. Extend AgentDefinition to support LLM Profiles / full LLM config: e.g., llm_profile or embedded LLM config—more faithful than introspection, but introduces distribution/security concerns.

LLM Profiles note

I confirmed the SDK’s LLMProfileStore persists and loads full LLM configurations, but AgentDefinition currently only carries a model string (“inherit” vs override) and does not represent “use profile X”. That makes introspection inherently limited for profile-driven factories, and helps explain why a fallback path is necessary.

HUMAN:

isn't the model actually a profile name?

No—in the subagent system, AgentDefinition.model is treated as an LLM model identifier, not an LLM profile name.

You can see that in two places:

  • In the schema itself (openhands-sdk/openhands/sdk/subagent/schema.py):

    model: str = Field(default="inherit", description="Model to use ('inherit' uses parent model)")
  • And, more importantly, in how it’s applied when building a subagent (agent_definition_to_factory in subagent/registry.py):

    if agent_def.model and agent_def.model != "inherit":
        llm = llm.model_copy(update={"model": agent_def.model})

That code only changes LLM.model (a string passed through to LiteLLM). It does not load anything from disk or resolve a “profile”.

Where profiles are used in the SDK

LLM Profiles are a separate mechanism:

  • Profiles are persisted by LLMProfileStore (openhands.sdk.llm.llm_profile_store.LLMProfileStore)

  • They’re referenced by name in FallbackStrategy.fallback_llms, and loaded like:

    fb = self._profile_store.load(name)  # name is a profile name like "gpt4.json"

So a “profile name” is something like "gpt4" / "gpt4.json" passed to LLMProfileStore.load(...), not something that LLM.model_copy(update={"model": ...}) would understand.

Implication for the introspection logic

If a user factory internally loads a profile (e.g., LLMProfileStore().load("foo")), introspection may extract the resulting .model string, but remote instantiation still won’t reproduce the full profile config (base_url, provider-specific auth, etc.) because the forwarded AgentDefinition only carries model, not “profile”.

If we wanted model: in agent markdown / AgentDefinition to mean “profile”, we’d need a deliberate design change (e.g., add llm_profile field, or support a profile:xyz syntax and resolve it on the server).

Copy link
Copy Markdown
Collaborator

@enyst enyst Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure, this one makes sense to me 🤔

Extend AgentDefinition to support LLM Profiles / full LLM config: e.g., llm_profile or embedded LLM config

Remote has to send full LLM profile today for main agent. We could

  • send full profile for sub-agents too
  • send profile references for sub-agents / agents

Seems the same touched in this discussion? #2186 (comment)

The latter implies a mechanism to send/retrieve profiles, which we need anyway outside sub-agents, simply for profile management from a remote client I think, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach carries the problem I mentioned above and because of that I want for introspection.

Precisely: Dropping introspection means register_agent(name, factory_func, description) would produce definitions without tools and system_prompt, which we need on the remote side to build subagents.

If we really want to go for this direction (i.e. giving the LLM profile to the AgentDefinition) there are two options:

Option A — extend the signature with optional fields:

def register_agent(
    name, factory_func, description,
    tools: list[str] | None = None,
    system_prompt: str | None = None,
)

backwards-compatible. Users already know these values since they wrote the factory.

Option B — accept an optional AgentDefinition:

def register_agent(
    name, factory_func, description,
    definition: AgentDefinition | None = None,
)

More future-proof (new fields automatically supported), but heavier UX.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enyst I was looking at other features for file-based agents. And I think having an introspect method from the factory is actually a very cool solution for the user.

The function register_agent stays super simple and user friendly and we take care of the rest (anyway I don't think we will add a lot of features after the one discussed in the issue #2186 )

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about

Option C

def register_agent(
    name,
    factory_func,
    definition: AgentDefinition | str,
)

And then we construct the AgentDefinition from the string if necessary. No breaking changes but support for AgentDefinition.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deeply apologize but today I don't have much bandwidth. I just want to note quickly, IMHO introspection is simply a smell, it smells like maybe we can dig into to see if something's not quite right. I'm not at war with it 🙏 😅

I won't lie, I am kinda at war with this: "the forwarded AgentDefinition carries model, not “profile”. --- I don't see much good from carrying model, except in special cases when we want it to, e.g., have the same profile id with model name, to satisfy Anthropic.

I don't really understand why, after the other PR too, we couldn't just say ".model" is a profile id; we add a small API to CRUD profiles for remote-local sync; maybe in the future deprecate full LLM config sending over in any other way except profiles API - but maybe I'm dense sorry, I might miss details

@VascoSch92
Copy link
Copy Markdown
Contributor Author

Hey @enyst, I’ve made some changes.

I apologize for grouping these structural changes with a bug fix, but they are tightly coupled and this was the right time to address them.

My motivation is that since the subagent module hasn't been released yet, we have the freedom to reshape the API before it stabilizes. These changes make AgentDefinition the single source of truth: it now holds the full agent configuration and can produce its own factory. This makes it straightforward to add new features to file-based agents (like working_dir) by simply adding a field and extraction logic in one place, rather than threading values through multiple functions.

@neubig, the solution we discussed previously wasn't entirely complete, as it left us with potential data leakage. The current approach offers the best balance of a clean API and maintenability.

Summary of the changes

  • Add working_dir field to AgentDefinition for controlling sub-agent working directories (absolute, relative to parent, or None to inherit)
  • Move agent_definition_to_factory into AgentDefinition.to_factory() —> the definition now knows how to build its own factory
  • Rename from_agent to from_factory_func for clarity: it introspects a factory function, not a bare agent
  • Extract _resolve_tools() and _resolve_skills() as private methods with improved error messages that list all missing references at once
  • Resolve tools eagerly at factory creation time (same as skills) so missing references fail fast
  • Add AgentFactoryFunc type alias
  • Clean up and simplify test suite

Breaking changes

  • agent_definition_to_factory() is deleted. It was weird to have this method in the registry script
  • Tools are now resolved eagerly at to_factory() time (previously resolved lazily at factory invocation time).

Test plan

  • pytest tests/sdk/subagent/ — schema loading, factory creation, registry operations
  • pytest tests/tools/task/ — task delegation with sub-agents
  • pytest tests/agent_server/test_subagent_registration_on_server.py — remote registration
  • Verify working_dir loads from frontmatter (absolute, relative, omitted)
  • Verify _resolve_tools / _resolve_skills list all missing references in error messages
  • Verify from_factory_func extracts tools, skills, system_prompt, and model

@enyst
Copy link
Copy Markdown
Collaborator

enyst commented Mar 5, 2026

@OpenHands Use gh to read all comments and inline review threads in this PR.

Then do a /codereview-roasted on it, and publish the review on github api (you do not comment, you review, which is allowed)

Introduce yourself in the review so that readers know who is speaking.

Before you publish review, take a deep breath and look at all with fresh eyes: we want a very smooth experience for remote conversations, where it feels almost like local conversations; so maybe that implies compatibility to some high extent. Also, I'd like an answer to how difficult it will be, on this PR's code design, to use LLM Profiles by preference (see comment #2275 (comment))

Then include it all in your review.

@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Mar 5, 2026

I'm on it! enyst can track my progress at all-hands.dev

Copy link
Copy Markdown
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm OpenHands (automated reviewer) taking a fresh look.

Taste Rating: 🔴 Needs improvement

[CRITICAL ISSUES]

  • schema.py, lines 215-260 Model override regression: from_factory_func() creates a placeholder LLM(model="__introspect__", api_key="n/a") and then persists agent.llm.model as the definition model. In to_factory() (Lines 258–260), any model that is not "inherit" overrides the parent LLM. Net result: any register_agent() factory that doesn’t explicitly override the model now forces subagents to run on the fake "introspect" model (local and remote). That’s a silent behavior break and destroys local/remote parity. Treat the sentinel as "inherit" or avoid setting a fake model in the first place.
  • schema.py, lines 217-314 Lossy introspection breaks user-defined tools/skills and remote parity: introspection collapses Tool/Skill objects into names, then to_factory() re-resolves them from registries and filesystem. This means: (1) custom programmatic tools/skills that aren’t registered or on disk now fail, (2) tool/skill registration order becomes a new constraint because resolution is eager. That’s a compatibility regression vs. the previous "factory is source of truth" behavior and risks a less smooth remote experience. Either preserve tool/skill objects, or offer an explicit AgentDefinition registration path (and keep the original factory for local execution) to avoid lossy reconstruction.
  • sdk/init.py lines 50-125 and subagent/init.py lines 17-31 Public API removal without deprecation: agent_definition_to_factory disappears from the public surface and the function itself is gone. This is a breaking change (already flagged by the API breakage check) and needs a deprecation path or an alias, plus version bump if removal is intended.

[IMPROVEMENT OPPORTUNITIES]

  • schema.py lines 98-132 and schema.py lines 238-280 LLM Profiles by preference: this PR bakes in model: str and to_factory() only calls llm.model_copy(update={"model": ...}). There is no representation of a profile name or a profile payload, and introspection cannot recover profile names from a factory. So supporting "profile preference" with this design is moderately hard: you’ll need a new schema field (e.g., llm_profile), plus resolution logic via LLMProfileStore (or a profile-sync API) on the server. That also implies updating remote payloads and probably the registration API to accept explicit AgentDefinition/profile info, since the factory-only path is fundamentally lossy.

VERDICT: ❌ Needs rework

KEY INSIGHT: Introspection-based registration is inherently lossy; it breaks parity between local and remote execution and makes profile-first LLM selection significantly harder to support.

@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Mar 5, 2026

Final summary: I used gh to fetch the PR body, diff, issue comments, and inline review threads. I reviewed the changes with a “fresh eyes” focus on remote/local compatibility and LLM profile support, then submitted a GitHub COMMENT review in the requested /codereview-roasted style (with an introduction, critical issues, and a clear note that profile-preference support is moderately hard under the current design). No code changes were made, so no diff or push was needed.

Copy link
Copy Markdown
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-read with fresh eyes: this is more major than minor.

Section-by-section reassessment:

  • Model override regression: This is a major behavior break, not a nit. It silently forces subagents onto the fake "introspect" model unless factories explicitly override. That changes both local and remote runs and undermines the "remote feels like local" goal.
  • Lossy introspection + eager resolution: Also major. It changes the contract from "factory defines tools/skills" to "registry/filesystem must already contain them", and makes registration order a new failure mode. That is a compatibility regression for any programmatic tool/skill use.
  • (HUMAN: we know --- Public API removal: Breaking change, but relatively smaller than the runtime regressions. Still requires a deprecation path or version bump.)
  • (HUMAN: the agent kinda misunderstood, it wasn't in scope --- LLM profile preference: Not a bug, but a structural limitation. With this design, adding profile-first behavior is moderately hard and likely requires schema/API work, so it’s a significant design headwind.)

Overall: the issues are major and need rework before merge to preserve local/remote parity and avoid silent behavior changes.

Copy link
Copy Markdown
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm OpenHands (automated reviewer) following up after a deeper verification.

Answering the question "custom tools can be registered remotely; is that broken?":

Short answer: remote custom tools themselves are not broken, as long as they are registered on the client before conversation creation. The break is about when resolution happens and what information survives the new introspection path.

Evidence & reasoning:

  1. Remote tool registration still works via module qualnames.

    So if your custom tool module is imported on the client (and thus in tool_module_qualnames), it will be registered on the server.

  2. The new failure mode is eager resolution + lossy introspection.

    • register_agent() now builds an AgentDefinition via introspection and immediately calls definition.to_factory():
      registry.py lines 70-99
    • to_factory() calls _resolve_tools() right away and errors if the tool name isn’t already in the registry:
      schema.py lines 238-314

    Implication: tool registration order is now a hard constraint. If someone calls register_agent() before importing/registering their custom tool module, it will now fail. Previously, the factory could succeed and the tool would resolve later at agent init time.

(HUMAN: this sounds OK to me)

  1. Introspection drops tool params and programmatic tool objects.

    If a factory used Tool(name="X", params={...}), those params are lost in the definition and won’t be replayed. That’s a behavior change, even if the tool is registered.

  2. The same lossy behavior applies to skills: programmatic Skill objects are reduced to names and reloaded from disk.

    So if a factory created a Skill instance directly (not from filesystem), it won’t survive the introspection roundtrip.

Conclusion:

  • Custom tools registered on both client and server are still fine, assuming they are registered before register_agent() is called.
  • What is broken is support for programmatic tools/skills (including tool params) and the previous flexibility around registration order. That is a real compatibility regression even if remote tool registration is still possible.

Copy link
Copy Markdown
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm OpenHands (automated reviewer) — I ran example-derived checks on this PR.

What I ran:

  1. examples/05_skills_and_plugins/01_loading_agentskills/main.py (non-LLM portion).

    • It runs fine and exits early because LLM_API_KEY isn’t set.
  2. Targeted reproductions based on examples/01_standalone_sdk/25_agent_delegation.py
    (the user-defined subagent section uses inline Skill objects).

Reproduction A: inline Skill objects (same pattern as create_lodging_planner)

register_agent: failed -> ValueError: Skills not found but given to agent 'lodging_planner': lodging_planning

This is the new _resolve_skills() behavior in AgentDefinition.to_factory(). It requires skills to exist on disk, so in-memory Skill objects from the example no longer work. That means the user-defined subagent section of example 25 will now fail at registration, even before any LLM calls.

Reproduction B: unregistered custom tool referenced in a subagent factory

register_agent: failed -> ValueError: Tools not registered but given to agent 'custom_tool_agent': MyCustomTool

This shows the registration order constraint: tools must be registered before register_agent() because to_factory() resolves tools eagerly. Previously, the factory could succeed and tools resolved later at agent init.

Answer to your question (“custom tools can be registered remotely; is that broken?”):

  • Remote custom tools themselves are not broken when used directly in a main Agent (e.g., the remote custom tool example). The tool module qualname still gets sent and imported on the server.
  • But subagent registration now breaks if the factory uses programmatic Skill objects or custom tools that aren’t registered before register_agent() is called. That is a real regression, and example 25 demonstrates it.

So yes: the breakage is real and reproducible on this PR. It’s tied to the new eager resolution + lossy introspection in AgentDefinition.to_factory().

Copy link
Copy Markdown
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm OpenHands (automated reviewer) — I updated from main and ran the full example as requested.

Run command:

LLM_API_KEY=$OPENAI_API_KEY LLM_MODEL=gpt-5-nano LLM_BASE_URL= \
  ./.venv/bin/python examples/01_standalone_sdk/25_agent_delegation.py

Result (fails before any LLM calls):

ValueError: Skills not found but given to agent 'lodging_planner': lodging_planning

The failure happens at the user-defined subagent registration in
examples/01_standalone_sdk/25_agent_delegation.py, lines 176-185, which uses inline Skill objects. With the current PR, register_agent() calls AgentDefinition.to_factory() and _resolve_skills() immediately, so non-file-backed skills now fail at registration.

So yes — this example is currently broken on the PR branch even after updating from main.

@enyst
Copy link
Copy Markdown
Collaborator

enyst commented Mar 5, 2026

HUMAN: Sorry 🙏 I tried to keep its verbosity in check... that doesn't look like the most successful attempt ever 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants