feat: add agent skills execution engine#53

Open
OriNachum wants to merge 7 commits into main from feat/agent-skills-engine

Conversation

@OriNachum (Collaborator)

Summary

  • Add SkillManager singleton that discovers and executes agent skills from a configurable directory (SKILLS_DIR)
  • Skills follow the Claude Code standard: SKILL.md (YAML frontmatter + instructions) + scripts/ (executables) + references/ (docs)
  • Each skill is exposed as a callable tool (skill__<name>) alongside MCP tools in both /responses and /v1/chat/completions endpoints
  • Feature is off by default (SKILLS_ENABLED=false) with security controls: path confinement, no shell execution, executable check, configurable timeout
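The SKILL.md layout described above (YAML frontmatter followed by Markdown instructions) can be parsed with pyyaml, the dependency this PR adds. The sketch below is illustrative only; `parse_skill_md` and its return shape are assumptions, not the PR's actual API:

```python
import yaml  # pyyaml>=6.0, the dependency added in this PR

def parse_skill_md(text: str) -> tuple[dict, str]:
    """Split SKILL.md into (frontmatter, instructions).

    Expects the Claude Code layout: a leading YAML block delimited
    by '---' lines, followed by free-form Markdown instructions.
    """
    if not text.startswith("---"):
        raise ValueError("SKILL.md must start with YAML frontmatter")
    # split gives ['', '<yaml>', '<markdown>']
    parts = text.split("---", 2)
    if len(parts) < 3:
        raise ValueError("Unterminated frontmatter block")
    frontmatter = yaml.safe_load(parts[1]) or {}
    instructions = parts[2].strip()
    return frontmatter, instructions

doc = """---
name: hello-skill
description: Greets the caller
---
Run scripts/hello.sh to greet."""
meta, body = parse_skill_md(doc)
print(meta["name"])  # hello-skill
```

The skill name parsed here is what would be exposed to the model as `skill__hello-skill`.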

New files

  • src/open_responses_server/common/skill_manager.py — core skill discovery, caching, and execution
  • tests/test_skill_manager.py — 25 tests covering parsing, discovery, execution, path traversal, timeout

Modified files

  • common/config.py — 3 new env vars (SKILLS_ENABLED, SKILLS_DIR, SKILLS_EXEC_TIMEOUT)
  • api_controller.py — startup/shutdown hooks + skill tool injection
  • responses_service.py — skill routing at all tool execution points
  • chat_completions_service.py — skill routing in streaming and non-streaming handlers
  • pyproject.toml — added pyyaml>=6.0 dependency
  • README.md — Agent Skills documentation section

Test plan

  • All 25 new skill manager tests pass (pytest tests/test_skill_manager.py)
  • Full test suite passes (205/205 tests)
  • Flake8 clean (no new issues beyond existing E501 style)
  • Bandit security scan: 0 issues
  • Manual test: create a test skill, start server with SKILLS_ENABLED=true, verify tool injection and execution
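The manual-test fixture can be scripted. This sketch only builds a throwaway skill matching the layout from the summary; the skill name and script contents are made up for illustration:

```python
import os
import stat
import tempfile
from pathlib import Path

# Build a minimal skill under a temporary SKILLS_DIR
skills_dir = Path(tempfile.mkdtemp())
skill = skills_dir / "hello-skill"
(skill / "scripts").mkdir(parents=True)

(skill / "SKILL.md").write_text(
    "---\nname: hello-skill\ndescription: Greets the caller\n---\n"
    "Run scripts/hello.sh to greet.\n"
)

script = skill / "scripts" / "hello.sh"
script.write_text("#!/bin/sh\necho hello\n")
# The manager only exposes scripts with the executable bit set
script.chmod(script.stat().st_mode | stat.S_IXUSR)

# The server would then be started with something like:
#   SKILLS_ENABLED=true SKILLS_DIR=<skills_dir> ...
print(sorted(p.name for p in skill.iterdir()))  # ['SKILL.md', 'scripts']
```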

🤖 Generated with Claude Code

  • Claude

OriNachum and others added 4 commits March 31, 2026 07:43
…am reliability (#44)

Add the missing `content_index` field to `response.output_text.delta` SSE
events so clients that validate against the OpenAI Responses API spec
(e.g. ChatKit SDK) no longer fail with a Pydantic validation error.

Also introduces SSE heartbeat keepalives, configurable stream timeout,
and structured stream timing logs to improve reliability with slow backends.

Closes #44

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add try/finally to _with_heartbeat() to cancel in-flight tasks and
  close the underlying async iterator on cancellation/exit
- Guard against interval <= 0 to prevent tight heartbeat loops
- Fix trailing whitespace and missing newline in config.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- S7497: Re-raise asyncio.CancelledError after cleanup instead of
  swallowing it
- B110: Replace bare except/pass with logger.debug for aclose errors
- S5806: Rename `aiter` to `inner` to avoid shadowing the builtin
- S3776: Extract cleanup logic to _cleanup_heartbeat() to reduce
  cognitive complexity below threshold

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
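The fixes above all concern an async-generator heartbeat wrapper. A minimal sketch of the pattern, assuming `_with_heartbeat` behaves roughly like this (the body is a reconstruction from the commit notes, not the PR's code):

```python
import asyncio

async def _with_heartbeat(inner, interval: float):
    """Yield items from `inner`, emitting an SSE comment when it is slow.

    Mirrors the commit's fixes: guard interval <= 0, cancel the in-flight
    task, and close the underlying iterator in a finally block.
    """
    if interval <= 0:  # prevent a tight heartbeat loop
        async for item in inner:
            yield item
        return
    task = None
    try:
        while True:
            task = asyncio.ensure_future(inner.__anext__())
            try:
                yield await asyncio.wait_for(asyncio.shield(task), interval)
            except asyncio.TimeoutError:
                yield ": heartbeat\n\n"   # SSE comment keepalive
                yield await task          # then deliver the real item
            task = None
    except StopAsyncIteration:
        pass  # inner iterator exhausted normally
    finally:
        if task is not None and not task.done():
            task.cancel()
        aclose = getattr(inner, "aclose", None)
        if aclose is not None:
            try:
                await aclose()
            except Exception:
                pass  # the real code logs this at debug level

async def demo():
    async def slow():
        yield "a"
        await asyncio.sleep(0.05)  # slower than the heartbeat interval
        yield "b"
    return [x async for x in _with_heartbeat(slow(), interval=0.01)]

events = asyncio.run(demo())
print(events)  # ['a', ': heartbeat\n\n', 'b']
```

Note how `asyncio.shield` keeps the producer task alive across the timeout, so the heartbeat never drops a pending item.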
…tories

Add SkillManager that discovers, caches, and executes agent skills from a
configurable directory. Skills follow the Claude Code standard (SKILL.md +
scripts/ + references/). Each skill is exposed as a callable tool alongside
MCP tools. Feature is off by default (SKILLS_ENABLED=false) with path
confinement, no-shell execution, and configurable timeout for security.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 31, 2026 06:19
@qodo-code-review

Review Summary by Qodo

Add agent skills execution engine with streaming reliability improvements

✨ Enhancement 🐞 Bug fix


Walkthroughs

Description
• Add SkillManager singleton for discovering and executing agent skills from configurable directory
• Inject skill tools alongside MCP tools in chat completions and responses endpoints
• Implement SSE heartbeat keepalives and configurable stream timeout for reliability
• Add missing content_index field to streamed text delta events for OpenAI spec compliance
• Implement path traversal protection and execution timeout for skill security
Diagram
flowchart LR
  A["SKILLS_DIR"] -->|scan & parse| B["SkillManager"]
  B -->|cache tools| C["skill_functions_cache"]
  C -->|inject| D["Chat Completions"]
  D -->|execute| E["Skill Scripts"]
  E -->|return output| F["Tool Result"]
  G["LLM Stream"] -->|_with_heartbeat| H["SSE Events"]
  H -->|content_index| I["OutputTextDelta"]


File Changes

1. src/open_responses_server/common/skill_manager.py ✨ Enhancement +318/-0: New SkillManager for skill discovery and execution
2. src/open_responses_server/api_controller.py ✨ Enhancement +119/-9: Add heartbeat wrapper and skill tool injection
3. src/open_responses_server/chat_completions_service.py ✨ Enhancement +29/-10: Route skill tool execution in streaming and non-streaming
4. src/open_responses_server/responses_service.py ✨ Enhancement +93/-37: Add skill execution and content_index to deltas
5. src/open_responses_server/common/config.py ⚙️ Configuration changes +15/-1: Add skill and streaming configuration variables
6. src/open_responses_server/common/llm_client.py ⚙️ Configuration changes +2/-2: Use configurable STREAM_TIMEOUT instead of hardcoded value
7. src/open_responses_server/models/responses_models.py 🐞 Bug fix +1/-0: Add missing content_index field to OutputTextDelta
8. tests/test_skill_manager.py 🧪 Tests +390/-0: Comprehensive tests for skill discovery and execution
9. tests/test_api_controller_endpoints.py 🧪 Tests +65/-1: Add tests for _with_heartbeat async generator wrapper
10. tests/test_chat_completions_service.py 🧪 Tests +2/-2: Update error messages to reference server tools
11. tests/test_responses_service.py 🧪 Tests +6/-0: Add content_index assertions to delta event tests
12. pyproject.toml Dependencies +1/-1: Add pyyaml dependency for SKILL.md parsing
13. .env.example 📝 Documentation +4/-0: Document streaming and skill configuration options
14. README.md 📝 Documentation +57/-1: Add Agent Skills section with setup and security docs



qodo-code-review bot commented Mar 31, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. Skill failure logged as success 🐞 Bug ≡ Correctness
Description
SkillManager.execute_skill_tool() returns an error string when the script exits non-zero instead of
raising, so callers log “executed successfully” and treat the failure as a normal tool result. This
contradicts the method’s contract (“execution fails” should raise) and breaks consistent error
handling versus MCP tools.
Code

src/open_responses_server/common/skill_manager.py[R281-291]

+            if process.returncode != 0:
+                error_msg = (
+                    f"Script '{script_name}' exited with code {process.returncode}"
+                )
+                if stderr_text:
+                    error_msg += f"\nstderr: {stderr_text}"
+                if stdout_text:
+                    error_msg += f"\nstdout: {stdout_text}"
+                logger.warning(f"[SKILLS-EXEC] {error_msg}")
+                return error_msg
+
Evidence
execute_skill_tool() explicitly documents that execution failures raise RuntimeError, but on
non-zero returncode it returns an error string. Both chat_completions_service and responses_service
treat any returned value as a success path (logging ✓ and forwarding the string) because no
exception is raised.

src/open_responses_server/common/skill_manager.py[219-224]
src/open_responses_server/common/skill_manager.py[281-291]
src/open_responses_server/chat_completions_service.py[63-70]
src/open_responses_server/responses_service.py[618-625]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`SkillManager.execute_skill_tool()` returns a string on non-zero exit instead of raising, which causes callers to log success and treat failures as normal outputs.

### Issue Context
The docstring states failures should raise `RuntimeError`, and tool execution call sites rely on exceptions to produce error payloads.

### Fix Focus Areas
- src/open_responses_server/common/skill_manager.py[219-224]
- src/open_responses_server/common/skill_manager.py[281-291]
- src/open_responses_server/chat_completions_service.py[63-72]
- src/open_responses_server/responses_service.py[618-626]

### Suggested change
Change the non-zero returncode branch to `raise RuntimeError(error_msg)` (or return a structured `{error:..., returncode:...}` consistently and adjust callers). Update callers’ logs to only emit “✓ executed successfully” after confirmed success.
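The suggested change is essentially to turn the `return error_msg` branch into a raise. A sketch with the branch extracted into a helper (the helper name is invented for illustration; variable names follow the quoted diff):

```python
def check_returncode(script_name: str, returncode: int,
                     stderr_text: str, stdout_text: str) -> None:
    """Raise RuntimeError on non-zero exit, per the documented contract."""
    if returncode == 0:
        return
    error_msg = f"Script '{script_name}' exited with code {returncode}"
    if stderr_text:
        error_msg += f"\nstderr: {stderr_text}"
    if stdout_text:
        error_msg += f"\nstdout: {stdout_text}"
    # Raising (instead of returning the string) lets callers' existing
    # except blocks log the failure and build an error payload.
    raise RuntimeError(error_msg)

try:
    check_returncode("build.sh", 2, "missing input", "")
except RuntimeError as e:
    msg = str(e)
print(msg.splitlines()[0])  # Script 'build.sh' exited with code 2
```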



2. Skills not injected 🐞 Bug ≡ Correctness
Description
The /v1/chat/completions endpoint never injects skill tool definitions into the request, so the LLM
cannot discover/call skills unless the client supplies them manually. This makes the newly-added
skill execution routing largely unreachable for direct chat.completions traffic.
Code

src/open_responses_server/chat_completions_service.py[R63-75]

+                    elif skill_manager.is_skill_tool(tool_name):
+                        logger.info(f"[CHAT-COMPLETIONS-NON-STREAM] Executing skill tool: {tool_name}")
+                        try:
+                            arguments = json.loads(function_call.get("arguments", "{}"))
+                            result = await skill_manager.execute_skill_tool(tool_name, arguments)
+                            logger.info(f"[CHAT-COMPLETIONS-NON-STREAM] ✓ Skill tool {tool_name} executed successfully")
+                            tool_content = result
+                        except Exception as e:
+                            logger.error(f"[CHAT-COMPLETIONS-NON-STREAM] ✗ Error executing skill tool {tool_name}: {e}")
+                            tool_content = json.dumps({"error": f"Error executing skill tool: {e}"})
                    else:
-                        logger.warning(f"[CHAT-COMPLETIONS-NON-STREAM] Tool '{tool_name}' is not a registered MCP tool.")
-                        tool_content = json.dumps({"error": f"Tool '{tool_name}' is not a registered MCP tool."})
+                        logger.warning(f"[CHAT-COMPLETIONS-NON-STREAM] Tool '{tool_name}' is not a registered server tool.")
+                        tool_content = json.dumps({"error": f"Tool '{tool_name}' is not a registered server tool."})
Evidence
Skill execution is now routed in chat_completions_service when a tool call name matches a registered
skill, but handle_chat_completions() only injects MCP tools (not skills) into outgoing requests. In
contrast, api_controller injects skill tools for the /responses flow, so behavior differs by
endpoint.

src/open_responses_server/chat_completions_service.py[214-246]
src/open_responses_server/api_controller.py[163-176]
src/open_responses_server/chat_completions_service.py[59-75]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`/v1/chat/completions` requests only get MCP tools injected, not skills. As a result, skills are not discoverable by the model on this endpoint.

### Issue Context
Skills are injected for the `/responses` endpoint in `api_controller.py`, but `handle_chat_completions()` injects only MCP tools.

### Fix Focus Areas
- src/open_responses_server/chat_completions_service.py[214-246]
- src/open_responses_server/api_controller.py[163-176]

### Suggested change
In `handle_chat_completions()`, after MCP injection, merge in `skill_manager.skill_functions_cache` using the same `{ "type": "function", "function": <tool_def> }` schema and de-dupe by function name.
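That merge-and-de-dupe step could look roughly like the following, assuming `skill_functions_cache` maps tool names to function definitions (the cache shape is inferred from the diff, not confirmed):

```python
def inject_skill_tools(tools: list[dict],
                       skill_functions_cache: dict[str, dict]) -> list[dict]:
    """Append skill tool definitions, skipping names already present."""
    existing = {
        t.get("function", {}).get("name")
        for t in tools
        if t.get("type") == "function"
    }
    merged = list(tools)
    for name, tool_def in skill_functions_cache.items():
        if name in existing:
            continue  # de-dupe by function name
        merged.append({"type": "function", "function": tool_def})
    return merged

cache = {"skill__hello": {"name": "skill__hello", "description": "Greets"}}
request_tools = [{"type": "function", "function": {"name": "mcp__search"}}]
out = inject_skill_tools(request_tools, cache)
print([t["function"]["name"] for t in out])  # ['mcp__search', 'skill__hello']
```

Calling it right after MCP injection would give `/v1/chat/completions` the same tool surface as `/responses`.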




Remediation recommended

3. Duplicate skills on startup 🐞 Bug ☼ Reliability
Description
startup_skills() never clears existing state before scanning, so calling it more than once
accumulates duplicates in self.skills and regenerates caches from the expanded list. This can lead
to duplicated tool definitions and unbounded growth in long-running processes that re-run startup
logic (tests/hot reload/in-process restart).
Code

src/open_responses_server/common/skill_manager.py[R87-107]

+    async def startup_skills(self) -> None:
+        """Scan SKILLS_DIR, parse each skill, and populate caches.
+
+        No-op when SKILLS_ENABLED is False or SKILLS_DIR is empty.
+        """
+        if not SKILLS_ENABLED:
+            logger.info("[SKILLS-STARTUP] Agent skills are disabled (SKILLS_ENABLED=false)")
+            return
+
+        if not SKILLS_DIR:
+            logger.warning("[SKILLS-STARTUP] SKILLS_ENABLED is true but SKILLS_DIR is empty")
+            return
+
+        skills_path = Path(SKILLS_DIR)
+        if not skills_path.is_dir():
+            logger.warning(f"[SKILLS-STARTUP] SKILLS_DIR does not exist or is not a directory: {skills_path}")
+            return
+
+        logger.info(f"[SKILLS-STARTUP] Scanning skills directory: {skills_path}")
+
+        for entry in sorted(skills_path.iterdir()):
Evidence
startup_skills() appends discovered skills to self.skills, and only shutdown_skills() clears state.
If startup_skills() is invoked multiple times without shutdown (common in test/dev), the list grows
and _build_caches() is built from the duplicated list.

src/open_responses_server/common/skill_manager.py[87-160]
src/open_responses_server/common/skill_manager.py[161-167]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`startup_skills()` appends to `self.skills` without clearing, so repeated calls duplicate skills and tool definitions.

### Issue Context
`shutdown_skills()` clears state, but it may not run between multiple startup invocations in tests/dev.

### Fix Focus Areas
- src/open_responses_server/common/skill_manager.py[87-107]

### Suggested change
At the beginning of `startup_skills()` (after the SKILLS_ENABLED/SKILLS_DIR checks pass), clear `self.skills`, `self.skill_functions_cache`, and `self._skill_name_mapping` (or call `await self.shutdown_skills()`/a new internal reset helper) before scanning.
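The reset-before-scan idea can be sketched as follows; this is a simplified synchronous stand-in for the real async manager, with only the fields named in the suggestion:

```python
class SkillManager:
    """Minimal slice of the manager showing an idempotent startup reset."""

    def __init__(self):
        self.skills: list = []
        self.skill_functions_cache: dict = {}
        self._skill_name_mapping: dict = {}

    def _reset_state(self) -> None:
        # Clearing before each scan makes startup idempotent, so repeated
        # startups (tests, hot reload) cannot accumulate duplicates.
        self.skills.clear()
        self.skill_functions_cache.clear()
        self._skill_name_mapping.clear()

    def startup_skills(self, discovered: list) -> None:
        self._reset_state()
        self.skills.extend(discovered)

mgr = SkillManager()
mgr.startup_skills(["hello-skill"])
mgr.startup_skills(["hello-skill"])  # second startup must not duplicate
print(mgr.skills)  # ['hello-skill']
```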



4. Skill allowlist not enforced 🐞 Bug ⛨ Security
Description
execute_skill_tool() does not enforce that arguments['script'] is one of the discovered/advertised
available_scripts, so the server may execute any executable file placed in scripts/ even if it was
not part of the cached tool definition. It also assumes args is a string and passes it to
shlex.split(), which can crash on non-string types from tool calls.
Code

src/open_responses_server/common/skill_manager.py[R229-259]

+        script_name = arguments.get("script", "")
+        args_str = arguments.get("args", "")
+
+        if not script_name:
+            raise RuntimeError(f"No 'script' argument provided for skill '{skill.name}'")
+
+        # Path validation: prevent directory traversal
+        script_path = (skill.scripts_dir / script_name).resolve()
+        scripts_dir_resolved = skill.scripts_dir.resolve()
+
+        if not str(script_path).startswith(str(scripts_dir_resolved) + os.sep) and script_path != scripts_dir_resolved:
+            logger.error(
+                f"[SKILLS-EXEC] Path traversal attempt blocked: "
+                f"'{script_name}' resolves outside scripts dir"
+            )
+            raise RuntimeError(f"Invalid script path: '{script_name}'")
+
+        if not script_path.is_file():
+            raise RuntimeError(
+                f"Script '{script_name}' not found in skill '{skill.name}'. "
+                f"Available: {skill.available_scripts}"
+            )
+
+        if not os.access(script_path, os.X_OK):
+            raise RuntimeError(f"Script '{script_name}' is not executable")
+
+        # Build command
+        cmd = [str(script_path)]
+        if args_str:
+            cmd.extend(shlex.split(args_str))
+
Evidence
Tool definitions explicitly enumerate allowed scripts, but execution only checks path confinement +
existence + executable bit; it never checks membership in skill.available_scripts. Also args_str is
taken directly from arguments and conditionally passed to shlex.split() without type normalization.

src/open_responses_server/common/skill_manager.py[190-206]
src/open_responses_server/common/skill_manager.py[229-259]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Skill execution does not enforce the advertised script allowlist and assumes `args` is always a string.

### Issue Context
`_build_caches()` publishes an enum allowlist via `available_scripts`, but `execute_skill_tool()` only does filesystem checks. Tool-call arguments may be malformed (e.g., args as list).

### Fix Focus Areas
- src/open_responses_server/common/skill_manager.py[190-206]
- src/open_responses_server/common/skill_manager.py[229-259]

### Suggested change
1) Reject `script_name` unless it is in `skill.available_scripts` (or regenerate discovery dynamically if you intentionally want late-added scripts).
2) Normalize `args`:
- If `args` is a list, append items directly (after `str()` coercion).
- If `args` is a string, keep `shlex.split`.
- Otherwise raise a `RuntimeError` with a clear message.
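Both suggested fixes (allowlist enforcement and `args` normalization) fit in one small helper; `build_command` is a hypothetical name, not the PR's API:

```python
import shlex

def build_command(script_path: str, script_name: str,
                  available_scripts: list[str], args) -> list[str]:
    """Enforce the advertised allowlist and normalize `args`."""
    if script_name not in available_scripts:
        raise RuntimeError(
            f"Script '{script_name}' is not in the advertised allowlist: "
            f"{available_scripts}"
        )
    cmd = [script_path]
    if isinstance(args, list):
        cmd.extend(str(a) for a in args)   # list: coerce each item
    elif isinstance(args, str):
        cmd.extend(shlex.split(args))      # string: shell-style split
    elif args is not None:
        raise RuntimeError(f"Unsupported 'args' type: {type(args).__name__}")
    return cmd

cmd = build_command("/skills/demo/scripts/run.sh", "run.sh",
                    ["run.sh"], "--fast 'two words'")
print(cmd)  # ['/skills/demo/scripts/run.sh', '--fast', 'two words']
```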





Copilot AI left a comment


Pull request overview

Adds an Agent Skills execution engine that discovers local “skills” from SKILLS_DIR, exposes them as skill__<name> tools alongside MCP tools, and routes tool calls to server-side execution with configurable timeouts and SSE heartbeats.

Changes:

  • Introduces SkillManager for skill discovery, tool schema generation, and secure script execution (path confinement, no shell).
  • Injects skill tools into /responses and /v1/chat/completions flows and adds routing to execute skill tools server-side.
  • Adds streaming controls (STREAM_TIMEOUT, SSE heartbeat interval) and expands tests to cover new behavior.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.

Summary per file:
  • src/open_responses_server/common/skill_manager.py: Implements skill discovery, caching, and script execution for skill__* tools.
  • src/open_responses_server/api_controller.py: Starts/stops skills on app lifecycle; injects skill tools; adds SSE heartbeat wrapper.
  • src/open_responses_server/responses_service.py: Routes tool execution to MCP vs skills; adds content_index to deltas; improves stream timing logs.
  • src/open_responses_server/chat_completions_service.py: Adds skill tool routing in streaming and non-streaming tool-call handlers; uses STREAM_TIMEOUT.
  • src/open_responses_server/common/config.py: Adds streaming + skills env vars and logs them on startup.
  • src/open_responses_server/common/llm_client.py: Uses configurable STREAM_TIMEOUT for the shared HTTPX client.
  • src/open_responses_server/models/responses_models.py: Adds required content_index field to OutputTextDelta.
  • tests/test_skill_manager.py: Adds test suite for skill parsing, discovery, execution, traversal, and timeouts.
  • tests/test_api_controller_endpoints.py: Adds unit tests for the heartbeat wrapper.
  • tests/test_responses_service.py: Updates streaming delta expectations to include content_index.
  • tests/test_chat_completions_service.py: Updates error-message expectations for non-server tools.
  • README.md: Documents Agent Skills usage and security model.
  • pyproject.toml: Adds pyyaml>=6.0 dependency.
  • uv.lock: Locks pyyaml in dependency graph.
  • .env.example: Documents new streaming env vars.


OriNachum and others added 3 commits March 31, 2026 09:27
…on script failure

1. Add skill tool injection in handle_chat_completions() alongside MCP tools,
   so skills are discoverable via /v1/chat/completions endpoint.
2. Raise RuntimeError on non-zero script exit instead of returning error string,
   so callers correctly log failures and error handling matches MCP pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Reject symlinks pointing outside scripts/ during discovery (security).
2. Validate skill names against tool-name rules (alphanumeric/hyphens/underscores, max 64 chars).
3. Include stderr in output on successful execution (preserves warnings).
4. Add SKILLS_* vars to .env.example for discoverability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
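The skill-name validation rule quoted above (alphanumeric, hyphens, underscores, max 64 chars) reduces to a single regex check. The exact pattern is an assumption based on the commit message, not the PR's code:

```python
import re

# Alphanumeric, hyphens, underscores, max 64 chars (per the commit message)
_SKILL_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")

def is_valid_skill_name(name: str) -> bool:
    # fullmatch avoids partial matches like "bad name!" matching "bad"
    return bool(_SKILL_NAME_RE.fullmatch(name))

print(is_valid_skill_name("hello-skill"))  # True
print(is_valid_skill_name("bad name!"))    # False
print(is_valid_skill_name("x" * 65))       # False
```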
@sonarqubecloud

Quality Gate failed

Failed conditions
65.7% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud
