feat: add agent skills execution engine#53

Open
OriNachum wants to merge 7 commits into main from feat/agent-skills-engine

Conversation

@OriNachum (Collaborator)

Summary

  • Add SkillManager singleton that discovers and executes agent skills from a configurable directory (SKILLS_DIR)
  • Skills follow the Claude Code standard: SKILL.md (YAML frontmatter + instructions) + scripts/ (executables) + references/ (docs)
  • Each skill is exposed as a callable tool (skill__<name>) alongside MCP tools in both /responses and /v1/chat/completions endpoints
  • Feature is off by default (SKILLS_ENABLED=false) with security controls: path confinement, no shell execution, executable check, configurable timeout
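The SKILL.md layout described above (YAML frontmatter followed by Markdown instructions) can be parsed with pyyaml, the dependency this PR adds. The sketch below is illustrative only; `parse_skill_md` and its return shape are assumptions, not the PR's actual API:

```python
import yaml  # pyyaml>=6.0, the dependency added in this PR

def parse_skill_md(text: str) -> tuple[dict, str]:
    """Split SKILL.md into (frontmatter, instructions).

    Expects the Claude Code layout: a leading YAML block delimited
    by '---' lines, followed by free-form Markdown instructions.
    """
    if not text.startswith("---"):
        raise ValueError("SKILL.md must start with YAML frontmatter")
    # split gives ['', '<yaml>', '<markdown>']
    parts = text.split("---", 2)
    if len(parts) < 3:
        raise ValueError("Unterminated frontmatter block")
    frontmatter = yaml.safe_load(parts[1]) or {}
    instructions = parts[2].strip()
    return frontmatter, instructions

doc = """---
name: hello-skill
description: Greets the caller
---
Run scripts/hello.sh to greet."""
meta, body = parse_skill_md(doc)
print(meta["name"])  # hello-skill
```

The skill name parsed here is what would be exposed to the model as `skill__hello-skill`.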

New files

  • src/open_responses_server/common/skill_manager.py — core skill discovery, caching, and execution
  • tests/test_skill_manager.py — 25 tests covering parsing, discovery, execution, path traversal, timeout

Modified files

  • common/config.py — 3 new env vars (SKILLS_ENABLED, SKILLS_DIR, SKILLS_EXEC_TIMEOUT)
  • api_controller.py — startup/shutdown hooks + skill tool injection
  • responses_service.py — skill routing at all tool execution points
  • chat_completions_service.py — skill routing in streaming and non-streaming handlers
  • pyproject.toml — added pyyaml>=6.0 dependency
  • README.md — Agent Skills documentation section

Test plan

  • All 25 new skill manager tests pass (pytest tests/test_skill_manager.py)
  • Full test suite passes (205/205 tests)
  • Flake8 clean (no new issues beyond existing E501 style)
  • Bandit security scan: 0 issues
  • Manual test: create a test skill, start server with SKILLS_ENABLED=true, verify tool injection and execution
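The manual-test fixture can be scripted. This sketch only builds a throwaway skill matching the layout from the summary; the skill name and script contents are made up for illustration:

```python
import os
import stat
import tempfile
from pathlib import Path

# Build a minimal skill under a temporary SKILLS_DIR
skills_dir = Path(tempfile.mkdtemp())
skill = skills_dir / "hello-skill"
(skill / "scripts").mkdir(parents=True)

(skill / "SKILL.md").write_text(
    "---\nname: hello-skill\ndescription: Greets the caller\n---\n"
    "Run scripts/hello.sh to greet.\n"
)

script = skill / "scripts" / "hello.sh"
script.write_text("#!/bin/sh\necho hello\n")
# The manager only exposes scripts with the executable bit set
script.chmod(script.stat().st_mode | stat.S_IXUSR)

# The server would then be started with something like:
#   SKILLS_ENABLED=true SKILLS_DIR=<skills_dir> ...
print(sorted(p.name for p in skill.iterdir()))  # ['SKILL.md', 'scripts']
```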

🤖 Generated with Claude Code

  • Claude

OriNachum and others added 4 commits March 31, 2026 07:43
…am reliability (#44)

Add the missing `content_index` field to `response.output_text.delta` SSE
events so clients that validate against the OpenAI Responses API spec
(e.g. ChatKit SDK) no longer fail with a Pydantic validation error.

Also introduces SSE heartbeat keepalives, configurable stream timeout,
and structured stream timing logs to improve reliability with slow backends.

Closes #44

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add try/finally to _with_heartbeat() to cancel in-flight tasks and
  close the underlying async iterator on cancellation/exit
- Guard against interval <= 0 to prevent tight heartbeat loops
- Fix trailing whitespace and missing newline in config.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- S7497: Re-raise asyncio.CancelledError after cleanup instead of
  swallowing it
- B110: Replace bare except/pass with logger.debug for aclose errors
- S5806: Rename `aiter` to `inner` to avoid shadowing the builtin
- S3776: Extract cleanup logic to _cleanup_heartbeat() to reduce
  cognitive complexity below threshold

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
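The fixes above all concern an async-generator heartbeat wrapper. A minimal sketch of the pattern, assuming `_with_heartbeat` behaves roughly like this (the body is a reconstruction from the commit notes, not the PR's code):

```python
import asyncio

async def _with_heartbeat(inner, interval: float):
    """Yield items from `inner`, emitting an SSE comment when it is slow.

    Mirrors the commit's fixes: guard interval <= 0, cancel the in-flight
    task, and close the underlying iterator in a finally block.
    """
    if interval <= 0:  # prevent a tight heartbeat loop
        async for item in inner:
            yield item
        return
    task = None
    try:
        while True:
            task = asyncio.ensure_future(inner.__anext__())
            try:
                yield await asyncio.wait_for(asyncio.shield(task), interval)
            except asyncio.TimeoutError:
                yield ": heartbeat\n\n"   # SSE comment keepalive
                yield await task          # then deliver the real item
            task = None
    except StopAsyncIteration:
        pass  # inner iterator exhausted normally
    finally:
        if task is not None and not task.done():
            task.cancel()
        aclose = getattr(inner, "aclose", None)
        if aclose is not None:
            try:
                await aclose()
            except Exception:
                pass  # the real code logs this at debug level

async def demo():
    async def slow():
        yield "a"
        await asyncio.sleep(0.05)  # slower than the heartbeat interval
        yield "b"
    return [x async for x in _with_heartbeat(slow(), interval=0.01)]

events = asyncio.run(demo())
print(events)  # ['a', ': heartbeat\n\n', 'b']
```

Note how `asyncio.shield` keeps the producer task alive across the timeout, so the heartbeat never drops a pending item.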
…tories

Add SkillManager that discovers, caches, and executes agent skills from a
configurable directory. Skills follow the Claude Code standard (SKILL.md +
scripts/ + references/). Each skill is exposed as a callable tool alongside
MCP tools. Feature is off by default (SKILLS_ENABLED=false) with path
confinement, no-shell execution, and configurable timeout for security.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 31, 2026 06:19
@qodo-code-review

Review Summary by Qodo

Add agent skills execution engine with streaming reliability improvements

✨ Enhancement 🐞 Bug fix


Walkthroughs

Description
• Add SkillManager singleton for discovering and executing agent skills from configurable directory
• Inject skill tools alongside MCP tools in chat completions and responses endpoints
• Implement SSE heartbeat keepalives and configurable stream timeout for reliability
• Add missing content_index field to streamed text delta events for OpenAI spec compliance
• Implement path traversal protection and execution timeout for skill security
Diagram
flowchart LR
  A["SKILLS_DIR"] -->|scan & parse| B["SkillManager"]
  B -->|cache tools| C["skill_functions_cache"]
  C -->|inject| D["Chat Completions"]
  D -->|execute| E["Skill Scripts"]
  E -->|return output| F["Tool Result"]
  G["LLM Stream"] -->|_with_heartbeat| H["SSE Events"]
  H -->|content_index| I["OutputTextDelta"]


File Changes

1. src/open_responses_server/common/skill_manager.py ✨ Enhancement +318/-0: New SkillManager for skill discovery and execution
2. src/open_responses_server/api_controller.py ✨ Enhancement +119/-9: Add heartbeat wrapper and skill tool injection
3. src/open_responses_server/chat_completions_service.py ✨ Enhancement +29/-10: Route skill tool execution in streaming and non-streaming
4. src/open_responses_server/responses_service.py ✨ Enhancement +93/-37: Add skill execution and content_index to deltas
5. src/open_responses_server/common/config.py ⚙️ Configuration changes +15/-1: Add skill and streaming configuration variables
6. src/open_responses_server/common/llm_client.py ⚙️ Configuration changes +2/-2: Use configurable STREAM_TIMEOUT instead of hardcoded value
7. src/open_responses_server/models/responses_models.py 🐞 Bug fix +1/-0: Add missing content_index field to OutputTextDelta
8. tests/test_skill_manager.py 🧪 Tests +390/-0: Comprehensive tests for skill discovery and execution
9. tests/test_api_controller_endpoints.py 🧪 Tests +65/-1: Add tests for _with_heartbeat async generator wrapper
10. tests/test_chat_completions_service.py 🧪 Tests +2/-2: Update error messages to reference server tools
11. tests/test_responses_service.py 🧪 Tests +6/-0: Add content_index assertions to delta event tests
12. pyproject.toml Dependencies +1/-1: Add pyyaml dependency for SKILL.md parsing
13. .env.example 📝 Documentation +4/-0: Document streaming and skill configuration options
14. README.md 📝 Documentation +57/-1: Add Agent Skills section with setup and security docs



qodo-code-review bot commented Mar 31, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. Skill failure logged as success 🐞 Bug ≡ Correctness
Description
SkillManager.execute_skill_tool() returns an error string when the script exits non-zero instead of
raising, so callers log “executed successfully” and treat the failure as a normal tool result. This
contradicts the method’s contract (“execution fails” should raise) and breaks consistent error
handling versus MCP tools.
Code

src/open_responses_server/common/skill_manager.py[R281-291]

+            if process.returncode != 0:
+                error_msg = (
+                    f"Script '{script_name}' exited with code {process.returncode}"
+                )
+                if stderr_text:
+                    error_msg += f"\nstderr: {stderr_text}"
+                if stdout_text:
+                    error_msg += f"\nstdout: {stdout_text}"
+                logger.warning(f"[SKILLS-EXEC] {error_msg}")
+                return error_msg
+
Evidence
execute_skill_tool() explicitly documents that execution failures raise RuntimeError, but on
non-zero returncode it returns an error string. Both chat_completions_service and responses_service
treat any returned value as a success path (logging ✓ and forwarding the string) because no
exception is raised.

src/open_responses_server/common/skill_manager.py[219-224]
src/open_responses_server/common/skill_manager.py[281-291]
src/open_responses_server/chat_completions_service.py[63-70]
src/open_responses_server/responses_service.py[618-625]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`SkillManager.execute_skill_tool()` returns a string on non-zero exit instead of raising, which causes callers to log success and treat failures as normal outputs.

### Issue Context
The docstring states failures should raise `RuntimeError`, and tool execution call sites rely on exceptions to produce error payloads.

### Fix Focus Areas
- src/open_responses_server/common/skill_manager.py[219-224]
- src/open_responses_server/common/skill_manager.py[281-291]
- src/open_responses_server/chat_completions_service.py[63-72]
- src/open_responses_server/responses_service.py[618-626]

### Suggested change
Change the non-zero returncode branch to `raise RuntimeError(error_msg)` (or return a structured `{error:..., returncode:...}` consistently and adjust callers). Update callers’ logs to only emit “✓ executed successfully” after confirmed success.
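The suggested change is essentially to turn the `return error_msg` branch into a raise. A sketch with the branch extracted into a helper (the helper name is invented for illustration; variable names follow the quoted diff):

```python
def check_returncode(script_name: str, returncode: int,
                     stderr_text: str, stdout_text: str) -> None:
    """Raise RuntimeError on non-zero exit, per the documented contract."""
    if returncode == 0:
        return
    error_msg = f"Script '{script_name}' exited with code {returncode}"
    if stderr_text:
        error_msg += f"\nstderr: {stderr_text}"
    if stdout_text:
        error_msg += f"\nstdout: {stdout_text}"
    # Raising (instead of returning the string) lets callers' existing
    # except blocks log the failure and build an error payload.
    raise RuntimeError(error_msg)

try:
    check_returncode("build.sh", 2, "missing input", "")
except RuntimeError as e:
    msg = str(e)
print(msg.splitlines()[0])  # Script 'build.sh' exited with code 2
```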



2. Skills not injected 🐞 Bug ≡ Correctness
Description
The /v1/chat/completions endpoint never injects skill tool definitions into the request, so the LLM
cannot discover/call skills unless the client supplies them manually. This makes the newly-added
skill execution routing largely unreachable for direct chat.completions traffic.
Code

src/open_responses_server/chat_completions_service.py[R63-75]

+                    elif skill_manager.is_skill_tool(tool_name):
+                        logger.info(f"[CHAT-COMPLETIONS-NON-STREAM] Executing skill tool: {tool_name}")
+                        try:
+                            arguments = json.loads(function_call.get("arguments", "{}"))
+                            result = await skill_manager.execute_skill_tool(tool_name, arguments)
+                            logger.info(f"[CHAT-COMPLETIONS-NON-STREAM] ✓ Skill tool {tool_name} executed successfully")
+                            tool_content = result
+                        except Exception as e:
+                            logger.error(f"[CHAT-COMPLETIONS-NON-STREAM] ✗ Error executing skill tool {tool_name}: {e}")
+                            tool_content = json.dumps({"error": f"Error executing skill tool: {e}"})
                    else:
-                        logger.warning(f"[CHAT-COMPLETIONS-NON-STREAM] Tool '{tool_name}' is not a registered MCP tool.")
-                        tool_content = json.dumps({"error": f"Tool '{tool_name}' is not a registered MCP tool."})
+                        logger.warning(f"[CHAT-COMPLETIONS-NON-STREAM] Tool '{tool_name}' is not a registered server tool.")
+                        tool_content = json.dumps({"error": f"Tool '{tool_name}' is not a registered server tool."})
Evidence
Skill execution is now routed in chat_completions_service when a tool call name matches a registered
skill, but handle_chat_completions() only injects MCP tools (not skills) into outgoing requests. In
contrast, api_controller injects skill tools for the /responses flow, so behavior differs by
endpoint.

src/open_responses_server/chat_completions_service.py[214-246]
src/open_responses_server/api_controller.py[163-176]
src/open_responses_server/chat_completions_service.py[59-75]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`/v1/chat/completions` requests only get MCP tools injected, not skills. As a result, skills are not discoverable by the model on this endpoint.

### Issue Context
Skills are injected for the `/responses` endpoint in `api_controller.py`, but `handle_chat_completions()` injects only MCP tools.

### Fix Focus Areas
- src/open_responses_server/chat_completions_service.py[214-246]
- src/open_responses_server/api_controller.py[163-176]

### Suggested change
In `handle_chat_completions()`, after MCP injection, merge in `skill_manager.skill_functions_cache` using the same `{ "type": "function", "function": <tool_def> }` schema and de-dupe by function name.
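That merge-and-de-dupe step could look roughly like the following, assuming `skill_functions_cache` maps tool names to function definitions (the cache shape is inferred from the diff, not confirmed):

```python
def inject_skill_tools(tools: list[dict],
                       skill_functions_cache: dict[str, dict]) -> list[dict]:
    """Append skill tool definitions, skipping names already present."""
    existing = {
        t.get("function", {}).get("name")
        for t in tools
        if t.get("type") == "function"
    }
    merged = list(tools)
    for name, tool_def in skill_functions_cache.items():
        if name in existing:
            continue  # de-dupe by function name
        merged.append({"type": "function", "function": tool_def})
    return merged

cache = {"skill__hello": {"name": "skill__hello", "description": "Greets"}}
request_tools = [{"type": "function", "function": {"name": "mcp__search"}}]
out = inject_skill_tools(request_tools, cache)
print([t["function"]["name"] for t in out])  # ['mcp__search', 'skill__hello']
```

Calling it right after MCP injection would give `/v1/chat/completions` the same tool surface as `/responses`.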




Remediation recommended

3. Duplicate skills on startup 🐞 Bug ☼ Reliability
Description
startup_skills() never clears existing state before scanning, so calling it more than once
accumulates duplicates in self.skills and regenerates caches from the expanded list. This can lead
to duplicated tool definitions and unbounded growth in long-running processes that re-run startup
logic (tests/hot reload/in-process restart).
Code

src/open_responses_server/common/skill_manager.py[R87-107]

+    async def startup_skills(self) -> None:
+        """Scan SKILLS_DIR, parse each skill, and populate caches.
+
+        No-op when SKILLS_ENABLED is False or SKILLS_DIR is empty.
+        """
+        if not SKILLS_ENABLED:
+            logger.info("[SKILLS-STARTUP] Agent skills are disabled (SKILLS_ENABLED=false)")
+            return
+
+        if not SKILLS_DIR:
+            logger.warning("[SKILLS-STARTUP] SKILLS_ENABLED is true but SKILLS_DIR is empty")
+            return
+
+        skills_path = Path(SKILLS_DIR)
+        if not skills_path.is_dir():
+            logger.warning(f"[SKILLS-STARTUP] SKILLS_DIR does not exist or is not a directory: {skills_path}")
+            return
+
+        logger.info(f"[SKILLS-STARTUP] Scanning skills directory: {skills_path}")
+
+        for entry in sorted(skills_path.iterdir()):
Evidence
startup_skills() appends discovered skills to self.skills, and only shutdown_skills() clears state.
If startup_skills() is invoked multiple times without shutdown (common in test/dev), the list grows
and _build_caches() is built from the duplicated list.

src/open_responses_server/common/skill_manager.py[87-160]
src/open_responses_server/common/skill_manager.py[161-167]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`startup_skills()` appends to `self.skills` without clearing, so repeated calls duplicate skills and tool definitions.

### Issue Context
`shutdown_skills()` clears state, but it may not run between multiple startup invocations in tests/dev.

### Fix Focus Areas
- src/open_responses_server/common/skill_manager.py[87-107]

### Suggested change
At the beginning of `startup_skills()` (after the SKILLS_ENABLED/SKILLS_DIR checks pass), clear `self.skills`, `self.skill_functions_cache`, and `self._skill_name_mapping` (or call `await self.shutdown_skills()`/a new internal reset helper) before scanning.
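The reset-before-scan idea can be sketched as follows; this is a simplified synchronous stand-in for the real async manager, with only the fields named in the suggestion:

```python
class SkillManager:
    """Minimal slice of the manager showing an idempotent startup reset."""

    def __init__(self):
        self.skills: list = []
        self.skill_functions_cache: dict = {}
        self._skill_name_mapping: dict = {}

    def _reset_state(self) -> None:
        # Clearing before each scan makes startup idempotent, so repeated
        # startups (tests, hot reload) cannot accumulate duplicates.
        self.skills.clear()
        self.skill_functions_cache.clear()
        self._skill_name_mapping.clear()

    def startup_skills(self, discovered: list) -> None:
        self._reset_state()
        self.skills.extend(discovered)

mgr = SkillManager()
mgr.startup_skills(["hello-skill"])
mgr.startup_skills(["hello-skill"])  # second startup must not duplicate
print(mgr.skills)  # ['hello-skill']
```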



4. Skill allowlist not enforced 🐞 Bug ⛨ Security
Description
execute_skill_tool() does not enforce that arguments['script'] is one of the discovered/advertised
available_scripts, so the server may execute any executable file placed in scripts/ even if it was
not part of the cached tool definition. It also assumes args is a string and passes it to
shlex.split(), which can crash on non-string types from tool calls.
Code

src/open_responses_server/common/skill_manager.py[R229-259]

+        script_name = arguments.get("script", "")
+        args_str = arguments.get("args", "")
+
+        if not script_name:
+            raise RuntimeError(f"No 'script' argument provided for skill '{skill.name}'")
+
+        # Path validation: prevent directory traversal
+        script_path = (skill.scripts_dir / script_name).resolve()
+        scripts_dir_resolved = skill.scripts_dir.resolve()
+
+        if not str(script_path).startswith(str(scripts_dir_resolved) + os.sep) and script_path != scripts_dir_resolved:
+            logger.error(
+                f"[SKILLS-EXEC] Path traversal attempt blocked: "
+                f"'{script_name}' resolves outside scripts dir"
+            )
+            raise RuntimeError(f"Invalid script path: '{script_name}'")
+
+        if not script_path.is_file():
+            raise RuntimeError(
+                f"Script '{script_name}' not found in skill '{skill.name}'. "
+                f"Available: {skill.available_scripts}"
+            )
+
+        if not os.access(script_path, os.X_OK):
+            raise RuntimeError(f"Script '{script_name}' is not executable")
+
+        # Build command
+        cmd = [str(script_path)]
+        if args_str:
+            cmd.extend(shlex.split(args_str))
+
Evidence
Tool definitions explicitly enumerate allowed scripts, but execution only checks path confinement +
existence + executable bit; it never checks membership in skill.available_scripts. Also args_str is
taken directly from arguments and conditionally passed to shlex.split() without type normalization.

src/open_responses_server/common/skill_manager.py[190-206]
src/open_responses_server/common/skill_manager.py[229-259]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Skill execution does not enforce the advertised script allowlist and assumes `args` is always a string.

### Issue Context
`_build_caches()` publishes an enum allowlist via `available_scripts`, but `execute_skill_tool()` only does filesystem checks. Tool-call arguments may be malformed (e.g., args as list).

### Fix Focus Areas
- src/open_responses_server/common/skill_manager.py[190-206]
- src/open_responses_server/common/skill_manager.py[229-259]

### Suggested change
1) Reject `script_name` unless it is in `skill.available_scripts` (or regenerate discovery dynamically if you intentionally want late-added scripts).
2) Normalize `args`:
- If `args` is a list, append items directly (after `str()` coercion).
- If `args` is a string, keep `shlex.split`.
- Otherwise raise a `RuntimeError` with a clear message.
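Both suggested fixes (allowlist enforcement and `args` normalization) fit in one small helper; `build_command` is a hypothetical name, not the PR's API:

```python
import shlex

def build_command(script_path: str, script_name: str,
                  available_scripts: list[str], args) -> list[str]:
    """Enforce the advertised allowlist and normalize `args`."""
    if script_name not in available_scripts:
        raise RuntimeError(
            f"Script '{script_name}' is not in the advertised allowlist: "
            f"{available_scripts}"
        )
    cmd = [script_path]
    if isinstance(args, list):
        cmd.extend(str(a) for a in args)   # list: coerce each item
    elif isinstance(args, str):
        cmd.extend(shlex.split(args))      # string: shell-style split
    elif args is not None:
        raise RuntimeError(f"Unsupported 'args' type: {type(args).__name__}")
    return cmd

cmd = build_command("/skills/demo/scripts/run.sh", "run.sh",
                    ["run.sh"], "--fast 'two words'")
print(cmd)  # ['/skills/demo/scripts/run.sh', '--fast', 'two words']
```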





Copilot AI left a comment


Pull request overview

Adds an Agent Skills execution engine that discovers local “skills” from SKILLS_DIR, exposes them as skill__<name> tools alongside MCP tools, and routes tool calls to server-side execution with configurable timeouts and SSE heartbeats.

Changes:

  • Introduces SkillManager for skill discovery, tool schema generation, and secure script execution (path confinement, no shell).
  • Injects skill tools into /responses and /v1/chat/completions flows and adds routing to execute skill tools server-side.
  • Adds streaming controls (STREAM_TIMEOUT, SSE heartbeat interval) and expands tests to cover new behavior.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.

Summary per file:
  • src/open_responses_server/common/skill_manager.py: Implements skill discovery, caching, and script execution for skill__* tools.
  • src/open_responses_server/api_controller.py: Starts/stops skills on app lifecycle; injects skill tools; adds SSE heartbeat wrapper.
  • src/open_responses_server/responses_service.py: Routes tool execution to MCP vs skills; adds content_index to deltas; improves stream timing logs.
  • src/open_responses_server/chat_completions_service.py: Adds skill tool routing in streaming and non-streaming tool-call handlers; uses STREAM_TIMEOUT.
  • src/open_responses_server/common/config.py: Adds streaming + skills env vars and logs them on startup.
  • src/open_responses_server/common/llm_client.py: Uses configurable STREAM_TIMEOUT for the shared HTTPX client.
  • src/open_responses_server/models/responses_models.py: Adds required content_index field to OutputTextDelta.
  • tests/test_skill_manager.py: Adds test suite for skill parsing, discovery, execution, traversal, and timeouts.
  • tests/test_api_controller_endpoints.py: Adds unit tests for the heartbeat wrapper.
  • tests/test_responses_service.py: Updates streaming delta expectations to include content_index.
  • tests/test_chat_completions_service.py: Updates error-message expectations for non-server tools.
  • README.md: Documents Agent Skills usage and security model.
  • pyproject.toml: Adds pyyaml>=6.0 dependency.
  • uv.lock: Locks pyyaml in dependency graph.
  • .env.example: Documents new streaming env vars.


OriNachum and others added 3 commits March 31, 2026 09:27
…on script failure

1. Add skill tool injection in handle_chat_completions() alongside MCP tools,
   so skills are discoverable via /v1/chat/completions endpoint.
2. Raise RuntimeError on non-zero script exit instead of returning error string,
   so callers correctly log failures and error handling matches MCP pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Reject symlinks pointing outside scripts/ during discovery (security).
2. Validate skill names against tool-name rules (alphanumeric/hyphens/underscores, max 64 chars).
3. Include stderr in output on successful execution (preserves warnings).
4. Add SKILLS_* vars to .env.example for discoverability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
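The skill-name validation rule quoted above (alphanumeric, hyphens, underscores, max 64 chars) reduces to a single regex check. The exact pattern is an assumption based on the commit message, not the PR's code:

```python
import re

# Alphanumeric, hyphens, underscores, max 64 chars (per the commit message)
_SKILL_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")

def is_valid_skill_name(name: str) -> bool:
    # fullmatch avoids partial matches like "bad name!" matching "bad"
    return bool(_SKILL_NAME_RE.fullmatch(name))

print(is_valid_skill_name("hello-skill"))  # True
print(is_valid_skill_name("bad name!"))    # False
print(is_valid_skill_name("x" * 65))       # False
```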
@sonarqubecloud

Quality Gate failed

Failed conditions
65.7% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud
