Skip to content

feat: add skills for specific tools#366

Open
bearsyankees wants to merge 41 commits intomainfrom
load-skills
Open

feat: add skills for specific tools#366
bearsyankees wants to merge 41 commits intomainfrom
load-skills

Conversation

@bearsyankees
Copy link
Collaborator

No description provided.

bearsyankees and others added 30 commits March 8, 2026 02:10
LiteLLM OTel callbacks are not enabled, so _build_trace_metadata()
and its test are dead code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove all streaming event emission (agent.streaming.updated,
cleared, interrupted) — high-frequency noise with no observability
value. Revert unnecessary i->index, loc->location renames.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run names are auto-generated by generate_run_name() which already
slugifies the input. The extra sanitization layer was redundant.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Traceloop uses raw print() statements for "exporting traces to a
custom exporter" and "Metrics are disabled" messages. Redirect
stdout during init to suppress them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Now that traces are always written to the run dir, the output path
is always relevant — not just when vulnerabilities are found.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Base64 screenshot payloads from browser_action were bloating
events.jsonl (~10MB for small runs). Strip them at the sanitizer
level since DOM/HTML content is sufficient for debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Docker SDK uses urllib3 for container API calls, which Traceloop
auto-instruments — flooding events.jsonl with noise.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Implemented `load_skill` function to dynamically load skills into the current agent at runtime.
- Enhanced agent context management to track loaded skills.
- Created XML schema documentation for the new `load_skill` tool.
- Added new skills documentation for various tools including `ffuf`, `httpx`, `katana`, `naabu`, `nmap`, `nuclei`, `sqlmap`, and `subfinder`.
- Introduced runtime tooling logic to detect and manage tool usage in commands.
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR introduces auto-loading of tool-specific CLI skill context for eight security tools (nmap, nuclei, httpx, ffuf, subfinder, naabu, katana, sqlmap). Skills are injected lazily in two ways: automatically when the agent is about to run a recognised tool via terminal_execute (_inject_runtime_tooling_context in base_agent.py), and explicitly on demand through a new load_skill LLM-callable tool. A lightweight shell-segment parser (runtime_tooling.py) detects the actual tool binary even through wrappers like sudo, env, timeout, and background operators. Previous review concerns (skill-name format mismatch, frozen=True mutability, requested_skills type inconsistency, single & separator) have all been addressed in this revision.

Key changes:

  • strix/skills/runtime_tooling.py — new module: shell parser, get_tooling_preflight, build_tooling_preflight_message, and canonical_runtime_skill_name
  • strix/agents/base_agent.py_inject_runtime_tooling_context intercepts terminal actions before each LLM turn
  • strix/llm/llm.pyadd_runtime_skills / _get_skills_to_load enable post-init system-prompt updates
  • strix/tools/agents_graph/agents_graph_actions.pyload_skill tool for explicit runtime loading
  • 8 new Markdown skill files under strix/skills/tooling/
  • UI renderer update to display skill-loading events as dim status lines instead of chat bubbles

Issues found:

  • _consume_timeout does not skip the value of the -s/--signal and -k/--kill-after flags. For a command like timeout -s TERM 60 nmap target, the signal name TERM is returned as the detected command, causing the nmap skill to be silently missed.

Confidence Score: 4/5

  • Safe to merge; one skill-detection miss in an edge-case timeout flag pattern does not affect correctness or security.
  • The core architecture is sound and all previously flagged issues have been addressed. The one new logic bug (_consume_timeout not consuming the -s SIGNAL value) is a skill auto-loading miss rather than a correctness or security problem — the command still executes normally, the skill just isn't injected. All other parsing paths, deduplication logic, context state management, and the new load_skill tool behave correctly.
  • strix/skills/runtime_tooling.py — specifically _consume_timeout (lines 567–575)

Important Files Changed

Filename Overview
strix/skills/runtime_tooling.py New module implementing shell-command parsing for auto-detecting security tool usage and lazily injecting skill context. Core logic is well-structured with good separator handling (;, \n, `
strix/agents/base_agent.py Adds _inject_runtime_tooling_context which intercepts terminal actions before each LLM turn, detects tool commands, and auto-loads the matching skills into the system prompt. The deduplication logic (using canonical paths stored in runtime_skills_loaded) is correct and consistent with how load_skill writes context. Returns True to short-circuit the loop only when new skills were actually injected.
strix/llm/llm.py Adds _runtime_skills list and add_runtime_skills method to the LLM class, allowing dynamic post-init skill injection by re-loading the system prompt. Deduplication is in-place; _get_skills_to_load merges config skills, runtime skills, and the scan-mode skill with proper ordering. Looks correct.
strix/tools/agents_graph/agents_graph_actions.py Adds load_skill tool function for explicit runtime skill loading. Addresses previous review concerns: raw skill names are validated (via load_skills which resolves short names against categories), then canonicalised before being applied; the context is updated with canonical paths, keeping it consistent with what _inject_runtime_tooling_context reads. The exception handler now correctly re-parses skills as a fallback instead of returning a raw string.
strix/interface/tool_components/user_message_renderer.py Adds rendering for <runtime_tool_skill_context> messages and reclassifies <inter_agent_message> as a "Context:" header rather than "You:". Logic is clean; minor wording issue in _format_runtime_skill_context when multiple tools are listed.
tests/skills/test_runtime_tooling.py Good coverage of the main happy paths: new-skill loading, wrapper/assignment stripping, single-& background operator splitting, help-flag detection, non-tool mention filtering, already-loaded deduplication, and message content. Notably includes a test for the & separator which was flagged in a previous review thread.
tests/tools/test_load_skill_tool.py Unit tests for the load_skill tool using lightweight dummy agents. Covers success, short-name canonicalisation, invalid-skill error, and missing-agent-instance error. Uses a safe pattern of saving/restoring _agent_instances in finally blocks.

Comments Outside Diff (1)

  1. strix/skills/runtime_tooling.py, line 567-575 (link)

    P2 _consume_timeout mishandles -s/--signal flag

    timeout -s SIGNAL duration cmd is a common pattern in security tooling to send a specific termination signal. The current implementation only skips flag tokens that start with -, but doesn't skip the value that follows value-taking flags like -s/--signal and -k/--kill-after.

    For the token stream ['timeout', '-s', 'TERM', '60', 'nmap', 'target']:

    1. -s starts with - → consumed (index → 2)
    2. TERM doesn't start with - → while loop exits
    3. _TIMEOUT_VALUE_RE.match('TERM') fails → index stays at 2
    4. Back in _extract_command_and_args: TERM is treated as the command name

    Because "term" isn't in _TOOL_SKILL_PATHS, no false-positive occurs, but the real command (nmap) is never inspected — its skill won't be auto-loaded.

    A targeted fix analogous to _consume_sudo:

    _TIMEOUT_FLAGS_WITH_VALUE = {"-s", "--signal", "-k", "--kill-after"}
    
    def _consume_timeout(tokens: list[str], start_index: int) -> int:
        index = start_index
        while index < len(tokens):
            part = tokens[index]
            if part in _TIMEOUT_FLAGS_WITH_VALUE and index + 1 < len(tokens):
                index += 2
                continue
            if part.startswith("-"):
                index += 1
                continue
            break
    
        if index < len(tokens) and _TIMEOUT_VALUE_RE.match(tokens[index]):
            index += 1
    
        return index
    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: strix/skills/runtime_tooling.py
    Line: 567-575
    
    Comment:
    **`_consume_timeout` mishandles `-s`/`--signal` flag**
    
    `timeout -s SIGNAL duration cmd` is a common pattern in security tooling to send a specific termination signal. The current implementation only skips flag tokens that start with `-`, but doesn't skip the *value* that follows value-taking flags like `-s`/`--signal` and `-k`/`--kill-after`.
    
    For the token stream `['timeout', '-s', 'TERM', '60', 'nmap', 'target']`:
    1. `-s` starts with `-` → consumed (index → 2)
    2. `TERM` doesn't start with `-` → while loop exits
    3. `_TIMEOUT_VALUE_RE.match('TERM')` fails → index stays at 2
    4. Back in `_extract_command_and_args`: `TERM` is treated as the command name
    
    Because `"term"` isn't in `_TOOL_SKILL_PATHS`, no false-positive occurs, but the real command (`nmap`) is never inspected — its skill won't be auto-loaded.
    
    A targeted fix analogous to `_consume_sudo`:
    ```python
    _TIMEOUT_FLAGS_WITH_VALUE = {"-s", "--signal", "-k", "--kill-after"}
    
    def _consume_timeout(tokens: list[str], start_index: int) -> int:
        index = start_index
        while index < len(tokens):
            part = tokens[index]
            if part in _TIMEOUT_FLAGS_WITH_VALUE and index + 1 < len(tokens):
                index += 2
                continue
            if part.startswith("-"):
                index += 1
                continue
            break
    
        if index < len(tokens) and _TIMEOUT_VALUE_RE.match(tokens[index]):
            index += 1
    
        return index
    ```
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: strix/skills/runtime_tooling.py
Line: 567-575

Comment:
**`_consume_timeout` mishandles `-s`/`--signal` flag**

`timeout -s SIGNAL duration cmd` is a common pattern in security tooling to send a specific termination signal. The current implementation only skips flag tokens that start with `-`, but doesn't skip the *value* that follows value-taking flags like `-s`/`--signal` and `-k`/`--kill-after`.

For the token stream `['timeout', '-s', 'TERM', '60', 'nmap', 'target']`:
1. `-s` starts with `-` → consumed (index → 2)
2. `TERM` doesn't start with `-` → while loop exits
3. `_TIMEOUT_VALUE_RE.match('TERM')` fails → index stays at 2
4. Back in `_extract_command_and_args`: `TERM` is treated as the command name

Because `"term"` isn't in `_TOOL_SKILL_PATHS`, no false-positive occurs, but the real command (`nmap`) is never inspected — its skill won't be auto-loaded.

A targeted fix analogous to `_consume_sudo`:
```python
_TIMEOUT_FLAGS_WITH_VALUE = {"-s", "--signal", "-k", "--kill-after"}

def _consume_timeout(tokens: list[str], start_index: int) -> int:
    index = start_index
    while index < len(tokens):
        part = tokens[index]
        if part in _TIMEOUT_FLAGS_WITH_VALUE and index + 1 < len(tokens):
            index += 2
            continue
        if part.startswith("-"):
            index += 1
            continue
        break

    if index < len(tokens) and _TIMEOUT_VALUE_RE.match(tokens[index]):
        index += 1

    return index
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: strix/interface/tool_components/user_message_renderer.py
Line: 141-142

Comment:
**Singular "tool" label when multiple skills are loaded**

When multiple skills are loaded (e.g., `skill_names = "nmap, httpx"`), the rendered label reads *"getting context for tool nmap, httpx"*, which is grammatically incorrect. Consider pluralising when there is more than one tool name.

```suggestion
        count = skill_names.count(",") + 1
        label = "tools" if count > 1 else "tool"
        text.append(f"getting context for {label} {skill_names}", style="dim")
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: bdf8c04

@bearsyankees
Copy link
Collaborator Author

@greptileai

Comment on lines +218 to +224
if char == "|" and index + 1 < len(command) and command[index + 1] == "|":
index += 1

segment = "".join(current).strip()
if segment:
segments.append(segment)
current = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Single & background operator not treated as a segment separator

_split_shell_segments only splits on && (logical AND), ;, \n, and |/||. A lone & (background execution) is not treated as a separator. As a result, a command like:

nmap -sV target.tld & sqlmap -u http://target.tld --forms

is emitted as a single segment. _extract_command_and_args then calls shlex.split on the whole string and returns nmap as the command—sqlmap is never visited, so its skill is not auto-loaded.

Consider adding & as a segment separator in the & character handling block:

Suggested change
if char == "|" and index + 1 < len(command) and command[index + 1] == "|":
index += 1
segment = "".join(current).strip()
if segment:
segments.append(segment)
current = []
if char == "&" and index + 1 < len(command) and command[index + 1] == "&":
segment = "".join(current).strip()
if segment:
segments.append(segment)
current = []
index += 2
continue
if char == "&":
segment = "".join(current).strip()
if segment:
segments.append(segment)
current = []
index += 1
continue
Prompt To Fix With AI
This is a comment left during a code review.
Path: strix/skills/runtime_tooling.py
Line: 218-224

Comment:
**Single `&` background operator not treated as a segment separator**

`_split_shell_segments` only splits on `&&` (logical AND), `;`, `\n`, and `|`/`||`. A lone `&` (background execution) is not treated as a separator. As a result, a command like:

```
nmap -sV target.tld & sqlmap -u http://target.tld --forms
```

is emitted as a single segment. `_extract_command_and_args` then calls `shlex.split` on the whole string and returns `nmap` as the command—`sqlmap` is never visited, so its skill is not auto-loaded.

Consider adding `&` as a segment separator in the `&` character handling block:

```suggestion
        if char == "&" and index + 1 < len(command) and command[index + 1] == "&":
            segment = "".join(current).strip()
            if segment:
                segments.append(segment)
            current = []
            index += 2
            continue

        if char == "&":
            segment = "".join(current).strip()
            if segment:
                segments.append(segment)
            current = []
            index += 1
            continue
```

How can I resolve this? If you propose a fix, please make it concise.

bearsyankees and others added 2 commits March 16, 2026 21:34
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@bearsyankees
Copy link
Collaborator Author

@greptileai

@bearsyankees
Copy link
Collaborator Author

@greptileai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants