feat: add skills for specific tools by bearsyankees · Pull Request #366 · usestrix/strix

bearsyankees · 2026-03-17T03:03:01Z

No description provided.

LiteLLM OTel callbacks are not enabled, so _build_trace_metadata() and its test are dead code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove all streaming event emission (agent.streaming.updated, cleared, interrupted) — high-frequency noise with no observability value. Revert unnecessary i->index, loc->location renames. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Run names are auto-generated by generate_run_name() which already slugifies the input. The extra sanitization layer was redundant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Traceloop uses raw print() statements for "exporting traces to a custom exporter" and "Metrics are disabled" messages. Redirect stdout during init to suppress them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Now that traces are always written to the run dir, the output path is always relevant — not just when vulnerabilities are found. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Base64 screenshot payloads from browser_action were bloating events.jsonl (~10MB for small runs). Strip them at the sanitizer level since DOM/HTML content is sufficient for debugging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Docker SDK uses urllib3 for container API calls, which Traceloop auto-instruments — flooding events.jsonl with noise. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Implemented `load_skill` function to dynamically load skills into the current agent at runtime. - Enhanced agent context management to track loaded skills. - Created XML schema documentation for the new `load_skill` tool. - Added new skills documentation for various tools including `ffuf`, `httpx`, `katana`, `naabu`, `nmap`, `nuclei`, `sqlmap`, and `subfinder`. - Introduced runtime tooling logic to detect and manage tool usage in commands.

greptile-apps · 2026-03-17T03:10:44Z

Greptile Summary

This PR introduces auto-loading of tool-specific CLI skill context for eight security tools (nmap, nuclei, httpx, ffuf, subfinder, naabu, katana, sqlmap). Skills are injected lazily in two ways: automatically when the agent is about to run a recognised tool via terminal_execute (_inject_runtime_tooling_context in base_agent.py), and explicitly on demand through a new load_skill LLM-callable tool. A lightweight shell-segment parser (runtime_tooling.py) detects the actual tool binary even through wrappers like sudo, env, timeout, and background operators. Previous review concerns (skill-name format mismatch, frozen=True mutability, requested_skills type inconsistency, single & separator) have all been addressed in this revision.

Key changes:

strix/skills/runtime_tooling.py — new module: shell parser, get_tooling_preflight, build_tooling_preflight_message, and canonical_runtime_skill_name
strix/agents/base_agent.py — _inject_runtime_tooling_context intercepts terminal actions before each LLM turn
strix/llm/llm.py — add_runtime_skills / _get_skills_to_load enable post-init system-prompt updates
strix/tools/agents_graph/agents_graph_actions.py — load_skill tool for explicit runtime loading
8 new Markdown skill files under strix/skills/tooling/
UI renderer update to display skill-loading events as dim status lines instead of chat bubbles

Issues found:

_consume_timeout does not skip the value of the -s/--signal and -k/--kill-after flags. For a command like timeout -s TERM 60 nmap target, the signal name TERM is returned as the detected command, causing the nmap skill to be silently missed.

Confidence Score: 4/5

Safe to merge; one skill-detection miss in an edge-case timeout flag pattern does not affect correctness or security.
The core architecture is sound and all previously flagged issues have been addressed. The one new logic bug (_consume_timeout not consuming the -s SIGNAL value) is a skill auto-loading miss rather than a correctness or security problem — the command still executes normally, the skill just isn't injected. All other parsing paths, deduplication logic, context state management, and the new load_skill tool behave correctly.
strix/skills/runtime_tooling.py — specifically _consume_timeout (lines 567–575)

Important Files Changed

Filename	Overview
strix/skills/runtime_tooling.py	New module implementing shell-command parsing for auto-detecting security tool usage and lazily injecting skill context. Core logic is well-structured with good separator handling (`;`, `\n`, `
strix/agents/base_agent.py	Adds `_inject_runtime_tooling_context` which intercepts terminal actions before each LLM turn, detects tool commands, and auto-loads the matching skills into the system prompt. The deduplication logic (using canonical paths stored in `runtime_skills_loaded`) is correct and consistent with how `load_skill` writes context. Returns `True` to short-circuit the loop only when new skills were actually injected.
strix/llm/llm.py	Adds `_runtime_skills` list and `add_runtime_skills` method to the LLM class, allowing dynamic post-init skill injection by re-loading the system prompt. Deduplication is in-place; `_get_skills_to_load` merges config skills, runtime skills, and the scan-mode skill with proper ordering. Looks correct.
strix/tools/agents_graph/agents_graph_actions.py	Adds `load_skill` tool function for explicit runtime skill loading. Addresses previous review concerns: raw skill names are validated (via `load_skills` which resolves short names against categories), then canonicalised before being applied; the context is updated with canonical paths, keeping it consistent with what `_inject_runtime_tooling_context` reads. The exception handler now correctly re-parses `skills` as a fallback instead of returning a raw string.
strix/interface/tool_components/user_message_renderer.py	Adds rendering for `<runtime_tool_skill_context>` messages and reclassifies `<inter_agent_message>` as a "Context:" header rather than "You:". Logic is clean; minor wording issue in `_format_runtime_skill_context` when multiple tools are listed.
tests/skills/test_runtime_tooling.py	Good coverage of the main happy paths: new-skill loading, wrapper/assignment stripping, single-`&` background operator splitting, help-flag detection, non-tool mention filtering, already-loaded deduplication, and message content. Notably includes a test for the `&` separator which was flagged in a previous review thread.
tests/tools/test_load_skill_tool.py	Unit tests for the `load_skill` tool using lightweight dummy agents. Covers success, short-name canonicalisation, invalid-skill error, and missing-agent-instance error. Uses a safe pattern of saving/restoring `_agent_instances` in `finally` blocks.

Comments Outside Diff (1)

strix/skills/runtime_tooling.py, line 567-575 (link)

_consume_timeout mishandles -s/--signal flag

timeout -s SIGNAL duration cmd is a common pattern in security tooling to send a specific termination signal. The current implementation only skips flag tokens that start with -, but doesn't skip the value that follows value-taking flags like -s/--signal and -k/--kill-after.

For the token stream ['timeout', '-s', 'TERM', '60', 'nmap', 'target']:

-s starts with - → consumed (index → 2)
TERM doesn't start with - → while loop exits
_TIMEOUT_VALUE_RE.match('TERM') fails → index stays at 2
Back in _extract_command_and_args: TERM is treated as the command name

Because "term" isn't in _TOOL_SKILL_PATHS, no false-positive occurs, but the real command (nmap) is never inspected — its skill won't be auto-loaded.

A targeted fix analogous to _consume_sudo:

_TIMEOUT_FLAGS_WITH_VALUE = {"-s", "--signal", "-k", "--kill-after"}

def _consume_timeout(tokens: list[str], start_index: int) -> int:
    index = start_index
    while index < len(tokens):
        part = tokens[index]
        if part in _TIMEOUT_FLAGS_WITH_VALUE and index + 1 < len(tokens):
            index += 2
            continue
        if part.startswith("-"):
            index += 1
            continue
        break

    if index < len(tokens) and _TIMEOUT_VALUE_RE.match(tokens[index]):
        index += 1

    return index

Prompt To Fix With AI

This is a comment left during a code review.
Path: strix/skills/runtime_tooling.py
Line: 567-575

Comment:
**`_consume_timeout` mishandles `-s`/`--signal` flag**

`timeout -s SIGNAL duration cmd` is a common pattern in security tooling to send a specific termination signal. The current implementation only skips flag tokens that start with `-`, but doesn't skip the *value* that follows value-taking flags like `-s`/`--signal` and `-k`/`--kill-after`.

For the token stream `['timeout', '-s', 'TERM', '60', 'nmap', 'target']`:
1. `-s` starts with `-` → consumed (index → 2)
2. `TERM` doesn't start with `-` → while loop exits
3. `_TIMEOUT_VALUE_RE.match('TERM')` fails → index stays at 2
4. Back in `_extract_command_and_args`: `TERM` is treated as the command name

Because `"term"` isn't in `_TOOL_SKILL_PATHS`, no false-positive occurs, but the real command (`nmap`) is never inspected — its skill won't be auto-loaded.

A targeted fix analogous to `_consume_sudo`:
```python
_TIMEOUT_FLAGS_WITH_VALUE = {"-s", "--signal", "-k", "--kill-after"}

def _consume_timeout(tokens: list[str], start_index: int) -> int:
    index = start_index
    while index < len(tokens):
        part = tokens[index]
        if part in _TIMEOUT_FLAGS_WITH_VALUE and index + 1 < len(tokens):
            index += 2
            continue
        if part.startswith("-"):
            index += 1
            continue
        break

    if index < len(tokens) and _TIMEOUT_VALUE_RE.match(tokens[index]):
        index += 1

    return index
```

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: strix/skills/runtime_tooling.py
Line: 567-575

Comment:
**`_consume_timeout` mishandles `-s`/`--signal` flag**

`timeout -s SIGNAL duration cmd` is a common pattern in security tooling to send a specific termination signal. The current implementation only skips flag tokens that start with `-`, but doesn't skip the *value* that follows value-taking flags like `-s`/`--signal` and `-k`/`--kill-after`.

For the token stream `['timeout', '-s', 'TERM', '60', 'nmap', 'target']`:
1. `-s` starts with `-` → consumed (index → 2)
2. `TERM` doesn't start with `-` → while loop exits
3. `_TIMEOUT_VALUE_RE.match('TERM')` fails → index stays at 2
4. Back in `_extract_command_and_args`: `TERM` is treated as the command name

Because `"term"` isn't in `_TOOL_SKILL_PATHS`, no false-positive occurs, but the real command (`nmap`) is never inspected — its skill won't be auto-loaded.

A targeted fix analogous to `_consume_sudo`:
```python
_TIMEOUT_FLAGS_WITH_VALUE = {"-s", "--signal", "-k", "--kill-after"}

def _consume_timeout(tokens: list[str], start_index: int) -> int:
    index = start_index
    while index < len(tokens):
        part = tokens[index]
        if part in _TIMEOUT_FLAGS_WITH_VALUE and index + 1 < len(tokens):
            index += 2
            continue
        if part.startswith("-"):
            index += 1
            continue
        break

    if index < len(tokens) and _TIMEOUT_VALUE_RE.match(tokens[index]):
        index += 1

    return index
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: strix/interface/tool_components/user_message_renderer.py
Line: 141-142

Comment:
**Singular "tool" label when multiple skills are loaded**

When multiple skills are loaded (e.g., `skill_names = "nmap, httpx"`), the rendered label reads *"getting context for tool nmap, httpx"*, which is grammatically incorrect. Consider pluralising when there is more than one tool name.

```suggestion
        count = skill_names.count(",") + 1
        label = "tools" if count > 1 else "tool"
        text.append(f"getting context for {label} {skill_names}", style="dim")
```

How can I resolve this? If you propose a fix, please make it concise.

_{Last reviewed commit: bdf8c04}

strix/tools/agents_graph/agents_graph_actions.py

strix/skills/runtime_tooling.py

…canonicalization for skill names

bearsyankees · 2026-03-17T03:27:29Z

@greptileai

strix/tools/agents_graph/agents_graph_actions.py

greptile-apps · 2026-03-17T03:33:35Z

strix/skills/runtime_tooling.py

+            if char == "|" and index + 1 < len(command) and command[index + 1] == "|":
+                index += 1
+
+            segment = "".join(current).strip()
+            if segment:
+                segments.append(segment)
+            current = []


Single & background operator not treated as a segment separator

_split_shell_segments only splits on && (logical AND), ;, \n, and |/||. A lone & (background execution) is not treated as a separator. As a result, a command like:

nmap -sV target.tld & sqlmap -u http://target.tld --forms

is emitted as a single segment. _extract_command_and_args then calls shlex.split on the whole string and returns nmap as the command—sqlmap is never visited, so its skill is not auto-loaded.

Consider adding & as a segment separator in the & character handling block:

Suggested change

if char == "|" and index + 1 < len(command) and command[index + 1] == "|":

index += 1

segment = "".join(current).strip()

if segment:

segments.append(segment)

current = []

if char == "&" and index + 1 < len(command) and command[index + 1] == "&":

segment = "".join(current).strip()

if segment:

segments.append(segment)

current = []

index += 2

continue

if char == "&":

segment = "".join(current).strip()

if segment:

segments.append(segment)

current = []

index += 1

continue

Prompt To Fix With AI

This is a comment left during a code review. Path: strix/skills/runtime_tooling.py Line: 218-224 Comment: **Single `&` background operator not treated as a segment separator** `_split_shell_segments` only splits on `&&` (logical AND), `;`, `\n`, and `|`/`||`. A lone `&` (background execution) is not treated as a separator. As a result, a command like: ``` nmap -sV target.tld & sqlmap -u http://target.tld --forms ``` is emitted as a single segment. `_extract_command_and_args` then calls `shlex.split` on the whole string and returns `nmap` as the command—`sqlmap` is never visited, so its skill is not auto-loaded. Consider adding `&` as a segment separator in the `&` character handling block: ```suggestion if char == "&" and index + 1 < len(command) and command[index + 1] == "&": segment = "".join(current).strip() if segment: segments.append(segment) current = [] index += 2 continue if char == "&": segment = "".join(current).strip() if segment: segments.append(segment) current = [] index += 1 continue ``` How can I resolve this? If you propose a fix, please make it concise.

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

bearsyankees · 2026-03-17T03:50:54Z

@greptileai

…tooling preflight

bearsyankees · 2026-03-17T03:52:27Z

@greptileai

bearsyankees and others added 30 commits March 8, 2026 02:10

feat: add OpenTelemetry observability with local JSONL traces

7acad5a

fix greptile findings

827a4f6

fix greptile #2

86a33cf

fix greptile #3

a1cd693

fix greptile #4

80a4d5a

make traceloop optional extra

655b1f6

fix stale run attrs in otel resource

b083ef1

remove redundant get_run_dir call

84134c3

fix otlp fallback auth header merge

dc21877

make retention pruning process-scoped

fec8f0d

scope events write locks per file

5e8cb70

make otlp http exporter optional

6d69191

sanitize run dir names to prevent traversal

4822845

split otel and posthog telemetry flags

068bb5d

ahmed's comments

ebe21fb

refactor: move stuff to utils

f257627

refactor: move core setup telemetry

3fd49b3

add back

9f42719

no llm callback

ffeb77a

better log pruning

5c18204

refactor: remove unused LiteLLM trace metadata scaffolding

388c964

LiteLLM OTel callbacks are not enabled, so _build_trace_metadata() and its test are dead code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: remove sanitize_run_dir_name (input already slugified)

e2f6691

Run names are auto-generated by generate_run_name() which already slugifies the input. The extra sanitization layer was redundant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: suppress Traceloop SDK print noise during init

505eb96

Traceloop uses raw print() statements for "exporting traces to a custom exporter" and "Metrics are disabled" messages. Redirect stdout during init to suppress them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: always show output path in CLI completion panel

e009780

Now that traces are always written to the run dir, the output path is always relevant — not just when vulnerabilities are found. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: block urllib3/requests from OTel instrumentation

8d0d8cc

Docker SDK uses urllib3 for container API calls, which Traceloop auto-instruments — flooding events.jsonl with noise. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: remove remote OTEL export example from config docs

0fd721a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test: remove path traversal test (sanitize_run_dir_name was removed)

0423b0f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

revert: restore original telemetry README.md

9903780

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

bearsyankees added 2 commits March 16, 2026 21:01

Merge remote-tracking branch 'origin/main' into load-skills

4821956

greptile-apps bot reviewed Mar 17, 2026

View reviewed changes

strix/tools/agents_graph/agents_graph_actions.py Outdated Show resolved Hide resolved

strix/skills/runtime_tooling.py Outdated Show resolved Hide resolved

feat: update tooling preflight data structures to use tuples and add …

490d3fe

…canonicalization for skill names

greptile-apps bot reviewed Mar 17, 2026

View reviewed changes

bearsyankees and others added 2 commits March 16, 2026 21:34

Update strix/tools/agents_graph/agents_graph_actions.py

84b0e30

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

fix: fix grep comment

a6c58be

feat: add support for splitting commands with background operator in …

bdf8c04

…tooling preflight

bearsyankees added 5 commits March 16, 2026 22:28

more katana

6fb3b1d

more katana

df7ebc7

update tools

eebf8ce

refactor: move load skill out of agent graph and add to httpx

22d0ea7

update nmap documentation for sandbox stability

dd445d5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add skills for specific tools#366

feat: add skills for specific tools#366
bearsyankees wants to merge 41 commits intomainfrom
load-skills

bearsyankees commented Mar 17, 2026

Uh oh!

greptile-apps bot commented Mar 17, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

Uh oh!

Uh oh!

bearsyankees commented Mar 17, 2026

Uh oh!

Uh oh!

greptile-apps bot Mar 17, 2026

Uh oh!

bearsyankees commented Mar 17, 2026

Uh oh!

bearsyankees commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bearsyankees commented Mar 17, 2026

Uh oh!

greptile-apps bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

Uh oh!

Uh oh!

bearsyankees commented Mar 17, 2026

Uh oh!

Uh oh!

greptile-apps bot Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

bearsyankees commented Mar 17, 2026

Uh oh!

bearsyankees commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps bot commented Mar 17, 2026 •

edited

Loading