
[FEATURE] Grep Tool (ripgrep-powered search) #56

@edenreich

Summary

Add a first-class Grep tool that provides fast, permission-safe, regex-capable search over repository files. It wraps ripgrep (rg) under the hood and exposes a consistent JSON input schema with multiple output modes. This removes any need to shell out to grep or rg from the Bash tool.

Goals

  • Fast, reliable code/content search with proper sandboxed permissions.
  • Consistent, machine-friendly outputs for downstream tools.
  • Clear ergonomics for common tasks: file filtering, regex, counts, context lines, and multiline patterns.

Tool Contract

Name: Grep
Engine: ripgrep

Behavioral Notes

  • Always use Grep for search tasks. Never invoke grep or rg via Bash.
  • Full regex syntax is supported (e.g., log.*Error, function\s+\w+).
  • File filtering via glob or type (ripgrep’s built-in type filtering).
  • Output modes:
    • files_with_matches (default): list matching file paths.
    • content: matching lines; supports -A/-B/-C context and -n for line numbers.
    • count: per-file match counts.
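  • Multiline matching is off by default; set multiline: true to let patterns span lines and make . match newlines (dotall).
  • Pattern syntax is ripgrep's, not grep's. Literal braces must be escaped: use interface\{\} to match interface{} in Go code (written "interface\\{\\}" inside a JSON string).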

Input Schema (JSON Schema draft-07)

  • pattern (string, required): regex to search in file contents.
  • path (string, optional): file/dir to search; default CWD.
  • glob (string, optional): maps to --glob (e.g., "*.{ts,tsx}").
  • type (string, optional): ripgrep --type (e.g., js, py, rust).
  • output_mode (enum: content, files_with_matches, count; default files_with_matches).
  • -i (boolean): case-insensitive.
  • -n (boolean): show line numbers (content mode only).
  • -A, -B, -C (integers): lines of context after, before, and around each match (content mode only).
  • multiline (boolean): enables -U --multiline-dotall.
  • head_limit (number): caps the number of returned entries (file paths, matching lines, or count rows) in every mode, like piping to head -N.
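One way to model this schema in Go (an assumed implementation language) is a struct whose JSON tags match the parameter names, including the dash-prefixed ones; encoding/json accepts a tag name like "-i" as long as the tag is not exactly "-". The GrepInput type, ParseGrepInput, and its non-negativity check are hypothetical; the defaults it applies are the ones listed above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// GrepInput mirrors the input schema. Go field names are hypothetical; the
// JSON tags match the documented parameter names.
type GrepInput struct {
	Pattern     string `json:"pattern"`
	Path        string `json:"path,omitempty"`
	Glob        string `json:"glob,omitempty"`
	Type        string `json:"type,omitempty"`
	OutputMode  string `json:"output_mode,omitempty"`
	IgnoreCase  bool   `json:"-i,omitempty"`
	LineNumbers bool   `json:"-n,omitempty"`
	After       int    `json:"-A,omitempty"`
	Before      int    `json:"-B,omitempty"`
	Context     int    `json:"-C,omitempty"`
	Multiline   bool   `json:"multiline,omitempty"`
	HeadLimit   int    `json:"head_limit,omitempty"`
}

// ParseGrepInput decodes a tool call and applies the documented defaults.
func ParseGrepInput(raw []byte) (GrepInput, error) {
	var in GrepInput
	if err := json.Unmarshal(raw, &in); err != nil {
		return in, fmt.Errorf("invalid input: %w", err)
	}
	if in.Pattern == "" {
		return in, errors.New("pattern is required")
	}
	if in.Path == "" {
		in.Path = "." // default: current working directory
	}
	switch in.OutputMode {
	case "":
		in.OutputMode = "files_with_matches" // documented default
	case "content", "files_with_matches", "count":
		// valid as given
	default:
		return in, fmt.Errorf("unknown output_mode %q", in.OutputMode)
	}
	if in.After < 0 || in.Before < 0 || in.Context < 0 || in.HeadLimit < 0 {
		return in, errors.New("context lines and head_limit must be non-negative")
	}
	return in, nil
}

func main() {
	in, err := ParseGrepInput([]byte(`{"pattern":"log\\s*\\.(error|warn)","glob":"**/*.go","output_mode":"content","-n":true,"-C":2}`))
	fmt.Println(in.Pattern, in.Glob, in.OutputMode, in.LineNumbers, in.Context, err)
	// log\s*\.(error|warn) **/*.go content true 2 <nil>
}
```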

Example Calls

  1. Files containing “TODO” in TS/TSX:
{"pattern":"TODO","type":"ts","output_mode":"files_with_matches","head_limit":50}
  2. Show matches with 2 lines of context, with line numbers:
{"pattern":"log\\s*\\.(error|warn)","glob":"**/*.go","output_mode":"content","-n":true,"-C":2}
  3. Multiline struct search in Go:
{"pattern":"type\\s+\\w+\\s+struct\\s*\\{[\\s\\S]*?ID\\s+int","type":"go","multiline":true,"output_mode":"files_with_matches"}
  4. Count matches per file:
{"pattern":"\\bFIXME\\b","output_mode":"count"}

Output Shapes

  • files_with_matches:
{"files":["pkg/a.go","pkg/b.go"],"truncated":false}
  • content:
{
  "matches": [
    {"file":"pkg/a.go","line":123,"text":"log.Error(\"boom\")"},
    {"file":"pkg/a.go","line":124,"text":"return err"}
  ],
  "truncated": false
}
  • count:
{"counts":[{"file":"pkg/a.go","count":3},{"file":"pkg/b.go","count":0}],"truncated":false}

truncated is true when head_limit cut off additional results; otherwise it is false.
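A sketch of the three output shapes as Go types, plus a generic helper that applies head_limit and derives truncated. Type and function names are illustrative rather than part of the spec, and generics require Go 1.18 or later. This version truncates an already-collected slice; the streaming approach listed under Performance avoids collecting everything first.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FilesResult is the files_with_matches shape.
type FilesResult struct {
	Files     []string `json:"files"`
	Truncated bool     `json:"truncated"`
}

// Match is one entry of the content shape.
type Match struct {
	File string `json:"file"`
	Line int    `json:"line"`
	Text string `json:"text"`
}

// ContentResult is the content shape.
type ContentResult struct {
	Matches   []Match `json:"matches"`
	Truncated bool    `json:"truncated"`
}

// FileCount is one entry of the count shape.
type FileCount struct {
	File  string `json:"file"`
	Count int    `json:"count"`
}

// CountResult is the count shape.
type CountResult struct {
	Counts    []FileCount `json:"counts"`
	Truncated bool        `json:"truncated"`
}

// applyHeadLimit keeps at most limit items (limit <= 0 means no limit) and
// reports whether anything was dropped; that flag becomes "truncated".
func applyHeadLimit[T any](items []T, limit int) ([]T, bool) {
	if limit <= 0 || len(items) <= limit {
		return items, false
	}
	return items[:limit], true
}

func main() {
	files, truncated := applyHeadLimit([]string{"pkg/a.go", "pkg/b.go", "pkg/c.go"}, 2)
	b, err := json.Marshal(FilesResult{Files: files, Truncated: truncated})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"files":["pkg/a.go","pkg/b.go"],"truncated":true}
}
```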

UX Rules

  • When output_mode is content, honor -A/-B/-C and -n; in other modes, ignore them.
  • Prefer type over glob for standard languages (performance).
  • If both type and glob are provided, apply both (intersection).
  • path defaults to the CWD and must resolve inside the workspace allowlist.

Security & Permissions

  • Run within the tool’s sandboxed FS view, honoring allow/deny lists.
  • No network access; no shell and no process spawning beyond running ripgrep itself.
  • Enforce path confinement to prevent directory traversal outside allowed roots.
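Path confinement can start with a lexical check that the requested path, once cleaned and resolved against an allowed root, does not climb out of it. This Go sketch handles a single root (a real allowlist may hold several), and the confine function is hypothetical. It does not follow symlinks, so a link inside the root could still point outside; resolving links first (for example with filepath.EvalSymlinks) would be needed before trusting the result.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// confine resolves a requested search path against an allowed root and
// rejects anything that escapes it, such as "../../etc" or "/etc/passwd".
// Lexical check only: symlinks are not resolved here.
func confine(root, requested string) (string, error) {
	if requested == "" {
		requested = "." // default path: the root itself
	}
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	target := requested
	if !filepath.IsAbs(target) {
		target = filepath.Join(absRoot, target) // Join also cleans "." and ".."
	}
	target = filepath.Clean(target)
	rel, err := filepath.Rel(absRoot, target)
	if err != nil {
		return "", err
	}
	// A relative path starting with ".." means target lies outside the root.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q is outside the allowed root %q", requested, root)
	}
	return target, nil
}

func main() {
	for _, p := range []string{"pkg", "../etc", "/etc/passwd"} {
		resolved, err := confine("/work/repo", p)
		fmt.Printf("%-12s -> %q, err=%v\n", p, resolved, err)
	}
}
```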

Performance

  • Stream ripgrep results and encode incrementally.
  • Apply head_limit server-side to minimize memory use.
  • Cache file type mappings; reuse compiled regex where safe.
  • Use a sensible default thread count: inherit ripgrep's defaults unless the sandbox imposes a limit.

Edge Cases

  • Invalid regex → return a structured error that includes rg's message.
  • Binary files → skipped by default (ripgrep behavior); a per-file binary_skipped:true flag can be exposed later if needed.
  • Very large results → respect head_limit, set truncated:true.
  • Multiline without multiline:true → document that patterns won’t cross lines.
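ripgrep's documented exit status is 0 when a match is found, 1 when nothing matches, and 2 on error, which gives a simple basis for the structured error above. In this Go sketch, GrepError and classifyExit are hypothetical names, and detecting an invalid regex by the "regex parse error" text in rg's stderr is an assumption about rg's message format. Note that rg also exits with 2 when some files could not be read even though others matched, so a fuller implementation might keep partial results in that case.

```go
package main

import (
	"fmt"
	"strings"
)

// GrepError is a hypothetical structured error returned to the caller.
type GrepError struct {
	Kind    string `json:"kind"`    // "invalid_regex" or "search_failed"
	Message string `json:"message"` // rg's stderr, trimmed
}

func (e *GrepError) Error() string { return e.Kind + ": " + e.Message }

// classifyExit maps ripgrep's exit status (0 = match, 1 = no match,
// 2 = error) to a result. When rg runs via os/exec, the code would come
// from (*exec.ExitError).ExitCode().
func classifyExit(code int, stderr string) (matched bool, err error) {
	switch code {
	case 0:
		return true, nil
	case 1:
		return false, nil // no matches is a normal, empty result
	default:
		msg := strings.TrimSpace(stderr)
		kind := "search_failed"
		// Assumption: rg reports regex syntax errors with this prefix.
		if strings.Contains(msg, "regex parse error") {
			kind = "invalid_regex"
		}
		return false, &GrepError{Kind: kind, Message: msg}
	}
}

func main() {
	_, err := classifyExit(2, "regex parse error:\n    interface{}\n")
	fmt.Println(err)
}
```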

Acceptance Criteria

  • Can return file paths for a simple literal search in a non-trivial repo within expected time.
  • Supports glob and type filters independently and together.
  • content mode shows correct lines with -A/-B/-C and -n.
  • count mode returns accurate per-file counts.
  • multiline:true enables dotall semantics and finds cross-line matches.
  • head_limit caps output in all modes and sets truncated:true when results are dropped.
  • All operations confined to allowed paths; no shelling out to grep/rg.
  • Unit and integration tests cover the above, including regex escaping cases such as interface\{\}.
  • The tool description reads:
      A powerful search tool built on ripgrep

      Usage:
      - ALWAYS use Grep for search tasks. NEVER invoke grep or rg as a Bash command. The Grep tool has been optimized for correct permissions and access.
      - Supports full regex syntax (e.g., "log.*Error", "function\s+\w+")
      - Filter files with glob parameter (e.g., "*.js", "**/*.tsx") or type parameter (e.g., "js", "py", "rust")
      - Output modes: "content" shows matching lines, "files_with_matches" shows only file paths (default), "count" shows match counts
      - Use Task tool for open-ended searches requiring multiple rounds
      - Pattern syntax: Uses ripgrep (not grep) - literal braces need escaping (use interface\{\} to find interface{} in Go code)
      - Multiline matching: By default patterns match within single lines only. For cross-line patterns like struct \{[\s\S]*?field, use multiline: true
  • The tool is documented.
