xgrep

Ultra-fast indexed code search engine with MCP server for AI coding tools.

Pre-builds a trigram inverted index, then searches in milliseconds. Designed for repeated searches on large codebases — by humans and AI agents alike.

Why xgrep?

	ripgrep	zoekt	xgrep
Setup	None	Server required	None (`cargo install`)
First search	Instant	After server start	Auto-builds index
Repeated search (Linux kernel)	2,236ms	170ms (server)	38ms
File discovery (next.js, 26K files)	N/A	N/A	13ms (fd: 290ms)
Index size	N/A	155% of source	8% of source
AI agent integration	None	None	MCP server built-in
Memory (search)	11MB	288MB	208MB

xgrep is not a ripgrep replacement. Use ripgrep for one-off searches. Use xgrep when you search the same codebase repeatedly: on the Linux kernel, the index pays for itself after ~2 searches (see Index Cost).

Quick Start

cargo install xgrep-search    # Installs the `xg` command
xg "pattern"                  # Search (auto-builds index on first run)

Requires Rust 1.85+. Works on macOS, Linux, and Windows.

Build from source

git clone https://github.com/momokun7/xgrep.git
cd xgrep/rust
cargo build --release
cp target/release/xg ~/.local/bin/

Usage

xg "pattern"                  # Fixed string search
xg "pattern" /path/to/repo    # Search a specific directory (no cd needed)
xg "pattern" /path/to/file.rs # Search a single file directly
xg -e "handle_\w+"            # Regex search
xg "pattern" -i               # Case-insensitive
xg "pattern" --type rs        # Filter by file type
xg "pattern" -C 3             # Context lines
xg "pattern" --format llm     # Markdown output for LLMs
xg "pattern" --changed        # Only git changed files
xg "pattern" --since 1h       # Recently changed files
xg "pattern" --fresh          # Check index freshness (slower but up-to-date)
xg "pattern" --absolute-paths # Show absolute paths in output
xg "pattern" --exclude vendor  # Exclude paths containing "vendor"
xg "pattern" --no-hints       # Suppress regex pattern hints
xg --find "*.rs" /path/to/repo   # Find files by glob pattern
xg --find config /path/to/repo   # Find files by substring
xg --find "*.rs" --changed       # Find changed .rs files
xg --find "*" -t toml            # Find all .toml files (--find + -t)
xg --list-types               # Show supported file types
xg status                     # Show index status
xg init                       # Explicitly rebuild index
xg init /path/to/repo         # Build index for a specific directory
xg --version                  # Show version

Environment Variables

Variable	Description	Default
`XGREP_LLM_CONTEXT`	Default context lines for `--format llm`	`3`
`XGREP_ABSOLUTE_PATHS`	Set to `1` to always use absolute paths	unset
`XGREP_NO_HINTS`	Set to `1` to suppress regex pattern hints	unset

MCP Server for AI Agents

xgrep runs as an MCP server, giving AI coding tools fast indexed search.

xg serve                        # Start MCP server
xg serve --root /path/to/repo   # Specific directory

Claude Code

Add to settings:

{
  "mcpServers": {
    "xgrep": {
      "command": "xg",
      "args": ["serve"]
    }
  }
}

Available Tools

Tool	Description
`search`	Text/regex search with context. Auto-builds index. Max 4000 tokens by default.
`find_definitions`	Find likely definitions by regex heuristics (may include false positives)
`read_file`	Read file contents with optional line range
`index_status`	Check index freshness and stats
`build_index`	Explicitly rebuild index
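
For illustration, a client invokes one of these tools by writing a JSON-RPC 2.0 message to the server's stdin. The sketch below builds such a request. The `tools/call` method and the `name`/`arguments` envelope follow the MCP specification; the argument key `pattern` is an assumption, since the tool schemas are not listed here (a client would normally discover them with `tools/list`).

```python
import json

def tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)

# "pattern" is a hypothetical argument name, not taken from xgrep's schema.
request = tool_call(1, "search", {"pattern": "handle_auth"})
print(request)
```

A real session begins with an `initialize` handshake before any tool call; MCP clients such as Claude Code handle that automatically.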

Performance

Benchmarked with hyperfine on Apple M4, 32GB RAM, macOS. All numbers are warm cache, after index build. First run includes a one-time index build (~6s for Linux kernel). See Index Cost for details.

Large: Linux kernel (92,947 files, 2.0GB)

Query	xg	ripgrep	vs ripgrep
`struct file_operations`	38ms	2,236ms	59x faster
`printk`	54ms	1,795ms	33x faster
`EXPORT_SYMBOL`	70ms	1,900ms	27x faster

Medium: ripgrep source (248 files, 4.3MB)

Query	xg	ripgrep	vs ripgrep
`fn main`	2.5ms	7.9ms	3.1x faster
`Options`	2.3ms	7.7ms	3.3x faster
`pub struct`	2.6ms	7.8ms	3.1x faster

Small: xgrep source (17 files)

Query	xg	ripgrep	vs ripgrep
`fn main`	2.1ms	5.2ms	2.5x faster
`SearchResult`	1.6ms	4.7ms	2.9x faster
`Matcher`	2.2ms	5.0ms	2.3x faster

Index Cost

Metric	xgrep	zoekt	ripgrep
Build time	6s	46s	N/A
Index size	175MB (8%)	3.0GB (155%)	N/A
Breakeven	~2 searches	-	-

zoekt numbers are CLI mode. In server mode, zoekt search latency is significantly lower.
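
The breakeven figure can be checked from the numbers above: divide the one-time build cost by the time saved per search. A small sketch using the `struct file_operations` timings from the Linux kernel benchmark (the function name is illustrative):

```python
def breakeven_searches(build_s: float, baseline_ms: float, indexed_ms: float) -> float:
    """Number of searches after which the index build time is recovered."""
    saved_per_search_s = (baseline_ms - indexed_ms) / 1000.0
    return build_s / saved_per_search_s

# 6s index build, 2,236ms per ripgrep search, 38ms per xg search.
n = breakeven_searches(6.0, 2236.0, 38.0)
print(f"{n:.1f}")  # prints 2.7
```

With these inputs the index is ahead by the third search. Queries with a smaller gap over ripgrep take proportionally longer to break even.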

File Discovery: `--find` vs fd vs find

Benchmarked with hyperfine (-N --warmup 5 --min-runs 50). Repos are shallow-cloned to a temp directory for reproducibility.

tokio (825 files, Rust async runtime):

Query	xg --find	fd	find	vs fd
`*.rs` (769 files)	2.4ms	8.9ms	7.9ms	3.7x faster
`config` (substring)	1.9ms	8.1ms	8.3ms	4.3x faster

next.js (26,424 files, React framework):

Query	xg --find	fd	find	vs fd
`*.ts` (4,639 files)	12.9ms	289.7ms	606.5ms	22x faster
`config` (substring)	6.4ms	228.9ms	637.0ms	36x faster

xg --find reads file paths from the in-memory index (mmap), while fd/find walk the filesystem. The gap widens with repository size.
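
The difference can be sketched as follows: once the paths are held in a list, a glob or substring query is a single pass over strings with no directory traversal. This is an illustration rather than xgrep's code; the path list is made up, and the matching rules (a glob via `fnmatch` when the query contains wildcard characters, otherwise a substring test) are assumptions.

```python
from fnmatch import fnmatch

# Stand-in for the path table that an index would keep in memory.
INDEXED_PATHS = [
    "src/main.rs",
    "src/config/mod.rs",
    "Cargo.toml",
    "docs/config.md",
]

def find(query: str, paths: list[str]) -> list[str]:
    """Treat the query as a glob if it has wildcard characters, else as a substring."""
    if any(c in query for c in "*?["):
        return [p for p in paths if fnmatch(p, query)]
    return [p for p in paths if query in p]

print(find("*.rs", INDEXED_PATHS))
print(find("config", INDEXED_PATHS))
```

A filesystem walker has to issue directory reads for every folder before it can test a single name, which is consistent with the gap growing as the repository gets larger.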

Reproduce Benchmarks

./bench/run.sh small    # xgrep source (~20 files, 30s)
./bench/run.sh medium   # ripgrep source (~250 files, auto-downloads)
./bench/run.sh large    # Linux kernel (~92K files, requires manual download)
./benchmarks/bench_find.sh  # --find vs fd vs find (auto-clones repos)

Output Formats

Default (ripgrep-compatible):

src/main.rs:42:fn handle_auth() {}

LLM (`--format llm`): Markdown code blocks with language tags and context lines.

JSON (`--json`): Structured output for programmatic use.
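
For scripts that consume the default format, a match line can be split into its three fields. A minimal sketch, assuming every match line has the `path:line:text` shape shown above (the `Match` type and function name are illustrative):

```python
from typing import NamedTuple

class Match(NamedTuple):
    path: str
    line: int
    text: str

def parse_match(record: str) -> Match:
    """Split one 'path:line:text' record; the text may itself contain colons."""
    path, line, text = record.split(":", 2)
    return Match(path, int(line), text)

m = parse_match("src/main.rs:42:fn handle_auth() {}")
print(m.path, m.line, m.text)
```

Paths that contain a colon (for example absolute Windows paths) would be split incorrectly by this approach, so the JSON output is likely the safer choice for anything beyond quick scripts.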

Regex Performance Notes

xgrep extracts trigram literals from regex patterns to narrow search candidates before full regex matching. This works well for patterns with literal substrings but falls back to full scan for purely abstract patterns.

Fast (trigram-optimized):

Pattern	Why	Trigrams extracted
`handle_\w+`	Literal prefix "handle_"	`han`, `and`, `ndl`, `dle`, `le_`
`fn\s+main`	Literal parts "fn" and "main"	`mai`, `ain`
`error.*timeout`	Literals "error" and "timeout"	Both sets

Slow (full scan fallback):

Pattern	Why
`.*`	No literals
`[a-z]+`	Only character classes
`\d{3}-\d{4}`	No literal strings
`.+error`	Leading `.+` prevents extraction

For patterns that fall back to full scan, xgrep shows a warning: `warning: regex cannot be optimized with trigram index (full scan)`.

Tip: Include at least 3 consecutive literal characters in your regex for best performance. `handle_\w+` is much faster than `\w+_auth`.
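
The extraction idea can be sketched in a few lines: walk the pattern, collect runs of characters that every match must contain, and emit the trigrams of each run. This is a simplified illustration, not xgrep's extractor. It gives up on groups and alternation entirely, and unlike the behavior described above it would still extract trigrams from `.+error`. The function names are hypothetical.

```python
def literal_runs(pattern: str) -> list[str]:
    """Return literal substrings that every match of `pattern` must contain."""
    # Groups and alternation can make literals optional; give up on them.
    if "(" in pattern or "|" in pattern:
        return []
    runs: list[str] = []
    cur: list[str] = []

    def flush() -> None:
        if cur:
            runs.append("".join(cur))
            cur.clear()

    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            nxt = pattern[i + 1 : i + 2]
            if nxt and not nxt.isalnum():
                cur.append(nxt)      # escaped punctuation such as \. is literal
            else:
                flush()              # \w, \d, \s, \b ... match no fixed text
            i += 2
        elif ch == "[":
            flush()                  # skip the whole character class
            end = pattern.find("]", i + 1)
            i = len(pattern) if end == -1 else end + 1
        elif ch in "*?{":
            if cur:
                cur.pop()            # the preceding character may be absent
            flush()
            if ch == "{":
                end = pattern.find("}", i + 1)
                i = len(pattern) if end == -1 else end + 1
            else:
                i += 1
        elif ch in "+.^$":
            flush()                  # breaks adjacency between literals
            i += 1
        else:
            cur.append(ch)
            i += 1
    flush()
    return runs

def required_trigrams(pattern: str) -> set[str]:
    """Trigrams of every literal run; an empty set means full scan."""
    return {run[k : k + 3] for run in literal_runs(pattern) for k in range(len(run) - 2)}

for pat in (r"handle_\w+", r"fn\s+main", r"\d{3}-\d{4}"):
    print(pat, sorted(required_trigrams(pat)))
```

An empty result corresponds to the full scan fallback: with no required trigram there is nothing to look up in the index.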

Limitations

xgrep uses a trigram inverted index, the same technique as Google Code Search (2006) and zoekt. This approach has inherent trade-offs:

Short queries (< 3 chars) bypass the index: Patterns like if, fn, go fall back to full file scan with no speed advantage over ripgrep.
Common trigrams reduce filtering: Queries containing frequent trigrams (the, int, return) produce many candidate files, narrowing the speed gap with ripgrep.
Scaling limits not yet determined: Tested up to 92K files (Linux kernel, 2.0GB) where performance is excellent. Behavior on larger codebases (Chromium-scale, 350K+ files) has not been benchmarked.
Index staleness: Background rebuild runs every ~30 seconds. Recently saved files may not appear until the next rebuild completes.
find_definitions is regex-based: Uses heuristic patterns (fn/struct/class/def), not AST analysis. False positives are expected.
ASCII-only case folding: Case-insensitive search (-i) handles ASCII letters only. Unicode case folding is not supported.
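
The first two limitations follow directly from how candidate files are selected. The toy index below is a sketch under simplifying assumptions (three in-memory "files", Python sets as posting lists, character rather than byte trigrams); it is not xgrep's data structure, but it shows why a two-character query cannot be filtered and why a common token filters weakly.

```python
from collections import defaultdict

FILES = {
    0: "fn main() { return handle_auth(); }",
    1: "int main(void) { return 0; }",
    2: "def handle_request(): pass",
}

def trigrams(text: str) -> set[str]:
    return {text[i : i + 3] for i in range(len(text) - 2)}

def build_index(files: dict[int, str]) -> dict[str, set[int]]:
    """Map each trigram to the set of file IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for file_id, text in files.items():
        for tri in trigrams(text):
            index[tri].add(file_id)
    return index

def candidates(index: dict[str, set[int]], query: str, all_ids: set[int]) -> set[int]:
    """Files that contain every trigram of the query."""
    tris = trigrams(query)
    if not tris:
        return set(all_ids)          # fewer than 3 chars: nothing to filter on
    result = set(all_ids)
    for tri in tris:
        result &= index.get(tri, set())
    return result

def search(index: dict[str, set[int]], files: dict[int, str], query: str) -> list[int]:
    # Candidates may be false positives, so each one is verified against the text.
    return sorted(f for f in candidates(index, query, set(files)) if query in files[f])

INDEX = build_index(FILES)
print(candidates(INDEX, "fn", set(FILES)))           # every file: no trigram available
print(candidates(INDEX, "return", set(FILES)))       # common token: weak filtering
print(candidates(INDEX, "handle_auth", set(FILES)))  # rare trigrams: one candidate
```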

When to use ripgrep instead

One-off searches on a codebase you won't search again
Very small codebases (< 100 files, where index overhead outweighs benefit)
Queries shorter than 3 characters
When you need results from files saved within the last 30 seconds

Why trigrams?

xgrep prioritizes simplicity and small index size over search precision. Alternative approaches:

Approach	Index size	Precision	Trade-off
Trigram (xgrep, zoekt)	~8% of source	Moderate (false positives)	Simple, small, fast to build
Suffix array (Livegrep)	2-5x source	High	Large index, slow to build
AST/Symbol (Searkt, LSP)	Varies	Exact	Language-specific, complex

Trigrams are the right choice when you want a single binary that works on any codebase without language-specific setup.

Exit Codes

Code	Meaning
`0`	Matches found
`1`	No matches found (not an error)
`2`	Error (invalid pattern, missing index, I/O error, usage error)

Follows the same convention as ripgrep.
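
When calling `xg` from a script, exit code `1` should be treated as a normal "nothing found" result rather than a failure. A minimal sketch of a wrapper, assuming `xg` is on `PATH` and using only the positional pattern and directory arguments shown under Usage (the helper names are illustrative):

```python
import subprocess

MEANINGS = {0: "matches found", 1: "no matches", 2: "error"}

def classify(returncode: int) -> str:
    """Map an exit status to the meaning listed in the table above."""
    return MEANINGS.get(returncode, "unexpected exit code")

def run_search(pattern: str, root: str = ".") -> tuple[str, str]:
    """Run `xg PATTERN ROOT` and return (meaning, stdout). Requires xg on PATH."""
    # check=True is deliberately not used: exit code 1 is not an error.
    proc = subprocess.run(["xg", pattern, root], capture_output=True, text=True)
    return classify(proc.returncode), proc.stdout

print(classify(1))  # prints: no matches
```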

How It Works

Index Build: Walks the codebase, extracts 3-byte trigrams from each file, builds an inverted index (trigram -> file IDs) with delta+varint compression
Search: Extracts trigrams from query, intersects posting lists to find candidate files, verifies matches
Hybrid Mode: When the index is stale, combines index results with direct scanning of changed files — no rebuild needed
MCP Server: Exposes search via JSON-RPC over stdio, with LLM-optimized output and token-aware truncation
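
The delta+varint compression mentioned in the index build step can be sketched as follows. Sorted file IDs are stored as gaps between neighbors, and each gap is written in 7-bit groups with a continuation bit, so small gaps take one byte. This is a generic illustration of the technique; xgrep's actual on-disk layout is not described here, and the function names are hypothetical.

```python
def encode_postings(file_ids: list[int]) -> bytes:
    """Delta-encode ascending IDs, then write each gap as a LEB128-style varint."""
    out = bytearray()
    prev = 0
    for fid in file_ids:
        gap = fid - prev
        prev = fid
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            gap >>= 7
        out.append(gap)                      # final byte, continuation bit clear
    return bytes(out)

def decode_postings(data: bytes) -> list[int]:
    """Inverse of encode_postings: rebuild absolute IDs from varint gaps."""
    ids: list[int] = []
    value, shift, prev = 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            ids.append(prev)
            value, shift = 0, 0
    return ids

postings = [3, 7, 8, 300, 92_946]
blob = encode_postings(postings)
print(len(blob), "bytes for", len(postings), "IDs")
```

Posting lists for common trigrams contain many nearby file IDs, so most gaps fit in a single byte, which is one plausible reason a trigram index can stay small relative to the source.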

License

MIT
