SemTools

Semantic search and document parsing tools for the command line

A collection of high-performance CLI tools for document processing and semantic search, built with Rust for speed and reliability.

parse - Parse documents (PDF, DOCX, etc.) into Markdown, using the LlamaParse API by default
search - Local semantic keyword search using multilingual embeddings with cosine similarity matching and per-line context matching

NOTE: By default, parse uses LlamaParse as a backend. Get your API key today for free at https://cloud.llamaindex.ai. search remains local-only.

Key Features

Fast semantic search using model2vec embeddings from minishlab/potion-multilingual-128M
Reliable document parsing with caching and error handling
Unix-friendly design with proper stdin/stdout handling
Configurable distance thresholds and returned chunk sizes
Multi-format support for parsing documents (PDF, DOCX, PPTX, etc.)
Concurrent processing for better parsing performance
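
As a rough illustration of what a "distance threshold" means for cosine-based matching, the sketch below computes a cosine distance between two embedding vectors in plain Python. It is not the tool's actual Rust implementation: the function name `cosine_distance` is hypothetical, and it assumes that distance is defined as 1 minus cosine similarity, which the README does not state explicitly.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors.

    Assumption: "distance" means 1 - cosine similarity, so 0.0 is an
    identical direction and larger values are less similar.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

Under this definition, a smaller --max-distance value keeps only text whose embedding points in nearly the same direction as the query embedding.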

Installation

Prerequisites:

For the parse tool: LlamaIndex Cloud API key

Install:

You can install semtools via npm:

npm i -g @llamaindex/semtools

Or via cargo:

# install entire crate
cargo install semtools

# install only parse
cargo install semtools --no-default-features --features=parse

# install only search
cargo install semtools --no-default-features --features=search

Note: If no prebuilt binary is available, installing from npm builds the Rust binaries locally, which requires Rust and Cargo in your environment. Install them via rustup if needed: https://www.rust-lang.org/tools/install.

Quick Start

Basic Usage:

# Parse some files
parse my_dir/*.pdf

# Search some (text-based) files
search "some keywords" *.txt --max-distance 0.3 --n-lines 5

# Combine parsing and search
parse my_docs/*.pdf | xargs search "API endpoints"

Advanced Usage:

# Combine with grep for exact-match pre-filtering and distance thresholding
parse *.pdf | xargs cat | grep -i "error" | search "network error" --max-distance 0.3

# Pipeline: find files, parse them, then search the parsed output
find . -name "*.md" | xargs parse | xargs search "installation"

# Combine with grep for filtering (grep could be before or after parse/search!)
parse docs/*.pdf | xargs search "API" | grep -A5 "authentication"

# Save search results
parse report.pdf | xargs cat | search "summary" > results.txt

CLI Help

parse --help
A CLI tool for parsing documents using various backends

Usage: parse [OPTIONS] <FILES>...

Arguments:
  <FILES>...  Files to parse

Options:
  -c, --parse-config <PARSE_CONFIG>  Path to the config file. Defaults to ~/.parse_config.json
  -b, --backend <BACKEND>            The backend type to use for parsing. Defaults to `llama-parse` [default: llama-parse]
  -v, --verbose                      Verbose output while parsing
  -h, --help                         Print help
  -V, --version                      Print version

search --help
A CLI tool for fast semantic keyword search

Usage: search [OPTIONS] <QUERY> [FILES]...

Arguments:
  <QUERY>     Query to search for (positional argument)
  [FILES]...  Files or directories to search

Options:
  -n, --n-lines <N_LINES>            How many lines before/after to return as context [default: 3]
      --top-k <TOP_K>                The top-k files or texts to return (ignored if max_distance is set) [default: 3]
  -m, --max-distance <MAX_DISTANCE>  Return all results with distance below this threshold (0.0+)
  -i, --ignore-case                  Perform case-insensitive search (default is false)
  -h, --help                         Print help
  -V, --version                      Print version
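
The help text above says that --top-k is ignored when --max-distance is set, and that --n-lines controls the context returned around a match. A minimal sketch of that selection logic follows. The function names and the `(line_index, distance)` input shape are hypothetical, and whether the threshold boundary is inclusive is not specified in the help text, so the strict comparison here is an assumption.

```python
def select_matches(scored_lines, top_k=3, max_distance=None):
    """Pick matches from (line_index, distance) pairs, best first.

    If max_distance is given, every pair below it is returned and top_k
    is ignored; otherwise the top_k closest pairs are returned.
    """
    ranked = sorted(scored_lines, key=lambda pair: pair[1])
    if max_distance is not None:
        # Assumption: "below this threshold" is a strict comparison.
        return [pair for pair in ranked if pair[1] < max_distance]
    return ranked[:top_k]


def context_window(line_index, n_lines, total_lines):
    """Return the half-open (start, end) line range around a match."""
    start = max(0, line_index - n_lines)
    end = min(total_lines, line_index + n_lines + 1)
    return start, end
```

For example, `context_window(0, 3, 10)` is clamped at the start of the file and covers lines 0 through 3 only.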

Configuration

Parse Tool Configuration

By default, the parse tool uses the LlamaParse API to parse documents.

It will look for a ~/.parse_config.json file to configure the API key and other parameters.

Otherwise, it will fall back to looking for a LLAMA_CLOUD_API_KEY environment variable and a set of default parameters.
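
The lookup order described above (config file first, then the environment variable) can be sketched as follows. This is an illustration only, not the tool's own code: `resolve_api_key` is a hypothetical helper, and falling back to the environment when the file exists but has no `api_key` entry is an assumption.

```python
import json
import os
from pathlib import Path


def resolve_api_key(config_path=None, env=None):
    """Return the API key from the config file, else from the environment."""
    path = Path(config_path) if config_path else Path.home() / ".parse_config.json"
    env = os.environ if env is None else env
    if path.is_file():
        with path.open(encoding="utf-8") as fh:
            config = json.load(fh)
        key = config.get("api_key")
        if key:
            return key
    # Assumption: a missing file or a missing "api_key" entry both fall
    # through to the LLAMA_CLOUD_API_KEY environment variable.
    return env.get("LLAMA_CLOUD_API_KEY")
```

Calling `resolve_api_key()` with no arguments checks ~/.parse_config.json and then the current process environment; it returns None if neither provides a key.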

To configure the parse tool, create a ~/.parse_config.json file with the following content (defaults are shown below):

{
  "api_key": "your_llama_cloud_api_key_here",
  "num_ongoing_requests": 10,
  "base_url": "https://api.cloud.llamaindex.ai",
  "check_interval": 5,
  "max_timeout": 3600,
  "max_retries": 10,
  "retry_delay_ms": 1000,
  "backoff_multiplier": 2.0,
  "parse_kwargs": {
    "parse_mode": "parse_page_with_agent",
    "model": "openai-gpt-4-1-mini",
    "high_res_ocr": "true",
    "adaptive_long_table": "true",
    "outlined_table_extraction": "true",
    "output_tables_as_HTML": "true"
  }
}

Or just set the API key via an environment variable:

export LLAMA_CLOUD_API_KEY="your_api_key_here"
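
The config above includes `max_retries`, `retry_delay_ms`, and `backoff_multiplier`. The README does not spell out how they combine, so the sketch below assumes a conventional exponential backoff in which each retry waits `backoff_multiplier` times longer than the previous one. The function name `backoff_delays_ms` is hypothetical, and the tool's real retry schedule may differ.

```python
def backoff_delays_ms(retry_delay_ms=1000, backoff_multiplier=2.0, max_retries=10):
    """Return the wait before each retry, in milliseconds.

    Assumption: delay for retry i (0-based) is
    retry_delay_ms * backoff_multiplier ** i.
    """
    return [retry_delay_ms * backoff_multiplier ** attempt
            for attempt in range(max_retries)]
```

Under that assumption, lowering `backoff_multiplier` or `max_retries` is the most direct way to cap how long a failing request keeps being retried.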

Agent Use Case Examples

Future Work

More parsing backends (something local-only would be great!)
Improved search algorithms
(optional) Persistence for speedups on repeat searches on the same files

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

LlamaIndex/LlamaParse for document parsing capabilities
model2vec-rs for fast embedding generation
minishlab/potion-multilingual-128M for an amazing default static embedding model
simsimd for efficient similarity computation
