Commit 959cdbe

Author: Raphael Mitsch
Commit message: docs: Update readme.
1 parent eac19b8 commit 959cdbe

1 file changed: +14 -4 lines changed

README.md

Lines changed: 14 additions & 4 deletions
@@ -12,17 +12,27 @@
 [![codecov](https://codecov.io/gh/mantisai/sieves/branch/main/graph/badge.svg)](https://codecov.io/gh/mantisai/sieves)
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17633730.svg)](https://doi.org/10.5281/zenodo.17633730)

-## Unified Pipelines for Zero-Shot Document AI.
+## A Unified Interface for Document AI Applications.

-`sieves` provides a type-safe abstraction for building zero-shot Document AI pipelines. It unifies the entire workflow:
+`sieves` provides a stable, framework-agnostic abstraction for building document AI pipelines.
+Just as `sqlalchemy` provides a unified interface for interchangeable database drivers, `sieves` offers a consistent API for predictive tasks while allowing you to swap the underlying language model frameworks without changing your core application logic.
+
+This approach recognizes that different LM frameworks excel at different aspects of language model development:
+* [`outlines`](https://github.com/dottxt-ai/outlines) for high-performance, strictly constrained structured generation with local models.
+* [`dspy`](https://github.com/stanfordnlp/dspy) for sophisticated prompt optimization and few-shot example tuning.
+* [`langchain`](https://github.com/langchain-ai/langchain) for broad compatibility with proprietary APIs and existing ecosystems.
+* [`gliner2`](https://github.com/fastino-ai/GLiNER2) or [`transformers`](https://github.com/huggingface/transformers) zero-shot pipelines for specialized, low-latency local inference.
+
+`sieves` unifies the entire workflow:

 1. **Ingestion**: Parsing PDFs, images, and Office docs (via [`docling`](https://github.com/docling-project/docling)).
 2. **Preprocessing**: Intelligent text chunking and windowing (via [`chonkie`](https://github.com/chonkie-inc/chonkie)).
-3. **Prediction**: Zero-shot structured generation using a unified Pydantic interface.
+3. **Prediction**: Zero-shot structured generation using a unified interface.
 Supports multiple backends: [`dspy`](https://github.com/stanfordnlp/dspy), [`langchain`](https://github.com/langchain-ai/langchain), [`outlines`](https://github.com/dottxt-ai/outlines), [`gliner2`](https://github.com/fastino-ai/GLiNER2), [`transformers`](https://github.com/huggingface/transformers) zero-shot classification pipelines
 4. **Distillation**: Distill a specialized local model from zero-shot predictions (via [`setfit`](https://github.com/huggingface/setfit) and [`model2vec`](https://github.com/MinishLab/model2vec)).

-Define your task pipeline once, then swap execution engines without rewriting your pipeline logic.
+Define your task pipeline once, then swap execution engines without rewriting your pipeline logic. Use the task library
+to skip having to define tasks from scratch.

 > [!WARNING]
 > `sieves` is in active development (Beta). The API is stable within minor versions, but we recommend pinning your version for production use.
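
The "define the pipeline once, swap the execution engine" idea in the README text above can be illustrated with a small, library-independent sketch. It does not use the `sieves` API: `Engine`, `KeywordEngine`, `FixedEngine`, `chunk`, and `Pipeline` are all hypothetical names invented for this illustration, and the toy engines stand in for real backends such as `outlines` or `dspy`.

```python
from dataclasses import dataclass
from typing import Protocol


class Engine(Protocol):
    """Hypothetical backend interface; not part of the sieves API."""

    def classify(self, text: str, labels: list[str]) -> str: ...


class KeywordEngine:
    """Toy backend: picks the first label that occurs in the text."""

    def classify(self, text: str, labels: list[str]) -> str:
        lowered = text.lower()
        for label in labels:
            if label.lower() in lowered:
                return label
        return labels[-1]  # fall back to the last label


class FixedEngine:
    """Toy backend: always answers with one preconfigured label."""

    def __init__(self, answer: str) -> None:
        self.answer = answer

    def classify(self, text: str, labels: list[str]) -> str:
        return self.answer if self.answer in labels else labels[-1]


def chunk(text: str, size: int) -> list[str]:
    """Split text into windows of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


@dataclass
class Pipeline:
    """Defined once; only the `engine` field changes between backends."""

    engine: Engine
    labels: list[str]
    chunk_size: int = 5

    def run(self, text: str) -> list[str]:
        # Preprocess (chunk), then predict one label per chunk.
        return [self.engine.classify(c, self.labels)
                for c in chunk(text, self.chunk_size)]


doc = "The invoice total is due Friday. Please review the attached contract soon."
labels = ["invoice", "contract", "other"]

# Same pipeline definition, two different engines.
keyword_result = Pipeline(KeywordEngine(), labels).run(doc)
fixed_result = Pipeline(FixedEngine("other"), labels).run(doc)
print(keyword_result)
print(fixed_result)
```

The pipeline logic (`chunk`, then `classify`) never changes; only the object passed as `engine` does, which is the same separation the `sqlalchemy` comparison in the README describes.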
