12 | 12 | [](https://codecov.io/gh/mantisai/sieves) |
13 | 13 | [](https://doi.org/10.5281/zenodo.17633730) |
14 | 14 |
15 | | -## Unified Pipelines for Zero-Shot Document AI. |
| 15 | +## A Unified Interface for Document AI Applications. |
16 | 16 |
17 | | -`sieves` provides a type-safe abstraction for building zero-shot Document AI pipelines. It unifies the entire workflow: |
| 17 | +`sieves` provides a stable, framework-agnostic abstraction for building document AI pipelines. |
| 18 | +Just as `sqlalchemy` provides a unified interface for interchangeable database drivers, `sieves` offers a consistent API for predictive tasks while allowing you to swap the underlying language model frameworks without changing your core application logic. |
| 19 | + |
| 20 | +This approach recognizes that different LM frameworks excel at different aspects of language model development: |
| 21 | +* [`outlines`](https://github.com/dottxt-ai/outlines) for high-performance, strictly constrained structured generation with local models. |
| 22 | +* [`dspy`](https://github.com/stanfordnlp/dspy) for sophisticated prompt optimization and few-shot example tuning. |
| 23 | +* [`langchain`](https://github.com/langchain-ai/langchain) for broad compatibility with proprietary APIs and existing ecosystems. |
| 24 | +* [`gliner2`](https://github.com/fastino-ai/GLiNER2) or [`transformers`](https://github.com/huggingface/transformers) zero-shot pipelines for specialized, low-latency local inference. |
| 25 | + |
| 26 | +`sieves` unifies the entire workflow: |
18 | 27 |
19 | 28 | 1. **Ingestion**: Parsing PDFs, images, and Office docs (via [`docling`](https://github.com/docling-project/docling)). |
20 | 29 | 2. **Preprocessing**: Intelligent text chunking and windowing (via [`chonkie`](https://github.com/chonkie-inc/chonkie)). |
21 | | -3. **Prediction**: Zero-shot structured generation using a unified Pydantic interface. |
| 30 | +3. **Prediction**: Zero-shot structured generation using a unified interface. |
22 | 31 | Supports multiple backends: [`dspy`](https://github.com/stanfordnlp/dspy), [`langchain`](https://github.com/langchain-ai/langchain), [`outlines`](https://github.com/dottxt-ai/outlines), [`gliner2`](https://github.com/fastino-ai/GLiNER2), [`transformers`](https://github.com/huggingface/transformers) zero-shot classification pipelines.
23 | 32 | 4. **Distillation**: Distill a specialized local model from zero-shot predictions (via [`setfit`](https://github.com/huggingface/setfit) and [`model2vec`](https://github.com/MinishLab/model2vec)). |
24 | 33 |
25 | | -Define your task pipeline once, then swap execution engines without rewriting your pipeline logic. |
| 34 | +Define your task pipeline once, then swap execution engines without rewriting your pipeline logic. Use the task library
| 35 | +instead of defining tasks from scratch.
26 | 36 |
27 | 37 | > [!WARNING] |
28 | 38 | > `sieves` is in active development (Beta). The API is stable within minor versions, but we recommend pinning your version for production use. |
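The `sqlalchemy`-style idea of defining a task once and injecting the execution engine can be illustrated with a small, self-contained sketch. This is not the `sieves` API: `ClassifierBackend`, `KeywordBackend`, `FirstLabelBackend`, and `ClassificationTask` are hypothetical names, and the two toy backends merely stand in for real engines such as `outlines`, `dspy`, or `gliner2`.

```python
from typing import Protocol


class ClassifierBackend(Protocol):
    """Hypothetical engine interface: anything that picks one label for a text."""

    def classify(self, text: str, labels: list[str]) -> str: ...


class KeywordBackend:
    """Toy stand-in for an engine: returns the first label mentioned in the text."""

    def classify(self, text: str, labels: list[str]) -> str:
        lowered = text.lower()
        for label in labels:
            if label.lower() in lowered:
                return label
        return labels[0]  # fall back to the first label if nothing matches


class FirstLabelBackend:
    """Second toy engine with different behavior: always returns the first label."""

    def classify(self, text: str, labels: list[str]) -> str:
        return labels[0]


class ClassificationTask:
    """Task defined once; the backend is injected, so swapping engines needs no task changes."""

    def __init__(self, labels: list[str], backend: ClassifierBackend) -> None:
        self.labels = labels
        self.backend = backend

    def __call__(self, texts: list[str]) -> list[str]:
        return [self.backend.classify(text, self.labels) for text in texts]


docs = ["Finance update: the invoice is overdue.", "New science results on ocean currents."]
labels = ["finance", "science"]

# Same task definition, two interchangeable engines.
print(ClassificationTask(labels, KeywordBackend())(docs))     # ['finance', 'science']
print(ClassificationTask(labels, FirstLabelBackend())(docs))  # ['finance', 'finance']
```

The task only depends on the `classify` method of the structural `Protocol`, so any object with that method can be plugged in; a real framework adapter would translate the same call into a constrained-generation or prompt-optimization backend.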
|