# AGENTS.md
## Observability & Serialization
- **Logging:** `loguru` integrated; logs task execution and model wrapper calls.
- **Raw Model Outputs:** Captured in `doc.meta[task_id]['raw']` as a list of raw responses per chunk when `include_meta=True` (default).
- **Token Usage Tracking:**
    - Tracked across the entire pipeline and aggregated in `doc.meta['usage']`.
    - Also available per task in `doc.meta[task_id]['usage']`.
    - Includes `input_tokens` and `output_tokens`.
    - Uses native metadata for DSPy/LangChain and approximate estimation for other backends.
- **Pipeline persistence:**

```python
pipe.dump("pipeline.yml")  # Save config.
loaded = Pipeline.load("pipeline.yml", task_kwargs)  # Reload with model kwargs.
```

- **Document persistence:** Use pickle (models not serialized).
- **Config format:** YAML-compatible via `sieves.serialization.Config`.
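The usage structure described above can be read straight off `doc.meta`. A minimal sketch with a hand-built stand-in for the meta dict (real values come from a pipeline run; the task id `classifier` is hypothetical):

```python
# Stand-in for a Doc's meta dict after a pipeline run; values are illustrative.
doc_meta = {
    "usage": {"input_tokens": 1200, "output_tokens": 340},  # pipeline-wide aggregate
    "classifier": {"usage": {"input_tokens": 700, "output_tokens": 190}},  # per-task
}

# Pipeline-wide totals.
total = doc_meta["usage"]
print(f"Pipeline: {total['input_tokens']} in / {total['output_tokens']} out")

# Per-task breakdown.
task_usage = doc_meta["classifier"]["usage"]
print(f"classifier: {task_usage['input_tokens']} in / {task_usage['output_tokens']} out")
```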
---
Key changes that affect development (last ~2-3 months):

1. **Token Counting and Raw Output Observability** - Implemented comprehensive token usage tracking (input/output) and raw model response capturing in `doc.meta`. Usage is aggregated per-task and per-document.
2. **Information Extraction Single/Multi Mode** - Added `mode` parameter to `InformationExtraction` task for single vs. multi entity extraction.
3. **GliNERBridge Refactoring** - Consolidated NER logic into `GliNERBridge`, removing the dedicated `GlinerNER` class.
4. **Documentation Enhancements** - Standardized documentation with usage snippets (tested) and library links across all tasks and model wrappers.
5. **All Model Wrappers as Core Dependencies** (#210) - Outlines, DSPy, LangChain, Transformers, and GLiNER2 are now included in the base installation.
---
The `Doc` class is the fundamental unit of data in `sieves`. It encapsulates the text to be processed, its associated metadata, and the results generated by various tasks in a pipeline.
The `meta` field stores detailed execution traces, including raw model outputs and token usage statistics. This is particularly useful for debugging and cost monitoring. For a deep dive into how to use these features, see the [Observability and Usage Tracking guide](guides/observability.md).
---

# Observability and Usage Tracking

`sieves` provides built-in tools for monitoring your Document AI pipelines. By enabling metadata collection, you can inspect raw model responses and track token consumption for both local and remote models.

## The `meta` Field

Every `Doc` object in `sieves` contains a `meta` dictionary. When `include_meta=True` (which is the default for predictive tasks), this dictionary is populated with detailed execution traces.

### Raw Model Outputs

`sieves` captures the "raw" output from the underlying language model before it is parsed into your final structured format. This is invaluable for debugging prompt failures or investigating unexpected model behavior.
The raw outputs are stored in `doc.meta[task_id]['raw']`. Since documents can be split into multiple chunks, this field contains a list of raw responses—one for each chunk.
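For instance, with a hypothetical classification task whose id is `classifier`, the per-chunk raw responses might look like this (the dict below is a hand-built stand-in for `doc.meta`):

```python
# Stand-in for doc.meta after running a task with id "classifier" on a two-chunk doc.
doc_meta = {"classifier": {"raw": ['{"label": "science"}', '{"label": "sports"}']}}

# One raw model response per chunk, in chunk order.
for i, raw in enumerate(doc_meta["classifier"]["raw"]):
    print(f"Chunk {i}: {raw}")
```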
Per-chunk usage is also recorded:

```python
task_meta = doc.meta[task_id]

# The per-chunk usage for the classification task.
for i, chunk_usage in enumerate(task_meta['usage']['chunks']):
    print(f"Chunk {i}: {chunk_usage['input_tokens']} in, {chunk_usage['output_tokens']} out")
```
---

## Native vs. Approximate Counting

`sieves` uses a multi-tiered approach to ensure you always have token data, even when model frameworks don't provide it natively.

### Native Tracking (DSPy & LangChain)
For backends like **DSPy** and **LangChain**, `sieves` extracts token counts directly from the model provider's metadata (e.g., OpenAI or Anthropic response headers). This is the most accurate form of tracking.
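Conceptually, native tracking maps provider-reported counts onto the `input_tokens`/`output_tokens` fields stored in `doc.meta`. A sketch using an OpenAI-style usage block (the `response` dict is illustrative, not an actual `sieves` structure):

```python
# Illustrative OpenAI-style response metadata returned by the provider.
response = {"usage": {"prompt_tokens": 52, "completion_tokens": 18}}

# Map provider fields onto the field names sieves uses.
usage = {
    "input_tokens": response["usage"]["prompt_tokens"],
    "output_tokens": response["usage"]["completion_tokens"],
}
print(usage)
```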
!!! note "DSPy Caching"
91
+
DSPy's internal caching may return 0 or `None` for tokens if a result is retrieved from the local cache rather than the remote API.
### Approximate Estimation (Other Backends)

For local models or frameworks that don't expose native counts, `sieves` uses the model's own **tokenizer** to estimate usage:
1. **Input Tokens**: Counted by encoding the fully rendered prompt string.
2. **Output Tokens**: Counted by encoding the raw generated output string.
If a local tokenizer is not available (e.g., when using a remote API via Outlines without a local weight clone), `sieves` will attempt to fall back to `tiktoken` (for OpenAI-compatible models) or return `None`.
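The estimation described above amounts to encoding each string and counting the resulting tokens. A minimal sketch, with a whitespace splitter standing in for the model's real tokenizer:

```python
def approx_token_count(text: str, encode=str.split) -> int:
    """Approximate token count. `encode` stands in for a tokenizer's encode();
    a real backend would use the model's own tokenizer here."""
    return len(encode(text))

prompt = "Classify the following text: the sky is blue."
raw_output = '{"label": "science"}'

usage = {
    "input_tokens": approx_token_count(prompt),       # encode the rendered prompt
    "output_tokens": approx_token_count(raw_output),  # encode the raw generation
}
print(usage)
```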
---

# docs/guides/optimization.md
## Learning More About Optimization
`sieves` optimization is built on [DSPy's MIPROv2 optimizer](https://dspy-docs.vercel.app/api/optimizers/MIPROv2). For in-depth guidance on optimization techniques, training data quality, and interpreting results, we recommend exploring these external resources:
### Understanding MIPROv2
- ⚙️ **Hyperparameter Tuning** - Adjusting `num_trials`, `num_candidates`, and other optimizer settings for better results
- 🎯 **Evaluation Metrics** - Choosing the right metrics for your task (see Evaluation Metrics section above)
### `sieves`-Specific Integration
The main differences when using optimization in `sieves`:

- **Simplified API**: Use `task.optimize(optimizer)` instead of calling DSPy optimizers directly
- **Automatic integration**: Optimized prompts and few-shot examples are automatically integrated into the task
- **Task compatibility**: Works with all `PredictiveTask` subclasses (Classification, NER, InformationExtraction, etc.)
- **Full parameter access**: All DSPy optimizer parameters are available via the `Optimizer` class constructor
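The simplified API boils down to a two-call flow. A self-contained sketch with stand-in classes (the real `Optimizer` wraps DSPy's MIPROv2; the internals and any constructor parameters beyond `num_trials`/`num_candidates` are assumptions for illustration):

```python
class Optimizer:
    """Stand-in for the Optimizer wrapper around DSPy's MIPROv2."""
    def __init__(self, num_trials: int = 20, num_candidates: int = 10):
        self.num_trials = num_trials
        self.num_candidates = num_candidates

class ClassificationTask:
    """Stand-in for any PredictiveTask subclass."""
    def __init__(self) -> None:
        self.prompt = "initial prompt"

    def optimize(self, optimizer: Optimizer) -> None:
        # In sieves, this runs the optimizer and integrates the optimized
        # prompt and few-shot examples back into the task automatically.
        self.prompt = f"optimized prompt ({optimizer.num_trials} trials)"

task = ClassificationTask()
task.optimize(Optimizer(num_trials=30))
print(task.prompt)  # → optimized prompt (30 trials)
```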
For questions specific to `sieves` optimization integration, see the [Troubleshooting](#troubleshooting) section above or consult the [task-specific documentation](../tasks/predictive/classification.md) for evaluation metrics.