Commit a022935

rmitsch and Raphael Mitsch authored
refactor: Fold GlinerNER into GliNERBridge (#235)
* refactor: Fold GlinerNER into GliNERBridge.
* fix: Fix test import.
* fix: Fix tests.
* docs: Update readme. Fix tests.
* docs: Fix snippet integration. Update AGENTS.md.
* docs: Update readme intro.
* docs: Update readme.
* docs: Update readme.
---------
Co-authored-by: Raphael Mitsch <raphael@climatiq.com>
1 parent d5ec9fd commit a022935

10 files changed: +285 -535 lines changed

AGENTS.md

Lines changed: 21 additions & 12 deletions
@@ -132,7 +132,7 @@ uv run pytest -m "not slow"
 uv run mkdocs serve

 # Build static docs
-uv run mkdocs build
+uv run mkdocs build --strict
 ```

 ### Import & Sanity Check
@@ -177,6 +177,7 @@ uv run python -c "import sieves; print(sieves.__name__)"
    - Connects tasks to model wrappers
    - Defines prompt templates (Jinja2-based)
    - Handles output schema and parsing
+   - Specialized bridges like `GliNERBridge` can be shared across tasks

 6. **ModelSettings** (`sieves.model_wrappers.types.ModelSettings`)
    - Configures structured generation behavior
@@ -245,10 +246,13 @@ Enforced via CI pipeline:
    - Define `__call__` for execution
 3. Create `bridges.py`:
    - Subclass `Bridge` for each supported model wrapper
+   - Use generic bridges (e.g. `GliNERBridge`) if applicable
    - Define prompt template (Jinja2), output schema (Pydantic), extraction/parsing logic
 4. Export in `sieves/tasks/predictive/__init__.py`
 5. Add tests under `sieves/tests/tasks/predictive/`
-6. Add docs to `docs/tasks/`
+6. Add docs to `docs/tasks/`:
+   - Include usage examples with snippets from `sieves/tests/docs/`
+   - Link to third-party libraries

 ### Adding a New ModelWrapper

@@ -258,7 +262,7 @@ Enforced via CI pipeline:
 4. Advertise `inference_modes` property
 5. Add to `ModelType` enum in `model_type.py`
 6. Ensure `serialize()/deserialize()` work with `Config`
-7. Add tests and docs
+7. Add tests and docs (with snippets)

 ### Custom Preprocessing

@@ -321,6 +325,8 @@ Enforced via CI pipeline:
 - Keep patches minimal and focused; avoid unrelated refactors
 - Respect optional dependencies; gate ingestion/distillation imports behind extras (model libraries are now core)
 - Update docs (`docs/`) if you add public features
+  - Include introduction and usage examples
+  - Use snippets from `sieves/tests/docs/` to ensure code is tested
 - Write tests for new functionality
 - Consider conditional execution and error handling (`strict`) for robust pipelines

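The guideline about conditional execution and `strict` error handling can be illustrated with a small, library-independent sketch. Everything in it (`Doc`, `run_task`, the `"task"` results key, `word_count`) is hypothetical and is not the sieves API; it only shows the general pattern of skipping documents that fail a predicate and recording `None` instead of raising when strict mode is off.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Iterable


@dataclass
class Doc:
    """Hypothetical document container (not the sieves `Doc` class)."""

    text: str
    results: dict[str, Any] = field(default_factory=dict)


def run_task(
    docs: Iterable[Doc],
    fn: Callable[[str], Any],
    *,
    condition: Callable[[Doc], bool] | None = None,
    strict: bool = True,
) -> list[Doc]:
    """Apply `fn` to every doc that passes `condition`.

    With strict=True the first failure propagates; with strict=False the
    failing doc gets a `None` result and processing continues.
    """
    processed: list[Doc] = []
    for doc in docs:
        processed.append(doc)
        # Docs that fail the predicate pass through untouched.
        if condition is not None and not condition(doc):
            continue
        try:
            doc.results["task"] = fn(doc.text)
        except Exception:
            if strict:
                raise
            doc.results["task"] = None
    return processed


def word_count(text: str) -> int:
    """Toy stand-in for a model call; fails on a marker string."""
    if "boom" in text:
        raise ValueError("simulated model failure")
    return len(text.split())


docs = run_task(
    [Doc("two words"), Doc(""), Doc("boom")],
    word_count,
    condition=lambda d: bool(d.text),  # skip empty documents
    strict=False,
)
print([d.results for d in docs])  # [{'task': 2}, {}, {'task': None}]
```

The empty document is skipped by the predicate, so it carries no result at all, while the failing document carries an explicit `None`. Keeping those two cases distinguishable is a design choice of this sketch, not something the diff specifies.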
@@ -422,15 +428,18 @@ Then run: `uv run pytest sieves/tests/test_my_feature.py -v`

 Key changes that affect development (last ~2-3 months):

-1. **All Model wrappers as Core Dependencies** (#210) - Outlines, DSPy, LangChain, Transformers, and GLiNER2 are now included in base installation
-2. **DSPy v3 Migration** (#192) - Upgraded to DSPy v3 (breaking API changes from v2)
-3. **GliNER2 Migration** (#202) - Migrated from GliNER v1 to GLiNER2 for improved NER performance
-4. **ModelSettings Refactoring** (#194) - `inference_mode` moved into ModelSettings (simplified task init)
-5. **Conditional Task Execution** (#195) - Added `condition` parameter for filtering docs during execution
-6. **Non-strict Execution Support** (#196) - Better error handling; `strict=False` allows graceful failures
-7. **Standardized Output Fields** (#206) - Normalized descriptive/ID attribute naming across tasks
-8. **Chonkie Integration** - Token-based chunking framework now primary chunking backend
-9. **Optional Progress Bars** (#197) - Progress display now configurable per task
+1. **Information Extraction Single/Multi Mode** - Added `mode` parameter to `InformationExtraction` task for single vs multi entity extraction.
+2. **GliNERBridge Refactoring** - Consolidated NER logic into `GliNERBridge`, removing dedicated `GlinerNER` class.
+3. **Documentation Enhancements** - Standardized documentation with usage snippets (tested) and library links across all tasks and model wrappers.
+4. **All Model wrappers as Core Dependencies** (#210) - Outlines, DSPy, LangChain, Transformers, and GLiNER2 are now included in base installation
+5. **DSPy v3 Migration** (#192) - Upgraded to DSPy v3 (breaking API changes from v2)
+6. **GliNER2 Migration** (#202) - Migrated from GliNER v1 to GLiNER2 for improved NER performance
+7. **ModelSettings Refactoring** (#194) - `inference_mode` moved into ModelSettings (simplified task init)
+8. **Conditional Task Execution** (#195) - Added `condition` parameter for filtering docs during execution
+9. **Non-strict Execution Support** (#196) - Better error handling; `strict=False` allows graceful failures
+10. **Standardized Output Fields** (#206) - Normalized descriptive/ID attribute naming across tasks
+11. **Chonkie Integration** - Token-based chunking framework now primary chunking backend
+12. **Optional Progress Bars** (#197) - Progress display now configurable per task

 ---

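The refactoring listed above folds a task-specific class into a bridge that several tasks can share. A rough sketch of that general pattern follows. It is not the actual `GliNERBridge` implementation: `SharedBridge`, `Mode`, and the raw output shapes are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any


class Mode(Enum):
    """Hypothetical switch selecting which task the bridge serves."""

    ENTITIES = "entities"
    CLASSIFICATION = "classification"


@dataclass(frozen=True)
class SharedBridge:
    """One bridge class reused by several tasks instead of one class per task."""

    task_id: str
    labels: tuple[str, ...]
    mode: Mode

    def build_schema(self) -> dict[str, Any]:
        # The schema a model wrapper would receive; the format is assumed here.
        return {"mode": self.mode.value, "labels": list(self.labels)}

    def parse(self, raw: dict[str, Any]) -> list[Any]:
        if self.mode is Mode.ENTITIES:
            # Assumed raw shape: {label: [matched text, ...]}; unknown labels are dropped.
            return [(label, text) for label in self.labels for text in raw.get(label, [])]
        # Assumed raw shape: {label: score}; keep the best-scoring known label.
        scores = {label: raw.get(label, 0.0) for label in self.labels}
        return [max(scores, key=scores.get)]


ner = SharedBridge("ner", ("person", "org"), Mode.ENTITIES)
clf = SharedBridge("clf", ("pos", "neg"), Mode.CLASSIFICATION)

print(ner.parse({"person": ["Ada"], "org": ["ACME"], "misc": ["x"]}))
print(clf.parse({"pos": 0.2, "neg": 0.8}))
```

Both tasks reuse the same schema-building and parsing code and differ only in configuration, which is the kind of consolidation the changelog entry describes.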