Commit 0dfb1b4

feat: auto-generate tool schemas from @register_tool functions
1 parent fb1b33b commit 0dfb1b4

36 files changed (+5209, -3194 lines changed)

.gitattributes

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,2 @@
1+
# Enforce LF line endings for all text files
2+
* text=auto eol=lf

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -21,3 +21,5 @@ docs/build
21 21
dist
22 22
.DS_Store
23 23
code_sample.py
24+
25+
settings.local.json

AGENTS.md

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
1+
# AGENTS.md - AI Coding Assistant Guidelines
2+
3+
Guidelines for AI coding assistants working with the ContextGem codebase.
4+
5+
For detailed contribution procedures, see `CONTRIBUTING.md`, which covers:
6+
- Development environment setup
7+
- Project structure overview
8+
- VCR cassette recording scenarios
9+
- Documentation building
10+
11+
## Project Overview
12+
13+
**ContextGem** is a Python LLM framework for extracting structured data from documents using long context windows (not RAG-based).
14+
15+
- **Python**: 3.10-3.13
16+
- **License**: Apache 2.0
17+
- **Package Manager**: `uv`
18+
19+
## Architecture: Internal/Public Split
20+
21+
The codebase uses a two-layer architecture. **Always implement in internal first, then expose via public.**
22+
23+
```
24+
contextgem/
25+
├── internal/           # Core implementation (_underscore-prefixed classes)
26+
│   ├── base/           # Business logic (concepts, aspects, documents, llms)
27+
│   ├── prompts/        # Jinja2 prompt templates
28+
│   ├── typings/        # Type system & validators
29+
│   └── registry.py     # Internal-to-public type mapping
30+
└── public/             # Thin facades (inherit from internal, registered via decorator)
31+
```
32+
33+
### Pattern Example
34+
35+
```python
36+
# 1. Internal implementation (contextgem/internal/base/concepts.py)
37+
class _StringConcept(BaseModel):
38+
    name: str
39+
    # ... business logic ...
40+
41+
# 2. Public facade (contextgem/public/concepts.py)
42+
@_expose_in_registry(additional_key=_StringConcept)
43+
class StringConcept(_StringConcept):
44+
    """Public API documentation."""
45+
    pass
46+
47+
# 3. Export in contextgem/__init__.py
48+
```
49+
50+
## Coding Conventions
51+
52+
| Convention | Rule |
53+
|------------|------|
54+
| Internal classes | `_Aspect`, `_Document` (underscore prefix) |
55+
| Public classes | `Aspect`, `Document` (no prefix) |
56+
| Constants | `_MAX_NESTING_LEVEL` (ALL_CAPS) |
57+
| Required import | `from __future__ import annotations` (except `__init__.py`) |
58+
| Formatter | Ruff (line length: 88) |
59+
| Type checker | Pyright (basic mode) |
60+
| Docstrings | reStructuredText format for Sphinx (`:param:`, `:type:`, `:ivar:`, `:vartype:`, `:returns:`, `:rtype:`) |
61+
| Data models | Pydantic v2 (`BaseModel`, `field_validator`, `model_validator`) |
62+
63+
## Auto-Generated Files - Do NOT Edit
64+
65+
| File | Source | Regeneration |
66+
|------|--------|--------------|
67+
| `README.md` | `dev/readme.template.md` | Pre-commit hook runs `dev/populate_project_readme.py` |
68+
| `docs/source/llms.txt` | Documentation sources | Pre-commit hook |
69+
| `dev/notebooks/` | `dev/usage_examples/` | Pre-commit hook |
70+
| `uv.lock` | `pyproject.toml` | `uv sync` |
71+
72+
**To update README**: Edit `dev/readme.template.md`, then run pre-commit or the generation script.
73+
74+
## After Code Changes
75+
76+
**Always run these steps after making changes:**
77+
78+
```bash
79+
# 1. Run pre-commit hooks (formatting, linting, type checking)
80+
uv run pre-commit run --all-files
81+
82+
# 2. Run targeted tests for the code you modified
83+
uv run pytest tests/test_all.py::TestAll::test_relevant_method
84+
85+
# 3. Run full test suite to check for regressions
86+
uv run pytest
87+
```
88+
89+
### Writing Tests for New Implementations
90+
91+
When adding new functionality:
92+
93+
1. **Add test methods** to `tests/test_all.py::TestAll` following existing patterns
94+
2. **For tests that do NOT call LLM APIs**: Run them to verify they pass
95+
3. **For tests that call LLM APIs**: Add `@pytest.mark.vcr` decorator, **do NOT run them**, and inform the user that cassettes need recording before these tests can be executed
96+
97+
```python
98+
# Example: Adding a test in tests/test_all.py
99+
class TestAll:
100+
    def test_non_llm_feature(self):
101+
        # Safe to run - no LLM calls
102+
        pass
103+
104+
    @pytest.mark.vcr  # Requires cassette - DO NOT RUN, inform user
105+
    def test_llm_feature(self):
106+
        # Calls LLM API - user must record cassette first
107+
        pass
108+
```
109+
110+
### Updating Documentation
111+
112+
When adding or modifying functionality:
113+
114+
1. **Update docstrings** in the affected classes/methods (reStructuredText format)
115+
2. **Update RST files** in `docs/source/` if the feature has dedicated documentation
116+
3. **Update `dev/readme.template.md`** if README content is affected (not `README.md` directly)
117+
4. **Add usage examples** to `dev/usage_examples/` if demonstrating new features
118+
5. **Verify docs compile** by running from the `docs/` directory:
119+
120+
```bash
121+
uv run sphinx-build -b dirhtml source build/dirhtml -v -E -W
122+
```
123+
124+
### VCR Cassette Rules
125+
126+
- **Never run tests in "live" mode** without existing cassettes
127+
- Tests replay recorded LLM API responses from `tests/cassettes/`
128+
- If tests fail due to cassette mismatches, **inform the user** that cassettes need re-recording
129+
- The user will handle cassette re-recording themselves (requires API keys and may incur costs)
130+
- See [CONTRIBUTING.md - VCR Cassette Management](CONTRIBUTING.md#-vcr-cassette-management) for scenarios
131+
132+
## Git Policy
133+
134+
**Never stage or commit changes** - this is the developer's responsibility.
135+
136+
The developer will review changes and handle git operations themselves.
137+
138+
### File Operations During Refactoring
139+
140+
When moving or renaming files, **always use `git mv`** to preserve git history:
141+
142+
```bash
143+
# Moving a file
144+
git mv old/path/file.py new/path/file.py
145+
146+
# Renaming a file
147+
git mv old_name.py new_name.py
148+
```
149+
150+
**Never** use regular file system operations (copy + delete, or IDE rename) for moves/renames - this breaks git history tracking.
151+
152+
## Quick Commands
153+
154+
```bash
155+
uv sync --all-groups # Install dependencies
156+
uv run pre-commit run --all-files # Run all linters/formatters
157+
uv run pytest # Run tests (uses recorded cassettes)
158+
uv run pytest --cov=contextgem # Run tests with coverage
159+
uv run pytest tests/test_all.py::TestAll::test_specific # Run specific test
160+
```
161+
162+
## Key Gotchas
163+
164+
1. **Never import public classes in internal modules** - use registry for type resolution
165+
2. **Prompt changes break VCR cassettes** - inform user if tests fail after prompt modifications
166+
3. **README.md is auto-generated** - edit `dev/readme.template.md` instead
167+
4. **Never stage or commit** - let the developer handle all git operations
168+
5. **Always run pre-commit** after code changes before considering work complete

CHANGELOG.md

Lines changed: 16 additions & 0 deletions
@@ -5,6 +5,22 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
5 5

6 6
- **Refactor**: Code reorganization that doesn't change functionality but improves structure or maintainability
7 7

8+
## [0.20.0](https://github.com/shcherbak-ai/contextgem/releases/tag/v0.20.0) - 2026-02-22
9+
### Added
10+
- Auto-generate tool schemas from `@register_tool` decorated functions. Pass functions directly to `tools=[...]` — schemas are built automatically from type hints and docstrings. Explicit OpenAI-compatible schema dicts remain supported for full backward compatibility.
11+
- Added `docstring-parser` dependency for extracting tool parameter descriptions from Sphinx/reST, Google, and NumPy style docstrings.
12+
13+
### Fixed
14+
- Deterministic tool schema generation: `required` field ordering in auto-generated schemas is now sorted, preventing non-deterministic output from `frozenset` iteration across Python process invocations.
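
A minimal, stdlib-only sketch of how a schema might be derived from a function's type hints and reST docstring, with `required` emitted in sorted order. The names `build_tool_schema` and `search_docs` and the type mapping are hypothetical, the regex is only a stand-in for `docstring-parser`, and none of this is ContextGem's actual implementation.

```python
import inspect
import re
from typing import Any, get_type_hints

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_tool_schema(func) -> dict[str, Any]:
    """Build an OpenAI-style function schema from type hints and a reST docstring."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    # Stand-in for a real docstring parser: collect ":param name: text" lines.
    param_docs = dict(re.findall(r":param (\w+):\s*(.+)", doc))
    properties: dict[str, Any] = {}
    required: set[str] = set()
    for name, param in inspect.signature(func).parameters.items():
        prop = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if name in param_docs:
            prop["description"] = param_docs[name]
        properties[name] = prop
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.add(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": doc.split("\n\n")[0].strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
                # Sort so the output does not depend on set iteration order,
                # which can differ between Python processes for strings.
                "required": sorted(required),
            },
        },
    }


def search_docs(query: str, limit: int = 5) -> str:
    """Search the indexed documents.

    :param query: Text to search for.
    :param limit: Maximum number of results.
    """
    return f"{limit} results for {query!r}"


schema = build_tool_schema(search_docs)
print(schema["function"]["parameters"]["required"])  # ['query']
```

Sorting matters because string hashes are randomized per process by default, so iterating a `set` or `frozenset` of parameter names can yield a different order on each run.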
15+
16+
### Changed
17+
- Upgraded pinned dependency versions: `litellm==1.81.14`, `openai==2.21.0`, `genai-prices==0.0.54`. Versions remain pinned to maintain stability and avoid occasional breaking changes and API inconsistencies observed in previous unpinned releases.
18+
19+
### Docs
20+
- Added dedicated "Chat with Tools" documentation page with examples for auto-schema generation, supported type hints, `TypedDict` usage, custom schema overrides, and tool configuration options.
21+
- Simplified quickstart tools example using the new `@register_tool` function-passing syntax.
22+
- Updated `CONTRIBUTING.md` with AI assistant guidance and Fabric commands.
23+
8 24
## [0.19.4](https://github.com/shcherbak-ai/contextgem/releases/tag/v0.19.4) - 2025-12-19
9 25
### Fixed
10 26
- Applied fix for missing quote in JSON example format within prompt template. (PR [#86](https://github.com/shcherbak-ai/contextgem/pull/86))

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
@AGENTS.md

CONTRIBUTING.md

Lines changed: 43 additions & 25 deletions
@@ -18,6 +18,22 @@ To sign the agreement:
18 18
3. Fill in all the requested information and include it in your first pull request
19 19

20 20

21+
## 🤖 Using AI Coding Assistants
22+
23+
This repository is **AI agent-friendly** and includes configuration files to help AI coding assistants understand the codebase:
24+
25+
- **[AGENTS.md](AGENTS.md)** - Project overview, architecture patterns, coding conventions, and workflow guidelines for AI assistants ([agents.md standard](https://agents.md))
26+
- **[CLAUDE.md](CLAUDE.md)** - Configuration for [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview)
27+
28+
When using AI assistants (Claude Code, Cursor, etc.) to contribute:
29+
30+
1. **Review AI-generated code** - Always verify changes follow project patterns and pass tests
31+
2. **Handle VCR cassettes yourself** - AI assistants should not run tests that call LLM APIs without existing cassettes
32+
3. **Manage git operations yourself** - Review and commit changes manually rather than letting AI handle git
33+
34+
> **💡 Tip:** AI assistants work best when given specific, focused tasks. Break large contributions into smaller pieces for better results.
35+
36+
21 37
## 🚀 Getting Started
22 38

23 39
### 🛠️ Development Environment
@@ -44,18 +60,29 @@ To sign the agreement:
44 60
# Install uv if you don't have it
45 61
pip install uv
46 62
47-
# Install dependencies and development extras
48-
uv sync --all-groups
63+
# Install dependencies and pre-commit hooks
64+
uv run fab setup
65+
66+
# Or manually:
67+
# uv sync --all-groups --upgrade
68+
# uv run pre-commit install
69+
# uv run pre-commit install --hook-type commit-msg
49 70
```
50 71

51-
3. **🔧 Install pre-commit hooks**:
52-
```bash
53-
# Install pre-commit hooks
54-
uv run pre-commit install
72+
### 🛠️ Available Fabric Commands
55 73

56-
# Install commit-msg hooks (for commitizen)
57-
uv run pre-commit install --hook-type commit-msg
58-
```
74+
The project includes a `fabfile.py` with common development tasks:
75+
76+
```bash
77+
uv run fab --list # List all available commands
78+
uv run fab setup # Set up dev environment (deps + hooks)
79+
uv run fab sync # Sync dependencies with upgrades
80+
uv run fab lint # Run pre-commit checks on all files
81+
uv run fab docs # Build documentation
82+
uv run fab docs-live # Start live documentation server
83+
uv run fab readme # Regenerate README.md from template
84+
uv run fab install-hooks # Install pre-commit hooks
85+
```
59 86

60 87

61 88
### 📁 Project Structure
@@ -484,16 +511,13 @@ The log output will show detailed information about test execution.
484 511

485 512
### 🏗️ Building the Documentation
486 513

487-
Navigate to the `docs/` directory and choose your preferred build method:
514+
Use the fabric commands from the project root:
488 515

489 516
#### For Live Development (Recommended)
490517

491-
Use `sphinx-autobuild` for live reloading during development:
492-
493 518
```bash
494 519
# Live rebuild with auto-refresh on file changes
495-
make livehtml
496-
# Or on Windows: ./make.bat livehtml
520+
uv run fab docs-live
497 521
```
498 522

499 523
This starts a development server on `http://localhost:9000` with:
@@ -503,23 +527,18 @@ This starts a development server on `http://localhost:9000` with:
503 527

504 528
#### For Static Builds
505 529

506-
For one-time builds or CI-style building:
507-
508 530
```bash
509-
# Build with verbose output, ignore cache, and treat warnings as errors
510-
# (recommended for structural changes)
511-
uv run sphinx-build -b dirhtml source build/dirhtml -v -E -W
531+
# Build with verbose output, ignore cache, and treat warnings as errors
532+
uv run fab docs
512 533
```
513 534

514-
The `-E` flag ensures Sphinx completely rebuilds the environment, which is especially important after structural changes like modifying toctree directives or removing files. The `dirhtml` format creates pretty URLs without `.html` extensions, consistent with the live development server.
515-
516 535
### 👀 Viewing the Documentation
517 536

518 537
**With Live Development:**
519-
The documentation automatically opens at `http://localhost:9000` when using `make livehtml`.
538+
Open `http://localhost:9000` in your browser.
520 539

521 540
**With Static Builds:**
522-
After building, open `build/dirhtml/index.html` in your web browser to view the documentation.
541+
After building, open `docs/build/dirhtml/index.html` in your web browser.
523 542

524 543
### 🌐 Live Documentation
525 544

@@ -545,8 +564,7 @@ Instead:
545 564
If you need to test the README generation manually:
546 565

547 566
```bash
548-
# Populate README.md from template
549-
python dev/populate_project_readme.py
567+
uv run fab readme
550 568
```
551 569

552 570

NOTICE

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@ This software includes the following third-party components:
26 26
Core Dependencies:
27 27
- aiolimiter: Rate limiting for asynchronous operations
28 28
- colorlog: Colored logging formatter
29+
- docstring-parser: Docstring parsing for auto-generating tool schemas
29 30
- fastjsonschema: Fast JSON schema validator
30 31
- genai-prices: LLM pricing data and utilities (by Pydantic) to automatically estimate costs
31 32
- Jinja2: Templating engine
