Commit 952ee90

feat(py/tools): add conform CLI with multi-runtime support (#4593)
## Summary

Refactors the parallel conformance runner into a dedicated private package (`conform`) with a clean module architecture and `asyncio.Queue` worker pool for parallel execution.

<img width="1364" height="776" alt="Screenshot 2026-02-11 at 2 33 49 PM" src="https://github.com/user-attachments/assets/12beabb3-ad99-4cd0-808b-ff3d167dd548" />
<img width="1081" height="473" alt="Screenshot 2026-02-11 at 2 34 28 PM" src="https://github.com/user-attachments/assets/37b3dd6d-031f-4113-9cfc-70edfcc5ece1" />

### Subcommands

| Command | Purpose |
|---------|---------|
| `conform check-model [PLUGIN...] [--all] [-j N] [-v] [--runtime NAME]` | Run model conformance tests in parallel |
| `conform check-plugin` | Lint-time check that plugins have conformance files |
| `conform list` | Show plugins and env-var readiness |

### Key changes

- **Module architecture**: `types.py` contains shared types (Status, PluginResult, Runtime Protocol) to eliminate circular imports
- **Runtime Protocol**: abstracts runtime differences (Python, JS, Go) behind a Protocol; configurable via `[tool.conform.runtimes.*]` in pyproject.toml
- **asyncio.Queue worker pool**: natural backpressure instead of semaphore; N workers pull from the queue
- **Error log**: Rich Panels showing last 15 lines per failure (full with `-v`)
- **CLI polish**: yellow section headers, cyan args via rich-argparse; bare `conform` shows help + env-var table
- **MkDocs eng docs**: Material theme (blue), covers architecture, usage, config, multi-runtime
- **py/bin/conform**: wrapper script for convenience
- **Blocking I/O audit**: all async paths verified safe
- **Releasekit fixes**: added `conform` to `internal_tools` group (fixes ungrouped_packages and publish_classifier_consistency warnings); added Changelog URL to genkit `pyproject.toml`

### Conformance test results

Running `genkit dev:test-model` directly for `google-genai` with credentials:

| Model | Tests | Result |
|-------|-------|--------|
| `gemini-2.5-flash` | Tool Request, Structured Output, Multiturn, System Role, Image Base64, Image URL, Video YouTube | **7/7 ✅** |
| `gemini-2.5-pro` | All 7 tests | 0/7 ❌ (aborted: race condition) |
| `gemini-3-pro-preview` | All 10 tests | 0/10 ❌ (aborted: race condition) |
| `imagen-4.0-generate-001` | Image Output | 0/1 ❌ (aborted: race condition) |
| `gemini-2.5-flash-preview-tts` | TTS | 0/1 ❌ (aborted: race condition) |

**Summary: 7 passed, 19 failed.** All 7 `gemini-2.5-flash` tests pass. The 19 failures are all "aborted" due to a **genkit CLI race condition**: `genkit dev:test-model` dispatches test requests for models before the Python reflection server has finished registering their actions, causing 404s on `/api/runAction`. The last model in the sequence (`gemini-2.5-flash`) passes because the server is fully ready by then.

### Known upstream issues (not addressed in this PR)

1. **`_base_async.py` graceful shutdown**: The `dev_runner` raises `RuntimeError: User coroutine finished without a result.` when SIGTERM cancels a conformance entry point. This is a core framework bug that needs a separate PR. It doesn't affect test correctness but produces noisy tracebacks on shutdown.
2. **Genkit CLI action discovery race**: `genkit dev:test-model` starts testing models before the Python runtime has registered all actions with the reflection server. Models tested early get 404s and are marked "aborted". This is a timing issue in the Node.js CLI, not the Python SDK.

### Lint status

`bin/lint` passes with 0 errors.
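
The worker-pool idea from the key changes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the `conform` package: `run_pool`, `fake_conformance_run`, and the plugin names in the demo are hypothetical, and the stand-in coroutine only yields to the event loop where a real runner would do actual work. The point it shows is that concurrency is capped by the number of workers pulling from the queue, with no semaphore involved.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_pool(
    items: list[str],
    handler: Callable[[str], Awaitable[str]],
    concurrency: int = 4,
) -> dict[str, str]:
    """Run handler over items using a fixed number of queue-fed workers."""
    queue: asyncio.Queue[str] = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)

    results: dict[str, str] = {}

    async def worker() -> None:
        while True:
            try:
                item = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # No work left, so this worker exits.
            try:
                results[item] = await handler(item)
            except Exception as exc:
                # Record the failure and keep the rest of the pool running.
                results[item] = f"error: {exc}"
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max(1, concurrency))]
    await asyncio.gather(*workers)
    return results


async def fake_conformance_run(plugin: str) -> str:
    """Hypothetical stand-in for one plugin's conformance run."""
    await asyncio.sleep(0)  # Yield to the event loop, as real I/O would.
    if plugin == "broken":
        raise RuntimeError("boom")
    return "passed"


if __name__ == "__main__":
    outcome = asyncio.run(
        run_pool(["google-genai", "ollama", "broken"], fake_conformance_run, concurrency=2)
    )
    print(outcome)
```

With `concurrency=2`, at most two handlers are in flight at any time, which is roughly what a `-j N` flag would control in a design like this.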
1 parent 4f5a910 commit 952ee90

63 files changed: +3210 -183 lines changed

py/GEMINI.md

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@
 | Broad type ignores | No `# type: ignore` without codes | ✅ Automated |
 | Python classifiers | All packages have 3.10-3.14 classifiers | ✅ Automated |
 | Namespace `__init__.py` | Plugins must not have `__init__.py` in `genkit/` or `genkit/plugins/` | ✅ Automated |
+| Model conformance specs | Model plugins have `model-conformance.yaml` + `conformance_entry.py` | ✅ Automated |

 **Release Checks** (`py/bin/release_check`):

py/bin/check_consistency

Lines changed: 15 additions & 1 deletion
@@ -38,6 +38,8 @@
 # 17. Python version classifiers (3.10-3.14)
 # 18. Namespace package __init__.py (plugins must not have __init__.py in genkit/ or genkit/plugins/)
 # 19. Test file basename uniqueness (no duplicate basenames across test directories)
+# 20. CHANGELOG.md files exist
+# 21. Model conformance spec coverage (model plugins → conformance spec + entry point)

 set -euo pipefail

@@ -59,7 +61,7 @@ PY_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
 cd "$PY_DIR"

 # Total number of checks
-TOTAL_CHECKS=20
+TOTAL_CHECKS=21

 echo -e "${BLUE}=== Genkit Python Consistency Checks ===${NC}"
 echo ""
@@ -675,6 +677,18 @@ else
     echo -e " ${YELLOW}!${NC} Some packages are missing CHANGELOG.md files (warnings only)"
 fi
 echo ""
+
+# -----------------------------------------------------------------------------
+# Check 21: Model Conformance Spec Coverage
+# -----------------------------------------------------------------------------
+echo -e "${BLUE}[21/$TOTAL_CHECKS] Checking model conformance spec coverage...${NC}"
+# Delegated to the Python-based ``conform check-plugin`` tool, which performs
+# the same checks (model_info.py scanning + additional-model-plugins from
+# TOML config) and prints output in the same colored format.
+if ! uv run --active conform check-plugin; then
+    ERRORS=$((ERRORS + 1))
+fi
+echo ""
 # -----------------------------------------------------------------------------
 # Summary
 # -----------------------------------------------------------------------------
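
The coverage check that Check 21 delegates to `conform check-plugin` can be pictured with a sketch like the one below. It assumes each model plugin has its own directory and that both files named in the `py/GEMINI.md` row sit directly inside the directory passed in. The real tool's lookup paths, its `model_info.py` scanning, and its TOML handling are not visible in this diff, so `missing_conformance_files`, `REQUIRED_FILES`, and `demo` are hypothetical names used only for illustration.

```python
import tempfile
from pathlib import Path

# File names come from the consistency-check description; where the real tool
# looks for them is an assumption made for this sketch.
REQUIRED_FILES = ("model-conformance.yaml", "conformance_entry.py")


def missing_conformance_files(plugin_dirs: dict[str, Path]) -> dict[str, list[str]]:
    """Return, per plugin, the required files that are absent from its directory."""
    missing: dict[str, list[str]] = {}
    for name, directory in sorted(plugin_dirs.items()):
        absent = [f for f in REQUIRED_FILES if not (directory / f).is_file()]
        if absent:
            missing[name] = absent
    return missing


def demo() -> dict[str, list[str]]:
    """Build two throwaway plugin directories and report what is missing."""
    with tempfile.TemporaryDirectory() as tmp:
        complete = Path(tmp) / "complete-plugin"
        partial = Path(tmp) / "partial-plugin"
        complete.mkdir()
        partial.mkdir()
        for filename in REQUIRED_FILES:
            (complete / filename).write_text("")
        (partial / "model-conformance.yaml").write_text("")
        return missing_conformance_files(
            {"complete-plugin": complete, "partial-plugin": partial}
        )


if __name__ == "__main__":
    print(demo())
```

A shell caller could then treat a non-empty result as a failure, in the same way the hunk above increments `ERRORS` when the command exits non-zero.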

py/bin/conform

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+#!/usr/bin/env bash
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+# Wrapper script for the ``conform`` tool.
+#
+# Usage:
+#   py/bin/conform check-model --all   Run all model conformance tests
+#   py/bin/conform check-plugin        Check plugins have conformance files
+#   py/bin/conform list                List plugins and env-var readiness
+#   py/bin/conform --help              Show help
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PY_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
+
+exec uv run --directory "${PY_DIR}" --active conform "$@"

py/bin/test-model-conformance

Lines changed: 0 additions & 148 deletions
This file was deleted.

py/engdoc/model-conformance-roadmap.md

Lines changed: 18 additions & 0 deletions
@@ -187,6 +187,24 @@ phase are independent and should run in parallel** for fastest completion.
 |------|-------------|------------|---------|--------|
 | `runner-script` | Shell script to orchestrate per-plugin conformance test runs | All Phase 1 tasks | `py/bin/test-model-conformance` | ✅ Done |

+### Phase 2.5: Spec Audit + Model Updates ✅ COMPLETE
+
+| Task | Description | File(s) | Status |
+|------|-------------|---------|--------|
+| `audit-specs` | Verified all 11 plugin specs against official provider documentation (Feb 11, 2026). Fixed model names, corrected Supports flags, added missing models. Total: 24 models across 11 plugins. | All `model-conformance.yaml` files | ✅ Done |
+
+**Changes made during audit:**
+
+| Plugin | Before | After | Changes |
+|--------|--------|-------|---------|
+| **anthropic** | 2 models | 4 models | Added claude-sonnet-4-5, claude-opus-4-6 |
+| **deepseek** | 1 model (no structured-output) | 2 models | Added structured-output to chat, added deepseek-reasoner (no tools) |
+| **xai** | 1 model (grok-3, legacy) | 2 models | Replaced grok-3 → grok-4-fast-non-reasoning, added grok-2-vision-1212 |
+| **mistral** | 1 model (no vision) | 2 models | Added vision tests, added mistral-large-latest |
+| **amazon-bedrock** | Missing structured-output | Fixed | Added structured-output, streaming-structured-output |
+| **cloudflare** | Missing tool-request | Fixed | Added tool-request, streaming-multiturn |
+| **ollama** | Missing tool-request, vision | Fixed | Added tool-request, input-image-base64 |
+
 ### Phase 3: Validation ⏳ PENDING

 | Task | Description | Depends On | File(s) | Status |

py/justfile

Lines changed: 10 additions & 2 deletions
@@ -136,6 +136,14 @@ validate-docs:

 # --- Model Conformance -------------------------------------------------

-# Run model conformance tests.
+# Run model conformance tests in parallel (see `conform check-model --help`).
 test-conformance *ARGS:
-    "{{ py_dir }}/bin/test-model-conformance" {{ ARGS }}
+    uv run --directory "{{ py_dir }}" --active conform check-model {{ ARGS }}
+
+# Check that every model plugin has conformance files.
+check-conformance:
+    uv run --directory "{{ py_dir }}" --active conform check-plugin
+
+# List available conformance plugins and env-var readiness.
+list-conformance:
+    uv run --directory "{{ py_dir }}" --active conform list

py/pyproject.toml

Lines changed: 9 additions & 1 deletion
@@ -42,6 +42,8 @@ dependencies = [
   "genkit-plugin-mistral",
   "genkit-plugin-huggingface",
   "genkit-plugin-observability",
+  # Internal tools (private, not published)
+  "conform",
   "liccheck>=0.9.2",
   "setuptools>=75.0.0,<82", # Required by liccheck (provides pkg_resources, removed in setuptools 82+)
   "mcp>=1.25.0",
@@ -233,10 +235,12 @@ genkit-plugin-observability = { workspace = true }
 genkit-plugin-ollama = { workspace = true }
 genkit-plugin-vertex-ai = { workspace = true }
 genkit-plugin-xai = { workspace = true }
+# Internal tools (private, not published)
+conform = { workspace = true }

 [tool.uv.workspace]
 exclude = ["*/shared", "samples/sample-test", "testapps/*"]
-members = ["packages/*", "plugins/*", "samples/*"]
+members = ["packages/*", "plugins/*", "samples/*", "tools/conform"]


 # Ruff checks and formatting.
@@ -453,6 +457,7 @@ root = [
   # Tools
   "tools/releasekit/src", # For releasekit package imports
   "tools/releasekit", # For test imports (pythonpath = ["."])
+  "tools/conform/src", # For conform package imports
 ]

 # Pyright type checking configuration.
@@ -499,6 +504,7 @@ extraPaths = [
   "plugins/xai/src",
   # Tools
   "tools/releasekit/src",
+  "tools/conform/src",
 ]
 pythonVersion = "3.10"
 reportMissingImports = true
@@ -547,6 +553,7 @@ project_includes = [
   # Tools
   "tools/releasekit/src/**/*.py",
   "tools/releasekit/tests/**/*.py",
+  "tools/conform/src/**/*.py",
 ]

 # Search path for first-party code import resolution.
@@ -563,6 +570,7 @@ search-path = [
   # Tools
   "tools/releasekit/src",
   "tools/releasekit",
+  "tools/conform/src",
 ]
 # Ignore missing imports for namespace packages - pyrefly can't resolve PEP 420
 # namespace packages but these imports work at runtime.

py/releasekit.toml

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ google_plugins = [
   "genkit-plugin-google-genai",
   "genkit-plugin-vertex-ai",
 ]
-internal_tools = []
+internal_tools = ["conform"]
 samples = [
   "dev-local-vectorstore-hello",
   "framework-context-demo",

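The `internal_tools` change is described as fixing an `ungrouped_packages` warning. How releasekit implements that check is not shown in this commit, so the following is only a guess at the underlying idea: a workspace package counts as ungrouped when no group lists it. The function name `ungrouped_packages` mirrors the warning name but is otherwise hypothetical, and the sketch assumes groups hold exact package names rather than patterns.

```python
def ungrouped_packages(workspace: set[str], groups: dict[str, list[str]]) -> list[str]:
    """Return workspace package names that no group lists, sorted for stable output."""
    grouped = {name for members in groups.values() for name in members}
    return sorted(workspace - grouped)


if __name__ == "__main__":
    # Illustrative subset of the workspace; the real package list is much longer.
    workspace = {"conform", "genkit-plugin-google-genai", "genkit-plugin-vertex-ai"}
    before = {
        "google_plugins": ["genkit-plugin-google-genai", "genkit-plugin-vertex-ai"],
        "internal_tools": [],
    }
    after = {**before, "internal_tools": ["conform"]}
    print(ungrouped_packages(workspace, before))  # "conform" is reported
    print(ungrouped_packages(workspace, after))   # nothing is reported
```

Under this reading, adding `conform` to the workspace members without also listing it in a group would leave it in the first result, which is consistent with the one-line change above.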