
Conversation


@tawnymanticore tawnymanticore commented Dec 19, 2025

What does this PR do?

Merges main into remote_config. Includes Nemotron, Gemini 3 Pro enablement with Vertex, and Gemini 3 Flash.

Summary by CodeRabbit

  • New Features

    • Radar-chart comparison and a polished "No Data" view.
    • Finetune models now surface as run configurations in evals.
    • "Add Subtopics" modal to generate nested topics.
    • Option to inject a custom prompt builder into adapters.
  • Improvements

    • Interactive chart legend, formatting, and UI refinements.
    • Added new model entries (Gemini 3 Flash, Nemotron 3 Nano, GLM 4.7).
    • Updated Vertex AI guidance to prefer "global".
    • Hide the Task Description field in the task-creation flow.
  • Chores

    • Tooling/typecheck and CI adjustments; various ignore patterns and config updates.
  • Bug Fixes

    • Prevented accidental mutation when masking sensitive config data.
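The mutation fix can be illustrated with a minimal sketch. This is not the actual `Config.settings()` implementation; the function and key names here are hypothetical, and only the deep-copy-before-masking pattern reflects the change described above:

```python
from copy import deepcopy

SENSITIVE_KEYS = {"api_key", "secret"}  # hypothetical key names


def masked_settings(settings: dict, hide_sensitive: bool = True) -> dict:
    # Deep-copy first so masking never mutates the caller's dict
    result = deepcopy(settings)
    if hide_sensitive:
        for key in result:
            if key in SENSITIVE_KEYS:
                result[key] = "[hidden]"
    return result
```

With this pattern, the original settings object is untouched after masking, which is what the added regression test verifies.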



Note

Adds a radar comparison chart and UX tweaks, surfaces completed finetunes as run configs in evals, introduces Gemini 3 Flash and Nemotron 3 Nano (incl. Vertex support), and replaces Pyright with Ty across CI and tooling.

  • Frontend (Evals UI):
    • Radar Chart: New compare_radar_chart.svelte with legend formatting and no-data state; enhanced scatter chart legend/positioning and reactive selection.
    • Loading/selection fixes and UX tweaks in compare page; adds chart_no_data.svelte.
  • Frontend (Synth Data):
    • Nested Subtopics: Modal and logic to generate subtopics for all leaf topics; separate run-config refs; buttons wired in synth flow.
  • API/Backend:
    • Run Configs: get_all_run_configs and task_run_config_from_id support finetune-derived run configs (completed only); removed legacy task_run_configs route; endpoints use helper.
    • Typed properties for kiln task tool server; minor typing fixes in finetune API and MCP tool meta.
  • Core Library:
    • Models: Add gemini_3_flash, Vertex entries for Gemini 3 Pro/Flash, and nemotron_3_nano; enum updates.
    • Finetunes: ID helpers; provider lookup by base model; parser selection prefers base model parser; tests added.
    • Adapters: Allow custom prompt builder via AdapterConfig.
    • Typing/Utils: Numerous type hints/ignore fixes; prevent mutation in Config.settings(hide_sensitive=True).
  • CI/Tooling:
    • Replace Pyright with Ty in workflows, hooks, checks, and pyproject; add Ty config; remove pyrightconfig.json.
    • Desktop build workflow triggers on main and releases.
  • Misc:
    • API schema/types updated (remove task_run_configs); small lockfile bumps; tessl/cursor configs; new ignores.
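The adapter change under Core Library (custom prompt builder via AdapterConfig) can be sketched as follows. The class and attribute names follow the summary above, but the signatures are assumptions, not the actual `base_adapter.py` code:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AdapterConfig:
    # Optional injected prompt builder; None falls back to the adapter default
    prompt_builder: Optional[Callable[[str], str]] = None


class BaseAdapter:
    def __init__(self, config: Optional[AdapterConfig] = None):
        config = config or AdapterConfig()
        # Use the injected builder when provided, else a default
        self.build_prompt = config.prompt_builder or (lambda task: f"Task: {task}")


custom = BaseAdapter(AdapterConfig(prompt_builder=lambda t: f"### {t} ###"))
default = BaseAdapter()
```

The point of the design is that callers can override prompt construction without subclassing the adapter.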

Written by Cursor Bugbot for commit d0a10c7. This will update automatically on new commits.

chiang-daniel and others added 30 commits December 1, 2025 23:44
…n-in-new-task. People are confusing description and prompt. Don't need it.
…_create

Remove Task Description field from create flow
Fewer builds: we don't need to be running builds on every push.
Add hooks-mcp to .cursor/mcp.json
Fix: don't save test files into root. Use tmp

coderabbitai bot commented Dec 19, 2025

Walkthrough

Includes finetune-backed run-config handling in eval APIs, adds radar-chart and no-data UI for eval comparisons, implements nested subtopic generation UI and exported modal actions, updates model catalog and adapter prompt-builder injection, and replaces Pyright with Ty in CI/tooling plus MCP/tessl config additions.

Changes

Cohort (file(s)): summary

  • MCP & tessl config (.cursor/mcp.json, .cursor/rules/.gitignore, .tessl/.gitignore, AGENTS.md, tessl.json): Adds MCP server definitions (tessl, HooksMCP), tessl dependency manifest, agent rules reference, and new gitignore patterns for tessl artifacts.
  • CI / Typecheck / Tooling (.github/workflows/build_desktop.yml, .github/workflows/build_and_test.yml, .github/workflows/format_and_lint.yml, checks.sh, hooks_mcp.yaml, pyproject.toml, pyrightconfig.json): Replaces Pyright with Ty across workflows and scripts, updates workflow triggers/steps and ruff install steps, removes Pyright config, and updates dev dependencies and checks.
  • Eval API — finetune integration (app/desktop/studio_server/eval_api.py, libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py, libs/core/kiln_ai/adapters/provider_tools.py): Adds finetune-run-config ID utilities, parser-from-finetune and built-in provider lookup; introduces get_all_run_configs() and finetune-aware run-config lookup usages.
  • Eval API tests (app/desktop/studio_server/test_eval_api.py, libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py, libs/core/kiln_ai/adapters/test_provider_tools.py): Tests updated/added to cover finetune run-config ID parsing, parser/provider utilities, and inclusion/filtering of completed finetune run configs.
  • Web API schema (app/web_ui/src/lib/api_schema.d.ts): Removes the /api/projects/{project_id}/tasks/{task_id}/task_run_configs GET path and its operation definition.
  • Eval comparison UI — new components (app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte, app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte): Adds a no-data placeholder and a new ECharts-based radar chart component with props, data assembly, and reactive updates.
  • Eval comparison UI — page & chart updates (app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte, app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte): Refactors selection reactivity, integrates the radar chart, adds legend formatting, hover emphasis, and ChartNoData usage.
  • Generation UI — nested topics (app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte, app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte): Adds a nested-subtopics modal/dialog, exported functions open_add_nested_topics_modal() and add_nested_topics_to_all_leaf_topics(), a per-leaf generation flow, and UI controls (Add Subtopics).
  • Task setup UI adjustments (app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte, app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte): Hides Task Description on the create flow and updates Vertex AI example location text from us-central1 to global.
  • Model registry updates (libs/core/kiln_ai/adapters/ml_model_list.py): Adds gemini_3_flash, nemotron_3_nano, and glm_4_7 ModelName entries and corresponding KilnModel/provider entries (OpenRouter, Gemini API, Vertex, Ollama, SiliconFlow as applicable).
  • Adapter prompt-builder injection (libs/core/kiln_ai/adapters/model_adapters/base_adapter.py, libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py): Adds AdapterConfig.prompt_builder, updates BaseAdapter.__init__() to accept an optional AdapterConfig to inject a custom prompt builder; adds a test for custom builder behavior.
  • Adapter/test tweaks (libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py, libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py, libs/core/kiln_ai/adapters/test_prompt_builders.py): Minor test updates: enum usage, import consolidation, and MockAdapter _run input type adjusted to InputType.
  • Config & tests hygiene (libs/core/kiln_ai/utils/config.py, libs/core/kiln_ai/utils/test_config.py, libs/core/kiln_ai/cli/commands/test_package_project.py): Uses deepcopy when masking settings to avoid mutation, adds a test verifying no mutation, and updates tests to use tmp_path for isolated artifact outputs.
  • Typing suppressions & small fixes (assorted files: app/desktop/studio_server/finetune_api.py, app/desktop/studio_server/tool_api.py, libs/core/kiln_ai/datamodel/*, libs/core/kiln_ai/tools/*, libs/server/kiln_server/*): Adds multiple type: ignore[...] annotations to silence type-checkers, and switches one tool creation path to use a KilnTaskServerProperties object.
  • Miscellaneous repo files (.gitignore, CONTRIBUTING.md, checks.sh, hooks_mcp.yaml): Adds CLAUDE.md to .gitignore, updates CONTRIBUTING to reference ty, and updates checks script/tooling references (see CI cohort).

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

  • #868 — Finetune/run-config work: overlaps on finetune-aware run-config support, ID utilities, and provider/parser changes.
  • #909 — Eval comparison UI: overlaps on radar chart, ChartNoData, and compare view updates.
  • #890 — Generation UI: overlaps on nested-subtopics modal and exported generation functions.

Suggested labels

codex

Suggested reviewers

  • leonardmq
  • scosman
  • sfierro

Poem

🐇 I hopped through configs, tiny paws so fleet,

Finetunes tucked in, and radar blooms complete,
Subtopics burrowed, nested soft and neat,
New models, MCP—oh what a treat! ✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
  • Title check (⚠️ Warning): The PR title 'Update ML Model List' is overly broad and does not accurately represent the primary changes; the changeset includes substantial frontend (radar chart, nested subtopics), backend (finetune run configs), and CI/tooling changes (Pyright to Ty migration) beyond just model list updates. Resolution: revise the title to reflect the main change(s), such as 'Add finetune-backed run configs and new models (Gemini 3 Flash, Nemotron 3 Nano)' or 'Merge main: add radar chart, finetune run configs, new models, and Ty migration'.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 46.24%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Description check (❓ Inconclusive): The PR description provides a comprehensive summary via the CURSOR_SUMMARY note but lacks explicit sections matching the template requirements (Related Issues, CLA confirmation, and Checklists are missing or incomplete). Resolution: complete the PR description by adding the Required Issues section, CLA confirmation statement, and filling all checklist items to meet the repository's template standards.
✨ Finishing touches
  • 📝 Generate docstrings



github-actions bot commented Dec 19, 2025

📊 Coverage Report

Overall Coverage: 91%

Diff: origin/remote_config...HEAD

  • app/desktop/studio_server/eval_api.py (93.8%): Missing lines 501
  • app/desktop/studio_server/finetune_api.py (100%)
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (100%)
  • libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (100%)
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (100%)
  • libs/core/kiln_ai/adapters/provider_tools.py (100%)
  • libs/core/kiln_ai/datamodel/basemodel.py (100%)
  • libs/core/kiln_ai/datamodel/chunk.py (100%)
  • libs/core/kiln_ai/datamodel/dataset_filters.py (100%)
  • libs/core/kiln_ai/datamodel/dataset_split.py (100%)
  • libs/core/kiln_ai/datamodel/vector_store.py (100%)
  • libs/core/kiln_ai/tools/built_in_tools/math_tools.py (100%)
  • libs/core/kiln_ai/tools/rag_tools.py (100%)
  • libs/core/kiln_ai/utils/config.py (100%)

Summary

  • Total: 68 lines
  • Missing: 1 line
  • Coverage: 98%

Line-by-line


app/desktop/studio_server/eval_api.py

Lines 497-505

  497         # Load the list of run configs to use. Two options:
  498         run_configs: list[TaskRunConfig] = []
  499         if all_run_configs:
  500             # special case: we cannot directly load task.run_configs(); we also need the finetune run configs, which live inside the finetune model
! 501             run_configs = get_all_run_configs(project_id, task_id)
  502         else:
  503             if len(run_config_ids) == 0:
  504                 raise HTTPException(
  505                     status_code=400,



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (8)
app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte (1)

48-51: Consider conditionally excluding description when creating.

Both warn_before_unload (line 50) and has_edits() (line 171) check task.description even though the field is now hidden during creation. While this doesn't break functionality (description will be empty), you could improve consistency by conditionally excluding the description check when creating is true.

Optional refinement

For warn_before_unload (around line 50):

 $: warn_before_unload =
   !saved &&
-  ([task.name, task.description, task.instruction].some((value) => !!value) ||
+  ([task.name, ...(editing ? [task.description] : []), task.instruction].some((value) => !!value) ||
     task.requirements.some((req) => !!req.name || !!req.instruction))

For has_edits() (around line 171):

 return (
   !!task.name ||
-  !!task.description ||
+  (editing && !!task.description) ||
   !!task.instruction ||
   !!task.thinking_instruction ||

Also applies to: 165-177

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (1)

735-790: Dialog UI looks good with one optional enhancement.

The nested topics dialog is well-structured with proper loading and error states. The conditional rendering of RunConfigComponent based on task availability is defensive.

Optional: Add disabled state to button during operation

While the loading spinner at line 740 prevents the button from being visible during operation, adding a disabled state provides extra protection against race conditions:

       <button
         class="btn mt-2 btn-primary"
+        disabled={adding_nested_topics || !guidance_data.task}
         on:click={add_nested_topics_to_all_leaf_topics}
       >
         Generate Subtopics
       </button>

This prevents rapid double-clicks before the loading state renders.

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte (3)

14-31: Type and prop definitions are well-structured.

The ComparisonFeature type is duplicated from the parent page. Consider extracting this to a shared types file if it's used in multiple places, but for now this is acceptable.


36-71: Data key filtering and helper functions are correct.

Excluding the cost section (kiln_cost_section) from radar chart is appropriate since cost metrics have vastly different scales than evaluation scores.

Note: getRunConfigDisplayName and buildLegendFormatter are duplicated from compare_chart.svelte. Consider extracting these to a shared utility if the pattern continues to expand.


217-226: Reactive dependency tracking works but is slightly obscure.

The pattern (model_info || model_info === null) always evaluates to true—it's a technique to ensure Svelte tracks model_info as a dependency. A simpler approach would be to reference the values directly in the block body or use an explicit dependency array pattern.

Alternative: Clearer dependency tracking
-  $: if (
-    chartInstance &&
-    comparisonFeatures &&
-    selectedRunConfigIds &&
-    (model_info || model_info === null) &&
-    (prompts || prompts === null)
-  ) {
+  $: if (chartInstance && comparisonFeatures && selectedRunConfigIds) {
+    // Track async-loaded dependencies
+    void model_info
+    void prompts
     updateChart()
   }
libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (1)

5-10: Consider input validation for ID components.

The function correctly builds the finetune run config ID, but doesn't validate that project_id, task_id, or finetune_id don't contain the :: separator. While this is consistent with similar functions in the codebase (e.g., parse_custom_model_id), it could lead to ambiguous IDs if the inputs contain ::.

🔎 Optional validation to prevent separator in ID components
 def finetune_run_config_id(project_id: str, task_id: str, finetune_id: str) -> str:
     """
     Build in-memory ID for run-config inside a finetune. Format: finetune_run_config::project_id::task_id::finetune_id
     project_id::task_id::finetune_id is Finetune.model_id()
     """
+    for component, name in [(project_id, "project_id"), (task_id, "task_id"), (finetune_id, "finetune_id")]:
+        if "::" in component:
+            raise ValueError(f"{name} cannot contain '::' separator: {component}")
     return f"finetune_run_config::{project_id}::{task_id}::{finetune_id}"
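For the ambiguity concern above, a parse counterpart makes the failure mode concrete. This is a hypothetical sketch, not the actual codebase function; it assumes only the ID format shown in the docstring:

```python
PREFIX = "finetune_run_config"


def parse_finetune_run_config_id(run_config_id: str) -> tuple[str, str, str]:
    # Split on the "::" separator; a component containing "::" would
    # produce extra parts here and fail validation
    parts = run_config_id.split("::")
    if len(parts) != 4 or parts[0] != PREFIX:
        raise ValueError(f"Invalid finetune run config ID: {run_config_id}")
    return parts[1], parts[2], parts[3]
```

A strict length check like this at least turns separator-contaminated IDs into an explicit error rather than silently misattributing the components.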
libs/core/kiln_ai/adapters/provider_tools.py (1)

117-137: LGTM! Correct lookup implementation.

The function correctly looks up a built-in provider by model ID and provider name, with proper enum handling and graceful None returns for not-found cases. The linear search is acceptable for the current scale of built-in models.

🔎 Optional: Performance optimization for frequent lookups

If this function is called frequently (e.g., in hot paths or loops), consider adding a module-level cache indexed by (provider_name, model_id):

# At module level
_provider_by_id_cache: dict[tuple[ModelProviderName, str], KilnModelProvider] = {}

def _build_provider_cache():
    """Build lookup cache on first use."""
    if _provider_by_id_cache:
        return
    for model in built_in_models:
        for provider in model.providers:
            if provider.model_id:
                _provider_by_id_cache[(provider.name, provider.model_id)] = provider

def built_in_provider_from_model_id(
    model_id: str, provider_name: ModelProviderName | str
) -> KilnModelProvider | None:
    """Look up a built-in provider entry by the provider and its provider-specific model ID."""
    try:
        provider_enum = (
            provider_name
            if isinstance(provider_name, ModelProviderName)
            else ModelProviderName(provider_name)
        )
    except ValueError:
        return None
    
    _build_provider_cache()
    return _provider_by_id_cache.get((provider_enum, model_id))
app/desktop/studio_server/eval_api.py (1)

99-114: Consider adding an explicit 404 for finetunes without run_config.

When a finetune is found but has no run_config set (line 102 check fails), the code falls through to the generic 404 at line 111-113. While functional, this could be confusing for debugging since the finetune exists but just lacks a run_config.

Suggested improvement for clarity
     # special case for finetune run configs, it's inside the finetune model
     if run_config_id.startswith("finetune_run_config::"):
         finetune = finetune_from_finetune_run_config_id(run_config_id)
         if finetune.run_config is not None:
             return TaskRunConfig(
                 id=finetune_run_config_id(project_id, task_id, str(finetune.id)),
                 name=finetune.name,
                 description=finetune.description,
                 run_config_properties=finetune.run_config,
                 parent=task,  # special case, we need to reference the task model
             )
+        else:
+            raise HTTPException(
+                status_code=404,
+                detail=f"Finetune {finetune.id} exists but has no run_config set.",
+            )

     raise HTTPException(
         status_code=404,
         detail=f"Task run config not found. ID: {run_config_id}",
     )
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbbdbc8 and 757a603.

⛔ Files ignored due to path filters (1)
  • app/web_ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (30)
  • .cursor/mcp.json (1 hunks)
  • .cursor/rules/.gitignore (1 hunks)
  • .github/workflows/build_desktop.yml (1 hunks)
  • .tessl/.gitignore (1 hunks)
  • AGENTS.md (1 hunks)
  • app/desktop/studio_server/eval_api.py (6 hunks)
  • app/desktop/studio_server/test_eval_api.py (9 hunks)
  • app/web_ui/src/lib/api_schema.d.ts (0 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte (5 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte (1 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte (9 hunks)
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte (1 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (6 hunks)
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte (1 hunks)
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (1 hunks)
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte (1 hunks)
  • libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (1 hunks)
  • libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py (1 hunks)
  • libs/core/kiln_ai/adapters/ml_model_list.py (3 hunks)
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (3 hunks)
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py (2 hunks)
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (2 hunks)
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py (1 hunks)
  • libs/core/kiln_ai/adapters/provider_tools.py (3 hunks)
  • libs/core/kiln_ai/adapters/test_prompt_builders.py (2 hunks)
  • libs/core/kiln_ai/adapters/test_provider_tools.py (11 hunks)
  • libs/core/kiln_ai/cli/commands/test_package_project.py (2 hunks)
  • libs/core/kiln_ai/utils/config.py (2 hunks)
  • libs/core/kiln_ai/utils/test_config.py (1 hunks)
  • tessl.json (1 hunks)
💤 Files with no reviewable changes (1)
  • app/web_ui/src/lib/api_schema.d.ts
🧰 Additional context used
📓 Path-based instructions (9)
app/web_ui/**/*.svelte

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

app/web_ui/**/*.svelte: Use Svelte v4 (not v5) for the web frontend UI
Use DaisyUI for UI components in the web frontend

Use Svelte v4, not v5

Files:

  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
app/web_ui/**/*.{svelte,ts}

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Tailwind CSS for styling in the web frontend

Files:

  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
app/web_ui/**/*.{svelte,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Tailwind CSS and DaisyUI for styling the frontend

Files:

  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
libs/core/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Python 3.10+ for the core library (libs/core) and Python 3.13 for the desktop app

Use Python 3.10+ for the core library (libs/core)

Files:

  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/utils/test_config.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
  • libs/core/kiln_ai/utils/config.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use pytest for testing Python code

Files:

  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/utils/test_config.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • app/desktop/studio_server/test_eval_api.py
  • libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation

**/*.py: Use Pydantic v2, not v1
Use asyncio for asynchronous operations in Python

Files:

  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/utils/test_config.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • app/desktop/studio_server/test_eval_api.py
  • libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
  • libs/core/kiln_ai/utils/config.py
  • app/desktop/studio_server/eval_api.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest for testing in Python

Files:

  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/utils/test_config.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • app/desktop/studio_server/test_eval_api.py
  • libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
app/desktop/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.13 for the desktop application (app/desktop)

Files:

  • app/desktop/studio_server/test_eval_api.py
  • app/desktop/studio_server/eval_api.py
{libs/server,app/desktop/studio_server}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI for REST server implementation

Files:

  • app/desktop/studio_server/test_eval_api.py
  • app/desktop/studio_server/eval_api.py
🧠 Learnings (16)
📓 Common learnings
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
📚 Learning: 2025-12-15T22:02:37.369Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-15T22:02:37.369Z
Learning: Follow the instructions in .tessl/RULES.md

Applied to files:

  • .cursor/rules/.gitignore
  • AGENTS.md
  • .tessl/.gitignore
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.

Applied to files:

  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-12-06T07:53:22.155Z
Learnt from: chiang-daniel
Repo: Kiln-AI/Kiln PR: 781
File: app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte:200-222
Timestamp: 2025-12-06T07:53:22.155Z
Learning: In app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte, the process_model_change function intentionally overrides selected_run_config_id when a model has a model_specific_run_config property. This is by design to ensure fine-tuned models automatically use their associated run configuration (the configuration they were trained with), even if triggered by selecting a saved run config that uses that model.

Applied to files:

  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
📚 Learning: 2025-11-24T19:13:16.966Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: .cursor/rules/project.mdc:0-0
Timestamp: 2025-11-24T19:13:16.966Z
Learning: Applies to app/web_ui/**/*.svelte : Use DaisyUI for UI components in the web frontend

Applied to files:

  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
📚 Learning: 2025-10-22T04:31:20.500Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 739
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte:170-177
Timestamp: 2025-10-22T04:31:20.500Z
Learning: In Svelte components for the Kiln project, leonardmq prefers tracking previous query parameter values and comparing them in reactive blocks (rather than just checking for existence) to avoid duplicate execution on initial load while still reacting to subsequent navigation changes. This pattern supports both current behavior and future feature additions like mode switching.

Applied to files:

  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
📚 Learning: 2025-09-25T07:20:11.459Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 650
File: app/web_ui/src/routes/(app)/docs/library/[project_id]/[document_id]/+page.svelte:19-19
Timestamp: 2025-09-25T07:20:11.459Z
Learning: When analyzing relative import paths in SvelteKit route structures, be more careful to verify the actual directory structure exists before suggesting corrections. The import path ../../../../generate/[project_id]/[task_id]/table_button.svelte from app/web_ui/src/routes/(app)/docs/library/[project_id]/[document_id]/+page.svelte correctly resolves to the existing file at app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/table_button.svelte.

Applied to files:

  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
📚 Learning: 2025-10-21T00:23:00.770Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 737
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte:287-301
Timestamp: 2025-10-21T00:23:00.770Z
Learning: In Kiln AI, for topic/inputs generation flows (such as in generate_samples_modal.svelte and generated_data_node.svelte), the task input_json_schema is always set. Therefore, hardcoding requires_structured_output={true} in RunConfigComponent for these flows is correct and does not need to be conditional like it is for output generation.

Applied to files:

  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
📚 Learning: 2025-08-08T16:19:20.074Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1736-1741
Timestamp: 2025-08-08T16:19:20.074Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwq_32b with ModelProviderName.siliconflow_cn (model_id "Qwen/QwQ-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via tests/responses that SiliconFlow returns outputs that parse correctly without an R1 parser. Parser usage remains provider-specific and should only be added when validated.

Applied to files:

  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:14:54.346Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1979-1983
Timestamp: 2025-08-08T16:14:54.346Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py, for ModelName.deepseek_r1_distill_qwen_32b on provider siliconflow_cn (model_id "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via testing that SiliconFlow returns final outputs that parse without an R1 parser, and reasoning may be omitted under structured output; rely on reasoning_optional_for_structured_output=True.

Applied to files:

  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-10-24T05:01:15.465Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 736
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/qna/qna_ui_store.ts:552-585
Timestamp: 2025-10-24T05:01:15.465Z
Learning: In the Q&A generation workflow (app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/qna/qna_ui_store.ts), chunking is intentionally only ensured for target="all" generation. When users generate Q&A for a specific document or part, they do not alter the chunking strategy; the UI only surfaces document/part-level generation actions after the user has already run the global chunking flow (or chosen to use full documents without chunking). This is by design to separate the chunking decision from per-item generation.

Applied to files:

  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
📚 Learning: 2025-08-08T16:15:20.796Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:2434-2439
Timestamp: 2025-08-08T16:15:20.796Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwen_3_8b_no_thinking with ModelProviderName.siliconflow_cn (model_id "Qwen/Qwen3-8B"), the qwen3_style_no_think formatter is not required: SiliconFlow’s non‑thinking Qwen3‑8B returns clean output without <answer> wrappers. Formatter/parser usage is provider-specific and should be added only when verified by tests/responses.

Applied to files:

  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-22T11:15:53.584Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 413
File: libs/core/kiln_ai/adapters/embedding/embedding_registry.py:14-20
Timestamp: 2025-08-22T11:15:53.584Z
Learning: In the Kiln AI codebase, model_provider_name fields in datamodels like EmbeddingConfig are typed as ModelProviderName enum (which inherits from str, Enum), not as plain strings. Therefore, accessing .value on these fields is valid and returns the string representation of the enum value.

Applied to files:

  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-10-28T05:25:09.283Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 757
File: libs/core/kiln_ai/adapters/reranker_list.py:39-42
Timestamp: 2025-10-28T05:25:09.283Z
Learning: In the Kiln codebase (Python), model family and model name fields in reranker and similar model definition classes (like KilnRerankerModel in libs/core/kiln_ai/adapters/reranker_list.py) should use plain str types rather than Enum types. This is because users receive over-the-air (OTA) updates for new models, and using Enums would break when clients with older builds encounter new model families their Enum definitions don't include. Strings provide forward compatibility for the OTA update mechanism.

Applied to files:

  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-09-10T08:06:53.717Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 585
File: libs/core/kiln_ai/adapters/ml_embedding_model_list.py:32-33
Timestamp: 2025-09-10T08:06:53.717Z
Learning: In the Kiln AI codebase, friendly_name fields in embedding model configurations should maintain the original model naming conventions from providers (e.g., "gemini-embedding-001", "text-embedding-3-small") for human readability, rather than following snake_case conventions like the enum values.

Applied to files:

  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
🧬 Code graph analysis (9)
libs/core/kiln_ai/adapters/test_provider_tools.py (2)
libs/core/kiln_ai/adapters/provider_tools.py (2)
  • built_in_provider_from_model_id (117-136)
  • parser_from_finetune (259-276)
libs/core/kiln_ai/adapters/ml_model_list.py (1)
  • ModelParserID (216-222)
libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py (3)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (2)
  • AdapterConfig (36-52)
  • model_provider (87-102)
libs/core/kiln_ai/adapters/run_output.py (1)
  • RunOutput (10-14)
libs/core/kiln_ai/datamodel/run_config.py (1)
  • RunConfigProperties (24-62)
app/desktop/studio_server/test_eval_api.py (2)
app/desktop/studio_server/eval_api.py (2)
  • get_all_run_configs (66-88)
  • task_run_config_from_id (91-114)
libs/core/kiln_ai/datamodel/finetune.py (1)
  • Finetune (23-127)
libs/core/kiln_ai/adapters/provider_tools.py (2)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • ModelProviderName (84-106)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
  • KilnModelProvider (233-305)
  • ModelParserID (216-222)
libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py (1)
libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (2)
  • finetune_run_config_id (5-10)
  • finetune_from_finetune_run_config_id (13-23)
libs/core/kiln_ai/cli/commands/test_package_project.py (2)
libs/core/kiln_ai/cli/commands/package_project.py (1)
  • package_project (587-729)
libs/core/kiln_ai/datamodel/project.py (1)
  • tasks (42-43)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (1)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • ModelProviderName (84-106)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (1)
  • config (47-58)
libs/core/kiln_ai/adapters/ml_model_list.py (1)
libs/core/kiln_ai/datamodel/datamodel_enums.py (3)
  • ModelProviderName (84-106)
  • StructuredOutputMode (27-49)
  • KilnMimeType (109-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Generate Coverage Report
🔇 Additional comments (58)
libs/core/kiln_ai/cli/commands/test_package_project.py (1)

653-709: LGTM! Excellent test isolation improvement.

The changes properly utilize pytest's tmp_path fixture to ensure each test writes to its own isolated temporary directory. This eliminates potential conflicts from hard-coded paths and follows pytest best practices. The updates are consistent with existing patterns in other test classes within this file.
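The pattern the comment praises can be sketched as follows. This is a minimal, hypothetical example of the `tmp_path` idiom (the `package_and_check` helper is illustrative, not the actual Kiln test code):

```python
from pathlib import Path

# Hypothetical helper standing in for the packaging code under test.
def package_and_check(output_dir: Path) -> Path:
    # Write into the directory the test supplies, never a hard-coded
    # path, so concurrently running tests cannot collide.
    archive = output_dir / "project.kiln.zip"
    archive.write_bytes(b"fake-archive")
    return archive

def test_package_writes_to_isolated_dir(tmp_path: Path) -> None:
    # pytest injects a fresh, unique temporary directory per test.
    archive = package_and_check(tmp_path)
    assert archive.exists()
    assert archive.parent == tmp_path
```

Because `tmp_path` is just a parameter, each test run gets its own directory and no cleanup code is needed.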

.cursor/mcp.json (1)

8-12: No action needed. The HooksMCP configuration is valid without an explicit type field. Cursor MCP servers using STDIO transport require only "command" and "args" fields, and stdio is the default transport for command-based servers. The explicit "type": "stdio" on the tessl server is optional; both configurations are equivalent and correct.

Likely an incorrect or invalid review comment.

.github/workflows/build_desktop.yml (2)

4-6: LGTM! Good additions for workflow flexibility.

Adding workflow_dispatch for manual triggering and the release trigger for automated builds on release creation are both best practices for desktop application CI/CD workflows.


7-9: Verify branch restriction aligns with remote_config workflow needs.

The push trigger is now restricted to the main branch only. After merging this into remote_config, the workflow file on that branch will only trigger builds when pushing to main, not when pushing to remote_config.

If remote_config is a long-lived branch where desktop builds are needed independently, consider adding it to the branches list. If builds should only run from main, then this restriction is appropriate.

app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte (1)

284-294: LGTM! Correctly hides Task Description during creation.

The conditional rendering logic properly restricts the Task Description field to edit mode only, aligning with the stated objective to simplify the creation UI.

app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (1)

133-133: No issues identified. Google and Partner models and generative AI features on Vertex AI are exposed as specific regional endpoints and a global endpoint, and location='global' is valid for the Google Gen AI SDK and Vertex AI SDK for Python. Additionally, all Gemini 3 and Gemini 2.5 preview models released after June 2025 are only available in the global location, which confirms that global location provides access to the newest Gemini models. The instructional text accurately recommends using "global" as a valid Google Cloud location for Vertex AI Gemini models.

libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py (1)

5-6: LGTM! Clean import consolidation.

The consolidation of imports improves readability without any functional changes.

libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py (2)

6-11: LGTM! Import additions support new test.

The additional imports for AdapterConfig and BasePromptBuilder are necessary for testing the custom prompt builder injection feature.


629-659: LGTM! Well-structured test for custom prompt builder injection.

The test effectively validates that:

  • Custom prompt builders can be injected via AdapterConfig
  • The adapter uses the injected builder instead of the default
  • The custom builder's output appears in the formatter

The test follows pytest conventions and properly mocks dependencies.

libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (5)

15-15: LGTM! Import addition supports new functionality.

The BasePromptBuilder import is necessary for the type annotation and prompt builder injection feature.


47-52: LGTM! Well-documented configuration option.

The prompt_builder field addition to AdapterConfig is well-designed:

  • Properly typed as optional
  • Clear documentation explains the injection mechanism and fallback behavior
  • Maintains backward compatibility

61-65: LGTM! Clear documentation of prompt building behavior.

The docstring update effectively documents the prompt building mechanism and how users can customize it via AdapterConfig.


71-71: LGTM! Backward-compatible signature update.

The optional config parameter maintains backward compatibility while enabling the new customization feature.


76-81: LGTM! Clean implementation of prompt builder injection.

The initialization logic correctly implements the precedence:

  1. Use custom prompt builder from config if provided
  2. Otherwise, load from run_config.prompt_id

This maintains backward compatibility while enabling the new customization point.
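The precedence described above can be restated in a small sketch. Names like `AdapterConfig` and `prompt_builder_from_id` mirror the PR, but this is an illustrative model of the logic, not the actual Kiln implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AdapterConfig:
    # Optional injection point for a custom prompt builder.
    prompt_builder: Optional[object] = None

def prompt_builder_from_id(prompt_id: str) -> str:
    # Stand-in for the real registry lookup keyed by run_config.prompt_id.
    return f"builder-for-{prompt_id}"

def resolve_prompt_builder(config: Optional[AdapterConfig], prompt_id: str):
    # 1) A custom builder supplied via config takes precedence.
    if config is not None and config.prompt_builder is not None:
        return config.prompt_builder
    # 2) Otherwise, build one from the run config's prompt_id.
    return prompt_builder_from_id(prompt_id)
```

Callers that never pass a config get exactly the old behavior, which is why the change is backward compatible.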

libs/core/kiln_ai/adapters/test_prompt_builders.py (2)

38-38: LGTM: Import addition supports type annotation update.

The InputType import is necessary for the updated method signature below.


80-80: The BaseAdapter interface correctly uses InputType, and the MockAdapter._run method signature at line 80 properly matches this pattern. InputType is a union type encompassing both structured types (Dict/List) and strings, making the signature change valid and type-safe.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte (1)

821-856: LGTM! Clean UI enhancement for nested topics workflow.

The conditional rendering based on topic presence is well-structured, and the button styling follows DaisyUI patterns appropriately. The new "Add Subtopics" button integrates smoothly into the existing step flow.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (5)

17-17: LGTM! Solid defensive pattern.

Separating the RunConfigComponent refs prevents null reference issues when components are conditionally mounted/unmounted. The explanatory comment is helpful.

Also applies to: 26-28


160-160: LGTM! Ref usage is consistent.

The updated references correctly use run_config_component_modal with proper null handling via optional chaining.

Also applies to: 684-684


257-279: LGTM! Clean recursive implementation.

The get_all_leaf_topics function correctly identifies and collects all leaf nodes (topics without subtopics) using proper recursion with path tracking.
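The recursion can be illustrated in Python (the real implementation is in Svelte/TypeScript; this sketch only models the shape of the logic). A "leaf" is a topic with no subtopics, and each result carries its full path:

```python
def get_all_leaf_topics(topics, path=()):
    # Collect the path to every leaf node in a nested topic tree.
    leaves = []
    for topic in topics:
        current_path = path + (topic["name"],)
        subtopics = topic.get("subtopics") or []
        if not subtopics:
            # No children: this topic is a leaf.
            leaves.append(list(current_path))
        else:
            # Recurse, threading the accumulated path through.
            leaves.extend(get_all_leaf_topics(subtopics, current_path))
    return leaves
```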


281-289: LGTM! State management is clean.

The state variables and modal open function are well-structured. Resetting the error state when opening the modal provides good UX.


291-400: LGTM! Robust nested topics generation logic.

The function handles the nested topics workflow well:

  • Comprehensive validation upfront
  • Sequential processing with partial saves after each leaf topic
  • Proper error handling with KilnError
  • Analytics tracking

Sequential processing (vs. parallel) is a reasonable design choice here since it provides better error tracking and saves progress incrementally.

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte (1)

1-21: Clean no-data state component.

The implementation is straightforward and reusable. Tailwind CSS styling is consistent with project conventions.

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte (3)

566-579: Well-structured reactive state helpers.

The reactive declarations correctly derive loading and data availability states. The type guard (m): m is string ensures proper type narrowing for validSelectedModels.


921-934: Radar chart integration looks correct.

The CompareRadarChart is properly wired with the selected run configurations via validSelectedModels, while the existing CompareChart continues to show all run configurations. This distinction makes sense for the different visualization purposes.


602-609: Loading state logic is correct.

The compound condition properly guards the loading spinner: it only appears when models are selected, all are still loading, and no data has arrived yet. This prevents unnecessary spinners and flickering.
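The guard is a Svelte reactive expression; restated as plain Python for clarity (names are illustrative):

```python
def show_loading_spinner(selected_models, loading, has_data):
    # Spinner only when something is selected, every selected model is
    # still loading, and no data has arrived yet.
    return (
        len(selected_models) > 0
        and all(loading.get(m, False) for m in selected_models)
        and not has_data
    )
```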

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte (5)

73-93: Legend formatting implementation looks good.

The multi-line legend format with rich text styling provides useful context. Note that if multiple run configs share the same display name, the legend entries may collide, but this is a pre-existing constraint of the charting approach rather than a new issue.


154-157: Good interactivity enhancement.

The emphasis.scale: 2 provides clear visual feedback when hovering over data points, complementing the legend hover interactions.


205-227: Legend and layout updates are well-coordinated.

The legend positioning (left: "70%") with grid adjustment (right: "34%") provides adequate space for multi-line legend entries while keeping the chart readable.


262-301: Legend hover interactions are a nice UX addition.

The highlight/downplay actions on legend hover help users correlate legend entries with chart data points. The hasDataPoints reactive check correctly determines data availability.


367-370: No-data state handling is improved.

The condition axisOptions.length <= 1 || !hasDataPoints correctly covers both insufficient metrics and missing data scenarios, delegating to the reusable ChartNoData component.

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte (3)

73-131: Radar chart data generation is well-implemented.

The 10% padding on max values (max * 1.1) ensures data points don't touch the radar boundary. The fallback to max = 1 when all values are 0 prevents potential edge cases. Null values are correctly replaced with 0 for radar visualization.
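The axis-max rule can be sketched as a small function (illustrative Python; the real code runs in the Svelte chart component):

```python
def radar_axis_max(values):
    # Treat missing values as 0 for radar plotting.
    cleaned = [v if v is not None else 0 for v in values]
    max_value = max(cleaned) if cleaned else 0
    if max_value == 0:
        # All-zero metric: fall back to 1 to avoid a degenerate axis.
        return 1
    # Pad by 10% so points never touch the radar boundary.
    return max_value * 1.1
```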


142-215: Chart update logic is comprehensive.

The radar chart configuration is well-thought-out with proper positioning to accommodate the legend. Clearing the chart when there's no data prevents stale visualizations.


249-263: Template rendering logic is correct.

The dataKeys.length >= 3 guard appropriately hides the radar chart when there aren't enough metrics for meaningful visualization. The ChartNoData fallback provides consistent UX.

libs/core/kiln_ai/adapters/ml_model_list.py (4)

130-130: LGTM: New model enum entries added correctly.

The new gemini_3_flash and nemotron_3_nano enum members follow the existing naming conventions and are properly integrated with their corresponding model configurations below.

Also applies to: 132-132


1496-1507: Configuration is correct and verified.

The Vertex provider configuration for Gemini 3 Pro Preview is accurate. The model_id "gemini-3-pro-preview" matches the official Vertex AI preview model naming, and Gemini 3 Pro is the most advanced reasoning Gemini model. The thinking_level="medium" setting balances latency against reasoning depth, which is appropriate for this model. The reasoning_capable and gemini_reasoning_enabled settings are correct for this model's capabilities on Vertex AI.


1508-1577: Configuration verified for Gemini 3 Flash across all three providers.

Gemini 3 Flash is accessible via the Gemini API, Vertex AI, and other platforms, confirming availability on all three providers in the code. The model supports a 1M token context window and multimodal inputs including text, images, audio, video, and PDFs, with structured output and configurable reasoning via thinking levels. The thinking_level parameter controls the amount of internal reasoning the model performs (minimal, low, medium, or high), which aligns with the thinking_level="medium" configuration in the Gemini API and Vertex providers.

The model enables complex video analysis, data extraction, and document analysis with near real-time performance, confirming suitability for data generation and evaluation tasks marked in the configuration. All listed multimodal MIME types for images, audio, and video are supported by the model according to documented capabilities.


1971-1991: Configuration is correct as-is.

The Nemotron 3 Nano model configuration follows established patterns and is properly verified:

  • Model is fully available on OpenRouter with open-weights, datasets, and recipes
  • Model is available on Ollama
  • Model supports configurable reasoning capabilities, justifying reasoning_capable=True for both providers
  • No parser configuration required, consistent with learnings that parsers are provider-specific and should only be added when validated
  • Model family classification as Llama aligns with the Nemotron product line
.tessl/.gitignore (1)

1-2: LGTM!

The ignore patterns for Tessl-related artifacts are appropriate.

.cursor/rules/.gitignore (1)

1-1: LGTM!

The ignore pattern correctly prevents generated Tessl files from being committed.

libs/core/kiln_ai/utils/config.py (2)

1-1: LGTM!

The copy import is correctly added to support the deepcopy operation.


263-263: Good defensive coding with deep copy.

Using copy.deepcopy(v) prevents unintended mutation of the underlying _settings when masking sensitive fields in nested structures. This is particularly important for list-based settings like openai_compatible_providers where the masking logic modifies nested dictionaries.
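The failure mode being prevented is easy to reproduce. A minimal sketch, assuming a masking helper with a similar shape to the one in the PR (the function name and settings layout here are hypothetical):

```python
import copy

def masked_settings(settings, hide_sensitive=True):
    # Return a masked view without mutating the caller's nested data.
    result = {}
    for key, value in settings.items():
        value = copy.deepcopy(value)  # the crucial copy
        if hide_sensitive and isinstance(value, list):
            for item in value:
                if isinstance(item, dict) and "api_key" in item:
                    item["api_key"] = "[hidden]"
        result[key] = value
    return result
```

Without the `deepcopy`, the inner `item["api_key"] = "[hidden]"` would overwrite the secret in the original settings dict, which is exactly what the new test guards against.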

AGENTS.md (1)

45-47: LGTM!

The agent rules reference is clear and aligns with the Tessl workflow integration. Based on learnings, following the instructions in .tessl/RULES.md is the expected pattern.

libs/core/kiln_ai/utils/test_config.py (1)

284-317: Excellent test coverage for non-mutation behavior.

This test comprehensively verifies that masking sensitive fields via settings(hide_sensitive=True) does not mutate the underlying configuration data. The test correctly:

  • Sets up providers with sensitive api_key fields
  • Confirms masking works ([hidden] appears in masked view)
  • Validates original data remains intact and accessible
libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (1)

13-23: LGTM! Good error handling and validation.

The function correctly validates the ID prefix and extracts the model ID using removeprefix() (Python 3.9+, compatible with the project's 3.10+ requirement). The error message is clear and actionable.
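The validate-then-strip pattern looks roughly like this. The prefix string below is an assumption for illustration; the real constant and error text live in the PR:

```python
FINETUNE_PREFIX = "finetune::"  # hypothetical prefix for illustration

def finetune_run_config_id(model_id: str) -> str:
    return f"{FINETUNE_PREFIX}{model_id}"

def model_id_from_run_config_id(run_config_id: str) -> str:
    if not run_config_id.startswith(FINETUNE_PREFIX):
        raise ValueError(
            f"Not a finetune run config id: {run_config_id!r} "
            f"(expected prefix {FINETUNE_PREFIX!r})"
        )
    # str.removeprefix is available on Python 3.9+.
    return run_config_id.removeprefix(FINETUNE_PREFIX)
```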

libs/core/kiln_ai/adapters/provider_tools.py (2)

259-277: Good prioritization logic for parser selection.

The function correctly prioritizes the base model's parser over the data strategy parser, which makes sense for cases like R1 models fine-tuned without thinking data that still output thinking tokens. This aligns with the learnings that parser usage should be based on observed behavior in tests.

Based on learnings, parser selection is provider-specific and should reflect actual model behavior.
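The priority rule reduces to a short sketch (parameter names are illustrative stand-ins for the PR's datamodel fields):

```python
def parser_from_finetune(base_model_parser, data_strategy_parser):
    # Prefer the base model's parser: e.g. an R1 base model emits
    # thinking tokens regardless of how the finetune's training data
    # was structured.
    if base_model_parser is not None:
        return base_model_parser
    # Otherwise fall back to whatever the data strategy implies.
    return data_strategy_parser
```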


296-296: LGTM! Proper integration of enhanced parser logic.

The change to use parser_from_finetune(fine_tune) correctly integrates the new parser priority logic, ensuring that fine-tuned models inherit appropriate parsers from their base models when applicable.

libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (1)

1173-1173: LGTM - Consistent enum usage for provider names.

Using ModelProviderName.openai instead of the string literal "openai" improves type safety and aligns with the enum-based provider naming pattern used throughout the codebase.

Also applies to: 1243-1243

libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py (1)

1-54: Well-structured test coverage for finetune run config ID utilities.

The tests appropriately cover:

  • ID format construction
  • Valid ID parsing with proper mock verification
  • Invalid ID handling with parametrized edge cases
libs/core/kiln_ai/adapters/test_provider_tools.py (3)

269-292: Good test coverage for built_in_provider_from_model_id.

Tests correctly verify:

  • Successful lookup returns the expected provider with correct attributes
  • Various invalid lookup scenarios return None (nonexistent model, wrong provider, invalid provider string)

1222-1275: Comprehensive tests for parser_from_finetune logic.

The three tests effectively cover the priority hierarchy:

  1. Base model parser takes precedence when available
  2. Falls back to data_strategy parser when model has no parser
  3. Returns None when neither source provides a parser

74-74: Fixture updates align with new parser_from_finetune requirements.

Adding base_model_id to mock finetune fixtures ensures the new parser_from_finetune function can look up the base model provider to check for parser information.

Also applies to: 87-87, 100-100, 629-629, 965-965

app/desktop/studio_server/test_eval_api.py (4)

552-591: Good test for finetune run config retrieval via task_run_config_from_id.

The test properly:

  • Creates a Finetune with run_config
  • Mocks finetune_from_finetune_run_config_id
  • Verifies the returned TaskRunConfig has correct id, name, description, and properties

593-648: Thorough test for get_all_run_configs aggregation logic.

The test correctly validates that:

  • Regular run configs are included
  • Completed finetunes with run_config appear with the proper ID format
  • Incomplete finetunes (running, with fine_tune_model_id=None) are excluded

1990-2078: Comprehensive test covering all finetune status scenarios.

This test effectively validates the filtering logic for finetune run configs by testing:

  • Completed finetunes with run_config ✓ included
  • Running finetunes ✗ excluded (no fine_tune_model_id)
  • Unknown status finetunes ✗ excluded
  • Failed finetunes ✗ excluded
  • Completed without run_config ✗ excluded

737-737: Necessary fixture update to prevent test failures.

Adding mock_task.finetunes.return_value = [] ensures the test doesn't fail when get_all_run_configs attempts to iterate over finetunes.

app/desktop/studio_server/eval_api.py (2)

66-88: Well-implemented aggregation of run configs including finetunes.

The function correctly:

  • Fetches task's native run_configs
  • Filters finetunes to only include those with both run_config and fine_tune_model_id (completed and usable)
  • Constructs proper finetune_run_config_id format for each included finetune
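The aggregation rule above can be sketched as follows. The dict shapes and the `finetune::` prefix are illustrative assumptions, not the real Kiln datamodel:

```python
def get_all_run_configs(task_run_configs, finetunes, prefix="finetune::"):
    # Start with the task's native run configs.
    configs = list(task_run_configs)
    for ft in finetunes:
        # Include only finetunes that are actually usable: they must
        # have both a stored run_config and a deployed model id.
        if ft.get("run_config") and ft.get("fine_tune_model_id"):
            configs.append({
                "id": f"{prefix}{ft['id']}",
                "run_config": ft["run_config"],
            })
    return configs
```

The `fine_tune_model_id` check is what excludes still-running or failed finetunes, matching the test coverage described above.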

7-10: Consistent integration of finetune run configs across all endpoints.

The imports and updates to get_run_configs, run_eval_config, and get_eval_config_score_summary properly leverage get_all_run_configs to ensure finetune-based run configs are included wherever run configs are aggregated.

Also applies to: 361-361, 500-501, 654-655

Comment on lines +1 to +95
{
"name": "project",
"dependencies": {
"tessl/pypi-pillow": {
"version": "11.3.0"
},
"tessl/pypi-pyinstaller": {
"version": "6.15.0"
},
"tessl/pypi-certifi": {
"version": "2024.12.0"
},
"tessl/pypi-kiln-ai": {
"version": "0.22.1"
},
"tessl/pypi-pyright": {
"version": "1.1.0"
},
"tessl/pypi-pytest": {
"version": "8.4.0"
},
"tessl/pypi-pytest-xdist": {
"version": "3.8.0"
},
"tessl/pypi-python-dotenv": {
"version": "1.1.0"
},
"tessl/pypi-watchfiles": {
"version": "1.1.0"
},
"tessl/npm-floating-ui--dom": {
"version": "1.7.0"
},
"tessl/npm-sveltejs--kit": {
"version": "2.37.0"
},
"tessl/npm-sveltejs--vite-plugin-svelte": {
"version": "6.1.0"
},
"tessl/npm-tailwindcss--typography": {
"version": "0.5.0"
},
"tessl/npm-typescript-eslint--eslint-plugin": {
"version": "8.42.0"
},
"tessl/npm-typescript-eslint--parser": {
"version": "8.42.0"
},
"tessl/npm-daisyui": {
"version": "4.12.0"
},
"tessl/npm-echarts": {
"version": "6.0.0"
},
"tessl/npm-eslint-config-prettier": {
"version": "10.1.0"
},
"tessl/npm-eslint-plugin-svelte": {
"version": "3.12.0"
},
"tessl/npm-eslint": {
"version": "9.34.0"
},
"tessl/npm-highlight.js": {
"version": "11.11.0"
},
"tessl/npm-openapi-typescript": {
"version": "7.9.0"
},
"tessl/npm-postcss": {
"version": "8.5.0"
},
"tessl/npm-prettier": {
"version": "3.6.0"
},
"tessl/npm-svelte-check": {
"version": "3.8.0"
},
"tessl/npm-svelte": {
"version": "4.2.0"
},
"tessl/npm-tailwindcss": {
"version": "3.4.0"
},
"tessl/npm-typescript": {
"version": "5.9.0"
},
"tessl/npm-vite": {
"version": "7.1.0"
},
"tessl/npm-vitest": {
"version": "4.0.0"
}
}
}
⚠️ Potential issue | 🔴 Critical

Multiple dependencies are outdated and require updates.

The manifest contains several outdated or invalid versions:

  • pyright: 1.1.0 is not a valid version; upgrade to 1.1.407
  • pillow: 11.3.0 is from July 2025; upgrade to 12.0.0+ (October 2025)
  • typescript: 5.9.0 should be updated to 5.9.3
  • vite: 7.1.0 should be updated to 7.3.0+ for latest fixes and security patches

Run dependency security audits after updating to ensure no known vulnerabilities exist in the chosen versions.

🤖 Prompt for AI Agents
In tessl.json (lines 1-95) several dependency versions are outdated/invalid:
update "tessl/pypi-pyright" to version "1.1.407", bump "tessl/pypi-pillow" to a
12.0.0+ release (e.g., "12.0.0"), change "tessl/npm-typescript" to "5.9.3", and
update "tessl/npm-vite" to "7.3.0" (or newer 7.3.x); after changing those
entries, run your dependency security/audit tooling and tests to verify
compatibility and resolve any new conflicts.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/desktop/studio_server/tool_api.py (1)

641-647: Use KilnTaskServerProperties constructor for consistency.

This endpoint still uses a dict literal while add_kiln_task_tool (lines 609-615) now uses the KilnTaskServerProperties constructor. For consistency and better static type checking, please use the same pattern here.

🔎 Proposed fix for consistency
-        existing_tool_server.properties = {
-            "name": tool_data.name,
-            "description": tool_data.description,
-            "task_id": tool_data.task_id,
-            "run_config_id": tool_data.run_config_id,
-            "is_archived": tool_data.is_archived,
-        }
+        existing_tool_server.properties = KilnTaskServerProperties(
+            name=tool_data.name,
+            description=tool_data.description,
+            task_id=tool_data.task_id,
+            run_config_id=tool_data.run_config_id,
+            is_archived=tool_data.is_archived,
+        )
🧹 Nitpick comments (7)
libs/core/kiln_ai/datamodel/vector_store.py (1)

96-101: Consider using typing.cast() for more explicit type narrowing.

The type: ignore[return-value] comments are valid since the runtime checks guarantee the correct type. However, using cast() would more explicitly document the type assertion and is generally preferred over type ignore comments.

🔎 Proposed refactor using cast()

Add the import at the top of the file:

from typing import TYPE_CHECKING, Literal, Union, cast

Then update each property:

 @property
 def lancedb_vector_properties(self) -> LanceDBConfigVectorProperties:
     if self.properties["store_type"] != VectorStoreType.LANCE_DB_VECTOR:
         raise ValueError(
             f"Lancedb vector properties are only available for LanceDB vector store type. Got {self.properties.get('store_type')}"
         )
-    return self.properties  # type: ignore[return-value]
+    return cast(LanceDBConfigVectorProperties, self.properties)

 @property
 def lancedb_hybrid_properties(self) -> LanceDBConfigHybridProperties:
     if self.properties["store_type"] != VectorStoreType.LANCE_DB_HYBRID:
         raise ValueError(
             f"Lancedb hybrid properties are only available for LanceDB hybrid store type. Got {self.properties.get('store_type')}"
         )
-    return self.properties  # type: ignore[return-value]
+    return cast(LanceDBConfigHybridProperties, self.properties)

 @property
 def lancedb_fts_properties(self) -> LanceDBConfigFTSProperties:
     if self.properties["store_type"] != VectorStoreType.LANCE_DB_FTS:
         raise ValueError(
             f"Lancedb FTS properties are only available for LanceDB FTS store type. Got {self.properties.get('store_type')}"
         )
-    return self.properties  # type: ignore[return-value]
+    return cast(LanceDBConfigFTSProperties, self.properties)

Also applies to: 104-109, 112-117

libs/core/kiln_ai/tools/rag_tools.py (1)

230-230: Consider using Pydantic v2 for parameter validation instead of suppressing type checks.

The type: ignore suppresses a legitimate type-safety check, and TypedDict provides no runtime validation. If the caller omits query, line 231 will raise a KeyError. Per the coding guidelines, Pydantic v2 should be used for data validation, and this would provide both type hints and runtime validation.

🔎 Proposed refactor using Pydantic v2 BaseModel

Replace the RagParams TypedDict definition (lines 61-62):

-class RagParams(TypedDict):
-    query: str
+class RagParams(BaseModel):
+    query: str

Then update line 230 to validate and parse the parameters:

-        kwargs = RagParams(**kwargs)  # type: ignore[missing-typed-dict-key]
-        query = kwargs["query"]
+        params = RagParams(**kwargs)
+        query = params.query

This approach:

  • Removes the need for type: ignore
  • Provides runtime validation that query is present and is a string
  • Aligns with the coding guideline to use Pydantic v2 for data validation
  • Makes the code consistent with ChunkContext (line 37), which already uses BaseModel
app/desktop/studio_server/finetune_api.py (1)

284-284: LGTM - Type suppressions are safe with runtime guards.

The type: ignore[invalid-argument-type] annotations on the three finetune_registry lookups are appropriate. Each access is protected by a prior existence check (lines 279, 359, 431) that raises an HTTPException if the key is missing, ensuring runtime safety.

Optional: Consider tightening finetune_registry type definitions

If finetune_registry type definitions could be refined to accept the specific key types being used (ModelProviderName, str), these type ignores might become unnecessary. However, this is purely cosmetic and can be deferred.

Also applies to: 363-363, 436-436
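The guard-then-access pattern and the suggested typing refinement can be sketched as follows. This is a simplified illustration with hypothetical names (`ModelProviderName` values and the adapter callables are stand-ins), not the actual registry definition:

```python
from enum import Enum
from typing import Callable, Mapping


class ModelProviderName(str, Enum):
    # Hypothetical stand-in for the real provider enum.
    openai = "openai"
    together = "together"


# Annotating the registry with the enum as key type lets lookups with a
# ModelProviderName pass the type checker without ignore comments.
finetune_registry: Mapping[ModelProviderName, Callable[[], str]] = {
    ModelProviderName.openai: lambda: "openai-finetune",
}


def get_finetune_adapter(provider: ModelProviderName) -> str:
    # Existence check first, mirroring the HTTPException guard in the API...
    if provider not in finetune_registry:
        raise KeyError(f"Fine tuning not supported for provider {provider.value}")
    # ...so this access is safe at runtime.
    return finetune_registry[provider]()
```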

app/desktop/studio_server/tool_api.py (1)

507-516: Consider applying TypedDict constructors to all property helper functions.

For consistency with the new pattern in add_kiln_task_tool, consider updating _remote_tool_server_properties and _local_tool_server_properties to use their respective TypedDict constructors (RemoteServerProperties and LocalServerProperties). This would improve static type checking across all tool server types.

🔎 Proposed refactor

For _remote_tool_server_properties:

     def _remote_tool_server_properties(
         tool_data: ExternalToolServerCreationRequest,
     ) -> RemoteServerProperties:
-        # Create the ExternalToolServer with all data for validation
-        return {
-            "server_url": tool_data.server_url,
-            "headers": tool_data.headers,
-            "secret_header_keys": tool_data.secret_header_keys,
-            "is_archived": tool_data.is_archived,
-        }
+        return RemoteServerProperties(
+            server_url=tool_data.server_url,
+            headers=tool_data.headers,
+            secret_header_keys=tool_data.secret_header_keys,
+            is_archived=tool_data.is_archived,
+        )

For _local_tool_server_properties:

     def _local_tool_server_properties(
         tool_data: LocalToolServerCreationRequest,
     ) -> LocalServerProperties:
-        return {
-            "command": tool_data.command,
-            "args": tool_data.args,
-            "env_vars": tool_data.env_vars,
-            "secret_env_var_keys": tool_data.secret_env_var_keys,
-            "is_archived": tool_data.is_archived,
-        }
+        return LocalServerProperties(
+            command=tool_data.command,
+            args=tool_data.args,
+            env_vars=tool_data.env_vars,
+            secret_env_var_keys=tool_data.secret_env_var_keys,
+            is_archived=tool_data.is_archived,
+        )

Also applies to: 567-576

libs/server/kiln_server/document_api.py (2)

360-380: Type narrowing workaround is acceptable, but consider explicit checks.

The type: ignore comments work around the type checker's inability to narrow discriminated unions in match statements. While this is pragmatic, it suppresses all type errors for those attribute accesses, potentially hiding future issues.

💡 Optional: Use explicit isinstance checks for better type safety

Consider using explicit type narrowing to avoid type: ignore comments:

 def get_properties_for_chunker_type(
     self,
 ) -> SemanticChunkerProperties | FixedWindowChunkerProperties:
     properties = self.properties
-    match properties.chunker_type:
-        case ChunkerType.SEMANTIC:
-            return SemanticChunkerProperties(
-                chunker_type=ChunkerType.SEMANTIC,
-                embedding_config_id=properties.embedding_config_id,  # type: ignore[possibly-missing-attribute]
-                buffer_size=properties.buffer_size,  # type: ignore[possibly-missing-attribute]
-                breakpoint_percentile_threshold=properties.breakpoint_percentile_threshold,  # type: ignore[possibly-missing-attribute]
-                include_metadata=False,
-                include_prev_next_rel=False,
-            )
-        case ChunkerType.FIXED_WINDOW:
-            return FixedWindowChunkerProperties(
-                chunker_type=ChunkerType.FIXED_WINDOW,
-                chunk_size=properties.chunk_size,  # type: ignore[possibly-missing-attribute]
-                chunk_overlap=properties.chunk_overlap,  # type: ignore[possibly-missing-attribute]
-            )
-    raise_exhaustive_enum_error(properties.chunker_type)
+    if isinstance(properties, SemanticChunkerPropertiesPublic):
+        return SemanticChunkerProperties(
+            chunker_type=ChunkerType.SEMANTIC,
+            embedding_config_id=properties.embedding_config_id,
+            buffer_size=properties.buffer_size,
+            breakpoint_percentile_threshold=properties.breakpoint_percentile_threshold,
+            include_metadata=False,
+            include_prev_next_rel=False,
+        )
+    elif isinstance(properties, FixedWindowChunkerPropertiesPublic):
+        return FixedWindowChunkerProperties(
+            chunker_type=ChunkerType.FIXED_WINDOW,
+            chunk_size=properties.chunk_size,
+            chunk_overlap=properties.chunk_overlap,
+        )
+    raise_exhaustive_enum_error(properties.chunker_type)

This eliminates the need for type: ignore comments while maintaining type safety.


1633-1642: Type narrowing workaround (same pattern as lines 360-380).

Same discriminated union narrowing issue. The if request.properties.chunker_type == ChunkerType.SEMANTIC check should narrow the type, but the type checker doesn't recognize it. The validation logic is correct.

Consider applying the same explicit isinstance check pattern suggested for get_properties_for_chunker_type if you refactor that method.

libs/core/kiln_ai/tools/built_in_tools/math_tools.py (1)

37-37: Type suppressions added as part of Ty migration.

The # type: ignore[missing-typed-dict-key] comments successfully suppress type checker warnings for the new Ty tooling. Note that the pattern kwargs = AddParams(**kwargs) reassigns kwargs to a TypedDict but provides no runtime validation—TypedDict is just a dict at runtime. The keys are validated by the parameters_schema in the base class, so this is safe.

If you want actual runtime type validation in the future, consider using Pydantic models instead of TypedDict:

Optional refactor to add runtime validation
from pydantic import BaseModel

class AddParams(BaseModel):
    a: Union[int, float]
    b: Union[int, float]

async def run(self, context=None, **kwargs) -> ToolCallResult:
    """Add two numbers and return the result."""
    params = AddParams(**kwargs)  # Pydantic validates at runtime
    return ToolCallResult(output=str(params.a + params.b))

Also applies to: 75-75, 110-110, 151-151

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 757a603 and d0a10c7.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • .github/workflows/build_and_test.yml (1 hunks)
  • .github/workflows/format_and_lint.yml (2 hunks)
  • .gitignore (1 hunks)
  • CONTRIBUTING.md (1 hunks)
  • app/desktop/studio_server/finetune_api.py (3 hunks)
  • app/desktop/studio_server/tool_api.py (1 hunks)
  • checks.sh (2 hunks)
  • hooks_mcp.yaml (1 hunks)
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (2 hunks)
  • libs/core/kiln_ai/datamodel/basemodel.py (3 hunks)
  • libs/core/kiln_ai/datamodel/chunk.py (2 hunks)
  • libs/core/kiln_ai/datamodel/dataset_filters.py (3 hunks)
  • libs/core/kiln_ai/datamodel/dataset_split.py (1 hunks)
  • libs/core/kiln_ai/datamodel/vector_store.py (1 hunks)
  • libs/core/kiln_ai/tools/built_in_tools/math_tools.py (4 hunks)
  • libs/core/kiln_ai/tools/rag_tools.py (1 hunks)
  • libs/server/kiln_server/document_api.py (3 hunks)
  • libs/server/kiln_server/mcp/mcp_server_tool_utils.py (1 hunks)
  • pyproject.toml (2 hunks)
  • pyrightconfig.json (0 hunks)
💤 Files with no reviewable changes (1)
  • pyrightconfig.json
✅ Files skipped from review due to trivial changes (3)
  • .gitignore
  • .github/workflows/build_and_test.yml
  • libs/server/kiln_server/mcp/mcp_server_tool_utils.py
🧰 Additional context used
📓 Path-based instructions (5)
libs/core/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Python 3.10+ for the core library (libs/core) and Python 3.13 for the desktop app

Use Python 3.10+ for the core library (libs/core)

Files:

  • libs/core/kiln_ai/tools/rag_tools.py
  • libs/core/kiln_ai/datamodel/dataset_filters.py
  • libs/core/kiln_ai/datamodel/basemodel.py
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
  • libs/core/kiln_ai/tools/built_in_tools/math_tools.py
  • libs/core/kiln_ai/datamodel/chunk.py
  • libs/core/kiln_ai/datamodel/vector_store.py
  • libs/core/kiln_ai/datamodel/dataset_split.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation

**/*.py: Use Pydantic v2, not v1
Use asyncio for asynchronous operations in Python

Files:

  • libs/core/kiln_ai/tools/rag_tools.py
  • libs/core/kiln_ai/datamodel/dataset_filters.py
  • libs/core/kiln_ai/datamodel/basemodel.py
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
  • app/desktop/studio_server/finetune_api.py
  • libs/core/kiln_ai/tools/built_in_tools/math_tools.py
  • app/desktop/studio_server/tool_api.py
  • libs/server/kiln_server/document_api.py
  • libs/core/kiln_ai/datamodel/chunk.py
  • libs/core/kiln_ai/datamodel/vector_store.py
  • libs/core/kiln_ai/datamodel/dataset_split.py
app/desktop/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.13 for the desktop application (app/desktop)

Files:

  • app/desktop/studio_server/finetune_api.py
  • app/desktop/studio_server/tool_api.py
{libs/server,app/desktop/studio_server}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI for REST server implementation

Files:

  • app/desktop/studio_server/finetune_api.py
  • app/desktop/studio_server/tool_api.py
  • libs/server/kiln_server/document_api.py
libs/server/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use FastAPI for creating REST servers

Files:

  • libs/server/kiln_server/document_api.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
📚 Learning: 2025-09-17T17:36:30.231Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 614
File: libs/server/kiln_server/document_api.py:1413-1414
Timestamp: 2025-09-17T17:36:30.231Z
Learning: When RagConfig schema changes are made in the Kiln AI system, older RagConfig instances created before the schema change are discarded rather than migrated, eliminating the need for fallback logic or backwards compatibility handling in the API responses. This is acceptable because RagConfig features are still in pre-release state.

Applied to files:

  • libs/core/kiln_ai/tools/rag_tools.py
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.

Applied to files:

  • libs/core/kiln_ai/datamodel/basemodel.py
📚 Learning: 2025-11-24T19:13:16.966Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: .cursor/rules/project.mdc:0-0
Timestamp: 2025-11-24T19:13:16.966Z
Learning: Applies to **/*test*.py : Use pytest for testing Python code

Applied to files:

  • hooks_mcp.yaml
📚 Learning: 2025-11-24T19:13:16.966Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: .cursor/rules/project.mdc:0-0
Timestamp: 2025-11-24T19:13:16.966Z
Learning: Run linting, formatting, and typechecking tools before completing tasks to ensure code meets standards

Applied to files:

  • checks.sh
📚 Learning: 2025-10-01T21:48:25.854Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 674
File: libs/core/kiln_ai/tools/kiln_task_tool.py:30-34
Timestamp: 2025-10-01T21:48:25.854Z
Learning: In libs/core/kiln_ai/datamodel/external_tool_server.py, KilnTaskServerProperties is a TypedDict, not a Pydantic model. At runtime it's a regular Python dict, so .get() method works correctly and no .model_dump() conversion is needed.

Applied to files:

  • app/desktop/studio_server/tool_api.py
📚 Learning: 2025-10-27T09:38:25.562Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 753
File: libs/server/kiln_server/document_api.py:279-299
Timestamp: 2025-10-27T09:38:25.562Z
Learning: In the Kiln codebase, leonardmq intentionally overrides (rather than using setdefault) certain internal config fields like include_metadata and include_prev_next_rel in semantic chunker properties, even if the frontend sends them. These fields are too granular for user control and are set to enforce config stability.

Applied to files:

  • libs/server/kiln_server/document_api.py
📚 Learning: 2025-10-29T11:19:31.059Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 759
File: app/web_ui/src/lib/api_schema.d.ts:2485-2490
Timestamp: 2025-10-29T11:19:31.059Z
Learning: In libs/server/kiln_server/document_api.py and the generated app/web_ui/src/lib/api_schema.d.ts, CreateVectorStoreConfigRequest.properties is now a union of three distinct public schemas: LanceDBConfigFTSPropertiesPublic, LanceDBConfigVectorPropertiesPublic, and LanceDBConfigHybridPropertiesPublic. These are intentionally split to allow independent evolution; do not suggest merging them.

Applied to files:

  • libs/server/kiln_server/document_api.py
  • libs/core/kiln_ai/datamodel/vector_store.py
🧬 Code graph analysis (6)
libs/core/kiln_ai/datamodel/dataset_filters.py (1)
libs/core/kiln_ai/datamodel/test_dataset_split.py (1)
  • task_run (78-93)
libs/core/kiln_ai/datamodel/basemodel.py (1)
libs/core/kiln_ai/datamodel/test_basemodel.py (4)
  • relationship_name (118-119)
  • relationship_name (146-147)
  • parent_type (122-123)
  • parent_type (150-151)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (1)
libs/core/kiln_ai/adapters/chat/chat_formatter.py (1)
  • messages (71-72)
app/desktop/studio_server/finetune_api.py (1)
libs/core/kiln_ai/adapters/fine_tune/test_together_finetune.py (1)
  • finetune (65-77)
app/desktop/studio_server/tool_api.py (2)
app/web_ui/src/lib/types.ts (1)
  • KilnTaskServerProperties (95-96)
libs/core/kiln_ai/datamodel/external_tool_server.py (1)
  • KilnTaskServerProperties (43-48)
libs/server/kiln_server/document_api.py (2)
libs/core/kiln_ai/datamodel/chunk.py (2)
  • ChunkerType (35-37)
  • FixedWindowChunkerProperties (49-52)
app/web_ui/src/lib/types.ts (1)
  • ChunkerType (62-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: Generate Coverage Report
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Count Tests
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Generate Coverage Report
🔇 Additional comments (17)
libs/core/kiln_ai/datamodel/dataset_split.py (1)

234-234: LGTM! Improved explicitness and type-checking compatibility.

The ternary operator makes the None-check explicit and improves readability. This is more type-checker friendly and aligns well with the explicit None-checking pattern used in the subsequent lines (235-238).
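The explicit-None ternary endorsed here can be illustrated with a minimal, hypothetical helper (the function and field names are not from the codebase):

```python
from typing import Optional


def effective_chunk_overlap(configured: Optional[int], default: int = 0) -> int:
    # Explicit None check: a truthiness fallback (`configured or default`)
    # would wrongly discard a legitimate configured value of 0.
    return configured if configured is not None else default
```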

libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (2)

138-140: LGTM! Type annotation correctly broadened to match actual data.

The return type change from dict[str, str | None] to dict[str, Any] is correct because the function returns dictionaries containing non-string values like "tool_calls" (line 162) and "tool_call_id" (line 171). This aligns with how the function is used throughout the file (lines 236, 272, 285).


187-187: LGTM! Local variable type correctly updated for consistency.

The type annotation properly reflects the updated return type of generate_chat_message_list.
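A minimal sketch of why `dict[str, str | None]` is too narrow for these messages (the message shape follows common chat-completion conventions; the helper name is hypothetical):

```python
from typing import Any


def tool_call_message(call_id: str, name: str, arguments: str) -> dict[str, Any]:
    # An assistant message carrying tool calls mixes None content with a list
    # of dicts, so dict[str, str | None] cannot describe it; dict[str, Any] can.
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": call_id,
                "type": "function",
                "function": {"name": name, "arguments": arguments},
            }
        ],
    }
```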

libs/core/kiln_ai/datamodel/dataset_filters.py (2)

3-3: LGTM! Type annotation improves type safety.

The addition of the Dict import and explicit type annotation for static_dataset_filters enhances type safety and makes the code more self-documenting. The annotation correctly describes the mapping structure.

Also applies to: 131-136


22-23: LGTM! Parameter naming improves consistency.

Changing the parameter name from _ to task_run makes the function signature consistent with the DatasetFilter protocol definition (line 17) and improves code readability.
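The annotated filter mapping described above can be sketched like this. `TaskRun`, the filter functions, and the registered keys are simplified stand-ins for the real datamodel types:

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the real TaskRun model and filter protocol.
TaskRun = dict
DatasetFilter = Callable[[TaskRun], bool]


def all_filter(task_run: TaskRun) -> bool:
    # Named `task_run` (not `_`) to match the protocol signature.
    return True


def has_output_filter(task_run: TaskRun) -> bool:
    return bool(task_run.get("output"))


# The explicit annotation documents the mapping's shape and lets the type
# checker verify every registered filter conforms to DatasetFilter.
static_dataset_filters: Dict[str, DatasetFilter] = {
    "all": all_filter,
    "has_output": has_output_filter,
}
```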

libs/core/kiln_ai/datamodel/chunk.py (1)

156-156: LGTM! Type suppressions are well-justified.

The type: ignore[return-value] annotations are appropriate given Python's type system limitations with TypedDict narrowing. The accompanying comments clearly explain the rationale, runtime checks ensure correctness, and the approach respects the project's linting constraints.

Also applies to: 168-168

libs/core/kiln_ai/datamodel/basemodel.py (1)

329-329: LGTM - Type suppressions are justified for metaprogramming.

The four type: ignore annotations are appropriate for these dynamic code paths:

  • Line 329 (mutable_copy): Pydantic's model_copy(deep=True) returns the same type, but the type checker can't infer this through the generic Self return type.
  • Lines 694, 698 (_create_child_method): Dynamic method generation using generic child_class parameters requires runtime type construction that the type system cannot validate statically.
  • Line 790 (_validate_nested): Recursive call with **kwargs unpacking in a complex inheritance hierarchy where the type checker cannot validate the dynamic arguments.

All of these are standard patterns when working with Pydantic models and Python metaprogramming.

Also applies to: 694-694, 698-698, 790-790
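The `mutable_copy` situation can be approximated without Pydantic: a deep copy returns the caller's own runtime type, but expressing that through a generic `Self`-style return is where checkers need help. This is a hypothetical analogue using `copy.deepcopy`, not the actual `basemodel.py` code:

```python
import copy
from typing import TypeVar

SelfT = TypeVar("SelfT", bound="Model")


class Model:
    def __init__(self, tags: list) -> None:
        self.tags = tags

    def mutable_copy(self: SelfT) -> SelfT:
        # deepcopy preserves the concrete subclass at runtime; inferring that
        # statically through the generic return type is the hard part.
        return copy.deepcopy(self)


class Task(Model):
    pass
```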

libs/server/kiln_server/document_api.py (1)

2149-2157: LGTM! Improved type safety with TypedDict.

The change from a plain dictionary to a properly typed FixedWindowChunkerProperties TypedDict improves type safety. The values passed are correct:

  • chunk_size is guaranteed to be a PositiveInt (checked at line 2144)
  • chunk_overlap defaults to 0, which satisfies NonNegativeInt
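The advantage of the TypedDict constructor over a plain dict literal, in a minimal standalone sketch (field names mirror the review comment; the class here is a local stand-in):

```python
from typing import TypedDict


class FixedWindowChunkerProperties(TypedDict):
    chunk_size: int
    chunk_overlap: int


# Constructor-call syntax is statically checked: a misspelled or missing key
# is a type error, unlike an untyped dict literal. At runtime it is still a
# plain dict.
props = FixedWindowChunkerProperties(chunk_size=512, chunk_overlap=0)
```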
hooks_mcp.yaml (1)

25-27: LGTM! Typecheck command updated to use Ty.

The migration from uv run pyright . to uvx ty check aligns with the broader tooling migration across the codebase.

CONTRIBUTING.md (1)

77-77: LGTM! Documentation updated for Ty tooling.

The IDE extension recommendation now correctly reflects the migration from Pyright to Ty.

pyproject.toml (2)

43-47: LGTM! Ty configuration is well-structured.

The Ty source configuration appropriately excludes test files, build directories, and the web UI. The invalid-key = "ignore" rule is reasonable during the migration period.


54-57: LGTM! Enhanced linting rules improve code quality.

The additional Ruff rules (import sorting, unused imports, RUF rules, FastAPI linting, and tidy imports) and the ban on typing.cast are excellent additions that enforce better practices.

checks.sh (3)

34-41: LGTM! Type checking and linting steps are well-structured.

The separation of Ruff checks into distinct steps and the addition of the Ty type checking step improve clarity and align with the tooling migration.


43-50: LGTM! Improved misspelling check feedback.

The addition of a success message after the misspelling check provides better user feedback when the check passes.


52-55: The script reference is valid. app/web_ui/src/lib/check_schema.sh exists and is executable.

Likely an incorrect or invalid review comment.

.github/workflows/format_and_lint.yml (2)

38-39: LGTM! Proper dependency installation.

Changing to uv sync --all-extras --dev correctly installs all project dependencies including the new ty type checker, rather than just installing ruff separately.


49-51: LGTM! Type checking added to CI pipeline.

The new "Typecheck with ty" step ensures type checking runs in CI, aligning with the local development workflow in checks.sh.

Comment on lines +609 to +615
properties=KilnTaskServerProperties(
name=tool_data.name,
description=tool_data.description,
task_id=tool_data.task_id,
run_config_id=tool_data.run_config_id,
is_archived=tool_data.is_archived,
),
🛠️ Refactor suggestion | 🟠 Major

Type safety improvement, but inconsistent with edit endpoint.

The use of KilnTaskServerProperties constructor is excellent for static type checking. However, the corresponding edit_kiln_task_tool endpoint (lines 641-647) still uses a dict literal, creating an inconsistency.

Please update edit_kiln_task_tool to use the same pattern for consistency.

🤖 Prompt for AI Agents
In app/desktop/studio_server/tool_api.py around lines 609 and 641-647, the
create path uses KilnTaskServerProperties but the edit_kiln_task_tool endpoint
still constructs properties with a dict literal; change edit_kiln_task_tool to
construct and pass a KilnTaskServerProperties(...) instance with the same fields
(name, description, task_id, run_config_id, is_archived) instead of the dict to
keep typing consistent and avoid mixed patterns.

"ruff>=0.13.0",
"watchfiles>=1.1.0",
"scalar-fastapi>=1.4.3",
"ty>=0.0.2",
⚠️ Potential issue | 🟡 Minor

Update version constraint for ty—currently in beta with breaking changes expected.

ty is in beta: Astral recommends it only for motivated early adopters and advises against relying on it in production, since it is a preview release with missing features and known issues. The current stable version on PyPI is 0.0.4, and a stable release is eyed for 2026. Using ty>=0.0.2 allows all 0.0.x versions, which during beta development can introduce breaking changes between releases. Consider pinning to a specific version (e.g., ty==0.0.4) or at minimum constraining to a narrower range (e.g., ty>=0.0.4,<0.1.0) to avoid unexpected CI/CD failures during the development phase.

🤖 Prompt for AI Agents
In pyproject.toml around line 26, the dependency "ty>=0.0.2" is too permissive
for a beta package and can pull breaking 0.0.x releases; change the constraint
to a stable pinned or narrowed range such as "ty==0.0.4" or "ty>=0.0.4,<0.1.0"
to prevent unexpected breaking upgrades, update the lockfile if present, and
rerun dependency installs/CI to validate the change.

* adding glm 4.7

* addressing CR feedback

* lock typecheck version for now

* siliconflow has a Pro prefix for the endpoint for GLM 4.7

---------

Co-authored-by: scosman <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @libs/core/kiln_ai/adapters/ml_model_list.py:
- Around line 1538-1567: The KilnModelProvider entry for model_id
"gemini-3-flash-preview" is missing the gemini_reasoning_enabled flag despite
having reasoning_capable=True and thinking_level="medium"; update that
KilnModelProvider instance to include gemini_reasoning_enabled=True (place it
after multimodal_mime_types or alongside the other capability flags) to match
the pattern used by other Gemini providers.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0a10c7 and 4adf15a.

📒 Files selected for processing (3)
  • .github/workflows/build_and_test.yml
  • .github/workflows/format_and_lint.yml
  • libs/core/kiln_ai/adapters/ml_model_list.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/build_and_test.yml
  • .github/workflows/format_and_lint.yml
🧰 Additional context used
📓 Path-based instructions (2)
libs/core/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Python 3.10+ for the core library (libs/core) and Python 3.13 for the desktop app

Use Python 3.10+ for the core library (libs/core)

Files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation

**/*.py: Use Pydantic v2, not v1
Use asyncio for asynchronous operations in Python

Files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:19:20.074Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1736-1741
Timestamp: 2025-08-08T16:19:20.074Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwq_32b with ModelProviderName.siliconflow_cn (model_id "Qwen/QwQ-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via tests/responses that SiliconFlow returns outputs that parse correctly without an R1 parser. Parser usage remains provider-specific and should only be added when validated.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-10-28T05:25:09.283Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 757
File: libs/core/kiln_ai/adapters/reranker_list.py:39-42
Timestamp: 2025-10-28T05:25:09.283Z
Learning: In the Kiln codebase (Python), model family and model name fields in reranker and similar model definition classes (like KilnRerankerModel in libs/core/kiln_ai/adapters/reranker_list.py) should use plain str types rather than Enum types. This is because users receive over-the-air (OTA) updates for new models, and using Enums would break when clients with older builds encounter new model families their Enum definitions don't include. Strings provide forward compatibility for the OTA update mechanism.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
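The forward-compatibility argument in the learning above can be demonstrated with a toy example (the `ModelFamily` enum values here are hypothetical, not Kiln's real definitions): an older client whose Enum predates a newly shipped model family fails on lookup, while a plain `str` field carries the unknown value through untouched.

```python
from enum import Enum

class ModelFamily(Enum):
    LLAMA = "llama"
    GEMMA = "gemma"

# A new family delivered via an OTA model-list update that an older
# client build's Enum does not include:
new_family = "nemotron"

try:
    ModelFamily(new_family)  # Enum lookup raises on unseen values
    enum_ok = True
except ValueError:
    enum_ok = False

# A plain str field simply stores the value, staying forward compatible.
str_ok = isinstance(new_family, str)
print(enum_ok, str_ok)  # → False True
```

This is why strings, not Enums, are the safer type for fields whose value set grows after the client ships.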
📚 Learning: 2025-09-10T08:06:53.717Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 585
File: libs/core/kiln_ai/adapters/ml_embedding_model_list.py:32-33
Timestamp: 2025-09-10T08:06:53.717Z
Learning: In the Kiln AI codebase, friendly_name fields in embedding model configurations should maintain the original model naming conventions from providers (e.g., "gemini-embedding-001", "text-embedding-3-small") for human readability, rather than following snake_case conventions like the enum values.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:15:20.796Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:2434-2439
Timestamp: 2025-08-08T16:15:20.796Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwen_3_8b_no_thinking with ModelProviderName.siliconflow_cn (model_id "Qwen/Qwen3-8B"), the qwen3_style_no_think formatter is not required: SiliconFlow’s non‑thinking Qwen3‑8B returns clean output without <answer> wrappers. Formatter/parser usage is provider-specific and should be added only when verified by tests/responses.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:14:54.346Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1979-1983
Timestamp: 2025-08-08T16:14:54.346Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py, for ModelName.deepseek_r1_distill_qwen_32b on provider siliconflow_cn (model_id "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via testing that SiliconFlow returns final outputs that parse without an R1 parser, and reasoning may be omitted under structured output; rely on reasoning_optional_for_structured_output=True.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-22T11:15:53.584Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 413
File: libs/core/kiln_ai/adapters/embedding/embedding_registry.py:14-20
Timestamp: 2025-08-22T11:15:53.584Z
Learning: In the Kiln AI codebase, model_provider_name fields in datamodels like EmbeddingConfig are typed as ModelProviderName enum (which inherits from str, Enum), not as plain strings. Therefore, accessing .value on these fields is valid and returns the string representation of the enum value.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
🧬 Code graph analysis (1)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
libs/core/kiln_ai/datamodel/datamodel_enums.py (3)
  • ModelProviderName (84-106)
  • StructuredOutputMode (27-49)
  • KilnMimeType (109-133)
libs/core/kiln_ai/datamodel/extraction.py (1)
  • friendly_name (289-291)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Generate Coverage Report
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Generate Coverage Report
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Count Tests
🔇 Additional comments (4)
libs/core/kiln_ai/adapters/ml_model_list.py (4)

130-130: LGTM: Enum additions follow naming conventions.

The new model name enum entries follow the established snake_case pattern and are consistent with existing model naming conventions.

Also applies to: 132-132, 198-198


1497-1507: LGTM: Vertex provider addition is well-configured.

The Vertex provider configuration for Gemini 3 Pro Preview is consistent with the existing OpenRouter and Gemini API providers, including appropriate reasoning flags and thinking level settings.


1972-1992: LGTM: Nemotron 3 Nano model is well-configured.

The new Nemotron 3 Nano model follows established patterns:

  • Correctly uses ModelFamily.llama (consistent with Nemotron 70B)
  • Has reasoning capabilities properly configured
  • No parser specified (appropriate per learnings - parsers should only be added when validated through testing)

5155-5175: LGTM: GLM 4.7 model configuration is consistent.

The new GLM 4.7 model is properly configured:

  • Consistent with existing GLM model patterns (e.g., GLM 4.6)
  • Appropriate use of reasoning_optional_for_structured_output flag for SiliconFlow provider
  • No parser specified (correct per learnings)
  • Friendly name follows established conventions

Comment on lines +1538 to +1567
KilnModelProvider(
name=ModelProviderName.gemini_api,
model_id="gemini-3-flash-preview",
structured_output_mode=StructuredOutputMode.json_schema,
suggested_for_data_gen=True,
suggested_for_evals=True,
supports_doc_extraction=True,
multimodal_capable=True,
supports_vision=True,
multimodal_mime_types=[
# documents
KilnMimeType.PDF,
KilnMimeType.CSV,
KilnMimeType.TXT,
KilnMimeType.HTML,
KilnMimeType.MD,
# images
KilnMimeType.JPG,
KilnMimeType.PNG,
# audio
KilnMimeType.MP3,
KilnMimeType.WAV,
KilnMimeType.OGG,
# video
KilnMimeType.MP4,
KilnMimeType.MOV,
],
reasoning_capable=True,
thinking_level="medium",
),

⚠️ Potential issue | 🟠 Major

Missing gemini_reasoning_enabled flag in Gemini API provider.

The Gemini 3 Flash Gemini API provider has reasoning_capable=True (line 1565) and thinking_level="medium" (line 1566), but is missing the gemini_reasoning_enabled=True flag. This is inconsistent with:

  1. The OpenRouter provider for this same model (line 1536 has gemini_reasoning_enabled=True)
  2. The Vertex provider for this same model (line 1575 has gemini_reasoning_enabled=True)
  3. The general pattern across other Gemini models where reasoning_capable=True is paired with gemini_reasoning_enabled=True (e.g., Gemini 2.5 Pro, Gemini 3 Pro Preview)
🔎 Proposed fix

Add the missing flag after multimodal_mime_types:

                     KilnMimeType.MOV,
                 ],
                 reasoning_capable=True,
+                gemini_reasoning_enabled=True,
                 thinking_level="medium",
             ),
🤖 Prompt for AI Agents
In @libs/core/kiln_ai/adapters/ml_model_list.py around lines 1538-1567, the
KilnModelProvider entry for model_id "gemini-3-flash-preview" is missing the
gemini_reasoning_enabled flag despite having reasoning_capable=True and
thinking_level="medium"; update that KilnModelProvider instance to include
gemini_reasoning_enabled=True (place it after multimodal_mime_types or alongside
the other capability flags) to match the pattern used by other Gemini providers.
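This kind of cross-provider inconsistency is easy to catch mechanically. The sketch below uses a simplified stand-in for the provider class (the real `KilnModelProvider` is a pydantic model with many more fields) to flag any Gemini-style provider that sets `reasoning_capable` without the matching `gemini_reasoning_enabled` flag:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    # Simplified, hypothetical stand-in for KilnModelProvider.
    name: str
    model_id: str
    reasoning_capable: bool = False
    gemini_reasoning_enabled: bool = False

def inconsistent_gemini_providers(providers):
    """Providers that declare reasoning but omit the Gemini-specific flag."""
    return [
        p for p in providers
        if p.name in ("gemini_api", "vertex", "openrouter")
        and p.reasoning_capable
        and not p.gemini_reasoning_enabled
    ]

providers = [
    Provider("openrouter", "google/gemini-3-flash-preview", True, True),
    Provider("vertex", "gemini-3-flash-preview", True, True),
    Provider("gemini_api", "gemini-3-flash-preview", True, False),  # the flagged entry
]
print([p.name for p in inconsistent_gemini_providers(providers)])  # → ['gemini_api']
```

A check like this could run in the test suite over the real model list, so the pairing rule never regresses silently.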

@tawnymanticore tawnymanticore merged commit ae1512c into remote_config Jan 6, 2026
18 checks passed