

@leonardmq
Collaborator

@leonardmq leonardmq commented Jan 11, 2026

What does this PR do?

Bring #926 into remote_config.

Summary by CodeRabbit

  • New Features

    • Added radar chart comparison for evaluation metrics
    • Added ability to generate subtopics for leaf topics in data generation
    • Extended evaluation support for finetuned models with dedicated run configurations
    • New AI models available including Mistral, Gemini 3 Flash, GLM, and ByteDance variants
  • Bug Fixes

    • Task description field now hidden during task creation
    • Configuration settings are deep-copied when hiding sensitive values, preventing unintended data mutations
  • Documentation

    • Updated README with new product positioning
    • Added Agent Rules documentation
  • Chores

    • Switched Python type checking from Pyright to Ty
    • Added MCP server configurations
    • Migrated linting and formatting tools to use uv

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Contributor

coderabbitai bot commented Jan 11, 2026

Walkthrough

This PR introduces finetune run configuration integration into the evaluation API, adds new comparison visualization components (radar chart), implements nested topic generation for data synthesis, migrates type-checking from Pyright to Ty, expands the model catalog with additional providers, and enhances UI/UX with improved state management across multiple pages.

Changes

Cohort / File(s) Summary
Configuration & Type-Checking Migration
.cursor/mcp.json, .cursor/rules/.gitignore, .gitignore, pyproject.toml, pyrightconfig.json, .github/workflows/format_and_lint.yml, .github/workflows/build_and_test.yml, hooks_mcp.yaml, CONTRIBUTING.md
Transition from Pyright to Ty for Python type-checking. Updates MCP server configurations, adds tessl and HooksMCP entries. Removes pyrightconfig.json and updates CI workflows to use uvx ty check instead of pyright. Updates dev dependencies and ruff linting configuration.
Tessl Integration
.cursor/mcp.json, .tessl/.gitignore, tessl.json, AGENTS.md
Introduces tessl project manifest with dependencies tracking for PyPI and npm packages. Adds tessl-managed agent rules referencing .tessl/RULES.md. Creates .tessl/.gitignore to exclude tiles/ and RULES.md.
Finetune Run Config Support
libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py, libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py, app/desktop/studio_server/eval_api.py, app/desktop/studio_server/test_eval_api.py
Adds helpers to construct and resolve finetune-based run config IDs using format finetune_run_config::project::task::finetune. Enhances eval_api to aggregate finetune run configs alongside standard configs. Updates dataset formatter typing for message content. Expands test coverage for finetune-derived run configs.
Model Catalog Expansion
libs/core/kiln_ai/adapters/ml_model_list.py
Adds 11 new ModelName enum members (Mistral 3 variants, Minimax M2.1, Gemini 3 Flash, Nemotron, GLM 4.7, ByteDance seeds). Introduces corresponding KilnModel entries with multi-provider configurations (OpenRouter, Gemini API, Fireworks, etc.). Updates existing model entries and friendly names.
Prompt Builder & Provider Tools
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py, libs/core/kiln_ai/adapters/provider_tools.py, libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py, libs/core/kiln_ai/adapters/test_provider_tools.py
Enables custom prompt builder injection via AdapterConfig.prompt_builder. Adds helpers to resolve built-in providers by model ID and determine parser preference from finetune base model. Adds comprehensive test coverage for new resolution logic.
Comparison UI: Radar Chart & No-Data State
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte, app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte, app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte, app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
Introduces new radar chart component for metric correlation visualization with multi-line legend formatting. Adds no-data placeholder component. Refactors comparison page with reactive state (validSelectedModels, allSelectedLoading, anyLoadedData). Updates scatter chart with improved legend interactions and data visualization. Removes legacy task_run_configs API reference and updates to /run_configs endpoint.
Generate Page: Nested Topics Feature
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte, app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
Adds "Add Subtopics to Leaf Topics" dialog with nested topic generation. Implements add_nested_topics_to_all_leaf_topics function to generate subtopics for all leaf nodes via /generate_categories endpoint. Separates run config components into modal and nested contexts. Conditional UI rendering in synth page based on whether topics exist.
Task & Setup UI Refinements
app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte, app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
Hides Task Description field during task creation (non-editing flows). Updates Google Vertex AI location guidance from 'us-central1' to 'global'.
API Schema & Tool Server Updates
app/web_ui/src/lib/api_schema.d.ts, app/desktop/studio_server/tool_api.py, app/desktop/studio_server/finetune_api.py
Removes legacy get_task_run_configs API path and operation from OpenAPI schema. Replaces dictionary-based properties with typed KilnTaskServerProperties in add_kiln_task_tool. Adds type: ignore comments in finetune registry lookups.
Documentation & Branding
README.md, checks.sh
Updates README title to "Build, Evaluate, and Optimize AI Systems". Adds "Measure, Improve, Repeat" marketing section. Adds Agents link to All Docs. Updates checks.sh to use uvx-based ruff and ty commands with improved reporting.
Type Annotation & Internal Improvements
libs/core/kiln_ai/datamodel/basemodel.py, libs/core/kiln_ai/datamodel/chunk.py, libs/core/kiln_ai/datamodel/dataset_filters.py, libs/core/kiln_ai/datamodel/dataset_split.py, libs/core/kiln_ai/datamodel/vector_store.py, libs/core/kiln_ai/tools/built_in_tools/math_tools.py, libs/core/kiln_ai/tools/rag_tools.py, libs/server/kiln_server/document_api.py, libs/server/kiln_server/mcp/mcp_server_tool_utils.py, libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py, libs/core/kiln_ai/adapters/test_prompt_builders.py, libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
Adds type: ignore annotations throughout to suppress type-checking warnings without behavior changes. Refactors chunker properties to use FixedWindowChunkerProperties in ephemeral document splitting. Adds safety guard in dataset_split for None output. Updates parameter names and type annotations for clarity.
Config & Utilities
libs/core/kiln_ai/utils/config.py, libs/core/kiln_ai/utils/test_config.py
Implements deep-copy for non-sensitive settings to prevent mutation of underlying data when hiding sensitive values. Adds test validating provider configs remain unchanged after hide_sensitive operation.
CLI Testing
libs/core/kiln_ai/cli/commands/test_package_project.py
Updates test_package_project tests to use tmp_path fixture instead of fixed output path ./test_output.zip for better test isolation.
Workflow Triggers
.github/workflows/build_desktop.yml
Adds workflow_dispatch, release (on created), and push (to main branch) triggers for desktop build workflow.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Web Client
    participant Page as Compare Page
    participant Store as Data Store
    participant API as Eval API
    participant DB as Database

    Client->>Page: Select models for comparison
    Page->>Store: Update validSelectedModels state
    activate Page
    Page->>API: GET /api/projects/{id}/tasks/{id}/run_configs/
    API->>DB: Query task.run_configs()
    API->>DB: Query completed finetunes with run_configs
    API-->>Page: Return aggregated TaskRunConfig[]
    Page->>Page: Build radar chart data (comparisonFeatures)
    Page-->>Client: Render CompareRadarChart with selected models
    deactivate Page

    Client->>Page: Hover legend item
    Page->>Page: Highlight corresponding series
    Page-->>Client: Visual feedback
sequenceDiagram
    participant User as User
    participant UI as Generate Page
    participant Modal as Add Nested Modal
    participant API as Server API
    participant DB as Database

    User->>UI: Click "Add Subtopics"
    UI->>Modal: Open nested topics dialog
    User->>Modal: Set subtopics per leaf, configure run config
    User->>Modal: Submit "Generate Subtopics"
    Modal->>Modal: Validate run config and guidance
    Modal->>Modal: Extract all leaf topics
    loop For each leaf topic
        Modal->>API: POST /api/projects/{id}/tasks/{id}/generate_categories
        API->>API: Generate subtopic candidates
        API-->>Modal: Return new categories
        Modal->>DB: Save sub_topics to leaf node
    end
    Modal->>UI: Close dialog, trigger save
    UI-->>User: Display updated topic tree

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • PR #914: Implements the same Pyright-to-Ty type-checker migration across CI workflows, configuration files, and dependencies — directly aligned with this PR's toolchain changes.
  • PR #868: Introduces overlapping finetune run-config handling in eval_api.py and finetune ID helpers, alongside provider/tools parser improvements and related test updates.
  • PR #909: Adds radar comparison chart and related compare-page UI enhancements to the same evals comparison route components.

Suggested labels

codex

Suggested reviewers

  • tawnymanticore
  • scosman
  • chiang-daniel

Poem

🐰 From type-checks swift to charts so bright,
A rabbit's work—the UI's delight!
Finetuned runs now in eval's fold,
Nested topics, stories retold.
With Ty's keen eye and radar's view,
Build better AI—the codex grew! 🌟

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 46.24%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): The description is minimal and incomplete; it only states the intent to bring PR #926 into remote_config without explaining what changes are included or their scope. Resolution: expand the description to clarify the scope of changes beyond PR #926, including finetune run-config support, new models, UI improvements, and infrastructure updates.
✅ Passed checks (1 passed)
  • Title check (✅ Passed): The title 'remote_config: new models' is partially related to the changeset. It accurately describes one significant aspect (new model additions to ml_model_list.py) but substantially understates the scope of changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist

Summary of Changes

Hello @leonardmq, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the platform's capabilities by integrating a wide array of new AI models, enhancing the evaluation and comparison tools with new visualization options, and improving the data generation process with hierarchical topic creation. It also includes a foundational update to the Python type checking system, ensuring better code quality and maintainability.

Highlights

  • New AI Model Integrations: Several new AI models have been added across various providers, including Mistral (e.g., Mistral Small Creative, Ministral 3 variants, Mistral Large 3 2512), Gemini (Gemini 3 Flash), Nemotron (Nemotron 3 Nano), GLM (GLM 4.7), Minimax (Minimax M2.1), and ByteDance (Seed 1.6, Seed 1.6 Flash).
  • Enhanced Evaluation Run Config Handling: The API for retrieving and managing evaluation run configurations has been refactored to seamlessly include completed fine-tuned models. A new utility module, finetune_run_config_id.py, was introduced to standardize the identification of these fine-tune specific run configurations.
  • Improved UI for Model Comparison: The evaluation comparison page in the web UI now features a new radar chart for visualizing model performance across multiple metrics. The existing scatter plot also received updates, including enhanced legend formatting for better readability and interactive highlighting on hover.
  • Python Type Checking Migration: The project's Python type checking tool has been migrated from pyright to ty. This change is reflected in pyproject.toml, checks.sh, and hooks_mcp.yaml, along with necessary type annotation adjustments across the codebase.
  • Nested Synthetic Data Generation: A new feature has been added to the synthetic data generation UI, allowing users to automatically add subtopics to all existing 'leaf' topics (topics without subtopics). This enables more hierarchical and structured data generation workflows.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Ignored Files
  • Ignored by pattern: .github/workflows/** (3)
    • .github/workflows/build_and_test.yml
    • .github/workflows/build_desktop.yml
    • .github/workflows/format_and_lint.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request brings a substantial set of features and improvements, primarily focused on integrating fine-tuned models as run configurations, which is a significant enhancement. The changes are wide-ranging, including backend API updates to support this, extensive UI improvements on the model comparison page with new charts, and the addition of many new models. I'm particularly impressed with the thoughtful refactoring in the frontend to use reactive statements and the critical bug fix in config.py to prevent state mutation. The switch to ty for type checking is a notable tooling change. Overall, this is a high-quality contribution with well-implemented features. I have one suggestion for a minor refactoring to reduce code duplication.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/desktop/studio_server/eval_api.py (1)

91-110: Validate finetune run-config IDs match the route project/task (prevents cross-task access).

Currently, a crafted finetune_run_config::other_project::other_task::... can be resolved even when hitting /api/projects/{project_id}/tasks/{task_id}/..., and the response rewraps it under the current task. That’s at least confusing, and can become a data-leak footgun.

Proposed diff (route param validation + completion check)
 def task_run_config_from_id(
     project_id: str, task_id: str, run_config_id: str
 ) -> TaskRunConfig:
@@
     # special case for finetune run configs, it's inside the finetune model
     if run_config_id.startswith("finetune_run_config::"):
+        # Ensure the encoded IDs match the route params
+        try:
+            encoded_project_id, encoded_task_id, _ = run_config_id.removeprefix(
+                "finetune_run_config::"
+            ).split("::", 2)
+        except ValueError:
+            raise HTTPException(
+                status_code=404,
+                detail=f"Task run config not found. ID: {run_config_id}",
+            )
+        if encoded_project_id != project_id or encoded_task_id != task_id:
+            raise HTTPException(
+                status_code=404,
+                detail=f"Task run config not found. ID: {run_config_id}",
+            )
+
         finetune = finetune_from_finetune_run_config_id(run_config_id)
-        if finetune.run_config is not None:
+        if finetune.run_config is not None and finetune.fine_tune_model_id is not None:
             return TaskRunConfig(
                 id=finetune_run_config_id(project_id, task_id, str(finetune.id)),
                 name=finetune.name,
                 description=finetune.description,
                 run_config_properties=finetune.run_config,
                 parent=task,  # special case, we need to reference the task model
             )
🤖 Fix all issues with AI agents
In @app/desktop/studio_server/eval_api.py:
- Around line 66-89: The get_all_run_configs function currently appends
finetune-derived TaskRunConfig objects to the list returned by
task.run_configs(), which may mutate an internal list; instead, make a new list
copy (e.g., copy or list(...) of task.run_configs()) into the local variable
configs before appending finetune entries so you never modify the original. Keep
the rest of the logic (filtering finetunes by run_config and fine_tune_model_id
and constructing TaskRunConfig with finetune_run_config_id, name, description,
run_config_properties, parent=task) unchanged.
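
A minimal sketch of the copy-then-append shape described above, assuming a task lookup helper and a finetune iterator (task_from_id and completed_finetunes are hypothetical names; the TaskRunConfig fields mirror this comment):

def get_all_run_configs(project_id: str, task_id: str) -> list[TaskRunConfig]:
    task = task_from_id(project_id, task_id)  # hypothetical lookup helper

    # Copy so the list owned by the task model is never mutated.
    configs: list[TaskRunConfig] = list(task.run_configs())

    for finetune in completed_finetunes(task):  # hypothetical iterator
        if finetune.run_config is None or finetune.fine_tune_model_id is None:
            continue
        configs.append(
            TaskRunConfig(
                id=finetune_run_config_id(project_id, task_id, str(finetune.id)),
                name=finetune.name,
                description=finetune.description,
                run_config_properties=finetune.run_config,
                parent=task,
            )
        )
    return configs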

In
@app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte:
- Around line 73-93: The legend/series keys currently use
getRunConfigDisplayName(config) which can collide when two run_configs share the
same display name; change series/legend identity to use the unique config.id
instead and build the human-readable multi-line label via the
formatter/legendFormatter mapping so ECharts keys are unique while labels remain
formatted (update buildLegendFormatter, any places using getRunConfigDisplayName
as the series key, and where legendFormatter is consumed to map config.id ->
`${display}\n{sub|Model: ...}\n{sub|Prompt: ...}`).
- Around line 288-302: The current legend hover handlers use an invalid ECharts
target "legendItem"; update the listeners on chartInstance to listen for generic
"mouseover" and "mouseout" events and inside the handler check
params.componentType === "legend" before calling chartInstance.dispatchAction
with type "highlight" / "downplay" and seriesName params.name; adjust the
handlers around chartInstance.on(...) and the dispatchAction(...) calls (refer
to chartInstance, params.componentType, params.name, and dispatchAction) so
legend hovers correctly trigger highlight/downplay.

In
@app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte:
- Around line 50-71: The legend and series currently use the human display name
as the key which can collide; change the series/legend keys to use the unique
config id instead. Keep getRunConfigDisplayName(config) for the human label, but
modify buildLegendFormatter() to produce a map keyed by config.id (i.e., {
[configId]: formattedLegendText }) and update any code that constructs radar
series entries to use config.id as the series name/legend key while using
getRunConfigDisplayName(config) only for visible text; apply the same change to
the other similar blocks referenced (lines ~73-131 and ~154-215) so all
legend/series keys are configId-based.

In @libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py:
- Around line 629-659: The test calls build_chat_formatter with a keyword arg
named input which may not match the actual method signature; change the call to
use a positional argument instead (e.g., call build_chat_formatter("test input")
) so it matches the MockAdapter.build_chat_formatter signature and avoids a
TypeError, and keep the rest of the assertions (formatter.system_message and
adapter.prompt_builder) unchanged.
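
A minimal sketch of the corrected call, assuming an adapter instance as in the surrounding test; the exact assertion targets depend on the test's fixtures:

# Pass the input positionally to match MockAdapter.build_chat_formatter's signature.
formatter = adapter.build_chat_formatter("test input")

assert formatter.system_message  # unchanged assertion from the comment
assert adapter.prompt_builder is not None  # expected value depends on the fixture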

In @libs/core/kiln_ai/adapters/provider_tools.py:
- Around line 117-136: The helper built_in_provider_from_model_id can raise when
a model's providers is None; update the function to guard against missing/None
providers on each model (e.g., skip models with falsy or absent model.providers)
before iterating so the inner loop only runs when model.providers is a populated
iterable; keep the rest of the logic that compares provider.name to
provider_enum and provider.model_id to model_id unchanged (references:
built_in_provider_from_model_id, built_in_models, model.providers,
provider.name, provider.model_id, ModelProviderName).
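
A minimal sketch of the guarded lookup, assuming the helper returns the matching KilnModelProvider; the parameter types, return type, and not-found behavior here are assumptions:

def built_in_provider_from_model_id(
    model_id: str, provider_name: str
) -> KilnModelProvider | None:
    provider_enum = ModelProviderName(provider_name)
    for model in built_in_models:
        # Skip models whose providers list is missing or empty.
        if not model.providers:
            continue
        for provider in model.providers:
            if provider.name == provider_enum and provider.model_id == model_id:
                return provider
    return None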

In @libs/core/kiln_ai/tools/built_in_tools/math_tools.py:
- Around line 35-40: The runtime failure is caused by attempting to instantiate
TypedDicts (e.g., AddParams) in the run methods; replace lines like kwargs =
AddParams(**kwargs) with params = cast(AddParams, kwargs) (import cast from
typing) and then use params["a"]/params["b"] (or params.get) in AddTool.run,
SubtractTool.run, MultiplyTool.run, and DivideTool.run to avoid instantiating
TypedDicts at runtime; update all four tool classes accordingly and ensure the
typing.cast import is added if missing.
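
A minimal sketch of the cast-based pattern for one of the four tools; AddParams, ToolCallContext, and ToolCallResult come from the comment above, while the base class and the ToolCallResult field name are assumptions:

from typing import cast


class AddTool(KilnTool):  # base class name is an assumption
    async def run(
        self, context: ToolCallContext | None = None, **kwargs
    ) -> ToolCallResult:
        # Narrow the type instead of instantiating the TypedDict at runtime.
        params = cast(AddParams, kwargs)
        return ToolCallResult(output=str(params["a"] + params["b"]))  # field name assumed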

In @tessl.json:
- Around line 1-95: The pyright entry is incorrect: "tessl/pypi-pyright"@1.1.0
points to a non-official PyPI package; replace it with the proper distribution
or registry and update tessl config. Specifically, remove or correct the
"tessl/pypi-pyright" dependency in tessl.json and either (a) switch to the
official npm package (e.g., add "tessl/npm-pyright" or the correct npm
identifier) or (b) verify that the intended PyPI package name and version exist
and update the version to a valid one; then regenerate pins via the tessl tool
and document the regeneration step so tessl-managed files remain consistent.
🧹 Nitpick comments (18)
libs/core/kiln_ai/datamodel/chunk.py (1)

147-156: Type narrowing pragmatism in discriminated union properties.

Both property getters add type: ignore[return-value] to suppress type-checker complaints when returning self.properties as a narrowed TypedDict type. This is a reasonable pragmatic solution given:

  • Runtime checks (lines 148–151 and 160–163) guarantee the correct type before return
  • TypedDict types cannot be narrowed at runtime by type checkers, even after discriminator validation
  • cast() is disallowed by your linting rules (per inline comment)
  • The explanatory comments clearly document the limitation and why this approach is necessary

The code is correct and safe. However, consider whether a longer-term refactoring to use Pydantic BaseModel subclasses or dataclasses instead of TypedDict for the properties union could eliminate the need for type suppression entirely, while still leveraging Pydantic v2's discriminated union feature.

Also applies to: 159-168
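
For illustration, a minimal sketch of what that longer-term Pydantic v2 discriminated-union shape could look like; only FixedWindowChunkerProperties is taken from this PR, and the discriminator field, the second variant, and all field names are hypothetical:

from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class FixedWindowChunkerProperties(BaseModel):
    chunker_type: Literal["fixed_window"] = "fixed_window"
    chunk_size: int
    chunk_overlap: int


class SemanticChunkerProperties(BaseModel):  # hypothetical second variant
    chunker_type: Literal["semantic"] = "semantic"
    similarity_threshold: float


ChunkerProperties = Annotated[
    Union[FixedWindowChunkerProperties, SemanticChunkerProperties],
    Field(discriminator="chunker_type"),
]


class ChunkerConfig(BaseModel):
    # Pydantic narrows `properties` to the concrete subclass via the
    # discriminator, so no `type: ignore` is needed on property getters.
    properties: ChunkerProperties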

libs/core/kiln_ai/utils/test_config.py (1)

284-317: Good test coverage for the deep copy fix.

This test effectively validates that settings(hide_sensitive=True) doesn't mutate the underlying _settings data when masking sensitive keys in nested structures. The test correctly verifies both that masking occurs and that the original data remains intact.

Minor note: Making this test async is unnecessary since it doesn't use any await statements, but it's harmless in pytest.

✨ Optional: Remove the unnecessary async keyword
-async def test_openai_compatible_providers_not_mutated_by_hide_sensitive():
+def test_openai_compatible_providers_not_mutated_by_hide_sensitive():
     config = Config.shared()
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (1)

683-699: Inconsistency: current_task prop missing from existing modal's RunConfigComponent.

The new Dialog passes current_task={guidance_data.task} (line 759), but the existing "Generate Subtopics" modal's RunConfigComponent (lines 683-699) does not pass current_task. This inconsistency may cause subtle behavioral differences between the two modals.

Consider adding current_task={guidance_data.task} to the existing RunConfigComponent for consistency:

♻️ Suggested consistency fix
 <RunConfigComponent
   bind:this={run_config_component_modal}
   {project_id}
+  current_task={guidance_data.task}
   requires_structured_output={true}
   hide_prompt_selector={true}
   show_tools_selector_in_advanced={true}

Also applies to: 755-771

CONTRIBUTING.md (1)

77-78: Doc update LGTM; consider linking to Ty’s official docs.

libs/core/kiln_ai/tools/rag_tools.py (1)

227-235: Avoid rebinding kwargs; prefer explicit param extraction (and optional runtime guard) over TypedDict(**kwargs) + ignore.

RagParams(**kwargs) is mainly for typing, and the ignore masks the same failure you’d otherwise get at Line 231. Consider using a dedicated params variable (and, if you want, a small guard for friendlier errors).

Proposed tweak
-from typing import List, TypedDict
+from typing import List, TypedDict, cast
@@
     async def run(
         self, context: ToolCallContext | None = None, **kwargs
     ) -> ToolCallResult:
-        kwargs = RagParams(**kwargs)  # type: ignore[missing-typed-dict-key]
-        query = kwargs["query"]
+        params = cast(RagParams, kwargs)
+        query = params.get("query")
+        if not isinstance(query, str) or not query:
+            raise ValueError("Missing required parameter 'query'")
app/desktop/studio_server/finetune_api.py (3)

274-286: Centralize provider-id normalization (or fix finetune_registry key typing) to avoid per-call-site type: ignore.

This keeps runtime the same, but avoids scattering invalid-argument-type suppressions and reduces the chance of future mismatches between membership checks and indexing.


355-365: Same concern: prefer a typed helper/alias for finetune_registry[...] over local type: ignore.


426-437: Same concern: registry typing/normalization would remove this ignore and de-duplicate logic.
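
A rough sketch of the kind of shared helper these three comments point at, assuming finetune_registry is keyed by ModelProviderName; the helper name, error messages, and status code are illustrative only:

from fastapi import HTTPException


def finetune_adapter_for_provider(provider_id: str):
    """Normalize a raw provider id and index finetune_registry in one place."""
    try:
        provider = ModelProviderName(provider_id)
    except ValueError:
        provider = None
    if provider is None or provider not in finetune_registry:
        raise HTTPException(
            status_code=400,
            detail=f"Fine-tuning is not supported for provider: {provider_id}",
        )
    return finetune_registry[provider]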

app/desktop/studio_server/tool_api.py (1)

609-615: Inconsistent property construction between add and edit endpoints.

In add_kiln_task_tool, properties are now constructed using the KilnTaskServerProperties constructor (lines 609-615), but edit_kiln_task_tool still uses a plain dict (lines 641-647).

For consistency and maintainability, both endpoints should use the same approach.

♻️ Apply KilnTaskServerProperties to edit_kiln_task_tool
     existing_tool_server.name = tool_data.name
     existing_tool_server.description = tool_data.description
-    existing_tool_server.properties = {
-        "name": tool_data.name,
-        "description": tool_data.description,
-        "task_id": tool_data.task_id,
-        "run_config_id": tool_data.run_config_id,
-        "is_archived": tool_data.is_archived,
-    }
+    existing_tool_server.properties = KilnTaskServerProperties(
+        name=tool_data.name,
+        description=tool_data.description,
+        task_id=tool_data.task_id,
+        run_config_id=tool_data.run_config_id,
+        is_archived=tool_data.is_archived,
+    )
libs/core/kiln_ai/datamodel/basemodel.py (1)

329-329: Multiple type: ignore annotations added for Ty compatibility.

Four type: ignore annotations have been added to suppress Ty type checker errors in metaprogramming and dynamic type construction code:

  • Line 329: mutable_copy return value
  • Lines 694 & 698: Dynamic method creation with parameterized return types
  • Line 790: Recursive validation with dynamic parent types

While these may be necessary for code that uses advanced metaprogramming patterns that type checkers struggle with, consider whether any of these indicate design issues that could be addressed with refactoring.

For future maintainability, consider adding inline comments explaining why each type: ignore is necessary, especially for Lines 694, 698, and 790 where dynamic type construction is involved.

Also applies to: 694-694, 698-698, 790-790

libs/core/kiln_ai/adapters/ml_model_list.py (1)

1519-1587: Gemini 3 Flash: reasoning_capable=True but Gemini API provider may need gemini_reasoning_enabled=True for consistency.
OpenRouter/Vertex set gemini_reasoning_enabled=True, but the Gemini API provider doesn’t; if that flag gates reasoning behavior, this is likely an omission.

Proposed patch (if Gemini API requires the flag)
             KilnModelProvider(
                 name=ModelProviderName.gemini_api,
                 model_id="gemini-3-flash-preview",
                 structured_output_mode=StructuredOutputMode.json_schema,
                 suggested_for_data_gen=True,
                 suggested_for_evals=True,
                 supports_doc_extraction=True,
                 multimodal_capable=True,
                 supports_vision=True,
                 multimodal_mime_types=[
@@
                     KilnMimeType.MP4,
                     KilnMimeType.MOV,
                 ],
+                gemini_reasoning_enabled=True,
                 reasoning_capable=True,
                 thinking_level="medium",
             ),
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte (1)

1-21: Nice, reusable no-data state; consider small a11y/theme tweaks.

  • If the SVG is decorative, add aria-hidden="true" (and/or focusable="false").
  • Consider theme-aware classes (e.g., text-base-content/50) instead of hardcoded text-gray-* if dark mode matters.
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (1)

138-178: Type widening is correct for the data structure.

The change from list[dict[str, str | None]] to list[dict[str, Any]] accurately reflects the actual message structure, which includes complex nested objects like tool_calls arrays (line 162). The original type was too restrictive for the diverse message types being constructed.

For improved type safety, consider defining TypedDict types for the different message shapes (BasicChatMessage dict, ToolCallMessage dict, ToolResponseMessage dict) instead of using Any. However, this is optional as Any is acceptable for this use case.
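
If that optional tightening is ever done, a minimal sketch of the TypedDict shapes named above; all field names are illustrative and would need to match the actual message payloads:

from typing import Any, TypedDict, Union


class BasicChatMessage(TypedDict):
    role: str
    content: str | None


class ToolCallMessage(TypedDict):
    role: str
    content: str | None
    tool_calls: list[dict[str, Any]]


class ToolResponseMessage(TypedDict):
    role: str
    tool_call_id: str
    content: str


DatasetMessage = Union[BasicChatMessage, ToolCallMessage, ToolResponseMessage]
# The formatter could then return list[DatasetMessage] instead of list[dict[str, Any]].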

libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (2)

5-10: Consider centralizing the prefix as a constant.

You now have multiple call sites depending on "finetune_run_config::"; a module-level constant would reduce future drift.
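
A minimal sketch of that constant, assuming the helper signature shown elsewhere in this review; the constant name and the is_finetune_run_config_id convenience wrapper are hypothetical:

FINETUNE_RUN_CONFIG_PREFIX = "finetune_run_config::"


def finetune_run_config_id(project_id: str, task_id: str, finetune_id: str) -> str:
    return f"{FINETUNE_RUN_CONFIG_PREFIX}{project_id}::{task_id}::{finetune_id}"


def is_finetune_run_config_id(run_config_id: str) -> bool:
    return run_config_id.startswith(FINETUNE_RUN_CONFIG_PREFIX)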


13-23: Avoid name shadowing: parameter finetune_run_config_id hides the function name.

Rename the parameter (e.g., run_config_id) to keep stack traces / debugging clearer.

Proposed diff
-def finetune_from_finetune_run_config_id(finetune_run_config_id: str) -> Finetune:
+def finetune_from_finetune_run_config_id(run_config_id: str) -> Finetune:
@@
-    if not finetune_run_config_id.startswith("finetune_run_config::"):
+    if not run_config_id.startswith("finetune_run_config::"):
         raise ValueError(
-            f"Invalid finetune run config ID: {finetune_run_config_id}, expected format: finetune_run_config::project_id::task_id::finetune_id"
+            f"Invalid finetune run config ID: {run_config_id}, expected format: finetune_run_config::project_id::task_id::finetune_id"
         )
 
-    model_id = finetune_run_config_id.removeprefix("finetune_run_config::")
+    model_id = run_config_id.removeprefix("finetune_run_config::")
     return finetune_from_id(model_id)
app/desktop/studio_server/test_eval_api.py (1)

552-648: Good new coverage for finetune run-config lookup + aggregation.

One gap to consider adding: ensure a finetune_run_config::other_project::other_task::... ID is rejected (prevents cross-task leakage if someone crafts IDs).
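
A rough pytest sketch of that case, assuming the route-parameter validation from the diff above lands in task_run_config_from_id and raises a 404 HTTPException; the import path and the mock_task_from_id fixture are assumptions:

import pytest
from fastapi import HTTPException

from app.desktop.studio_server.eval_api import task_run_config_from_id


def test_finetune_run_config_from_other_task_rejected(mock_task_from_id):
    # ID encodes a different project/task than the route parameters.
    crafted_id = "finetune_run_config::other_project::other_task::ft_123"

    with pytest.raises(HTTPException) as exc_info:
        task_run_config_from_id("project_1", "task_1", crafted_id)

    assert exc_info.value.status_code == 404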

libs/core/kiln_ai/adapters/provider_tools.py (1)

259-276: Make parser_from_finetune safe for missing/legacy base_model_id.

Right now built_in_provider_from_model_id(fine_tune.base_model_id, ...) assumes it’s a non-empty string. If older finetunes can exist without it, this will throw or misbehave.

Proposed diff
 def parser_from_finetune(
     fine_tune: Finetune,
 ) -> ModelParserID | None:
@@
-    base_model_provider = built_in_provider_from_model_id(
-        fine_tune.base_model_id, fine_tune.provider
-    )
+    if not getattr(fine_tune, "base_model_id", None):
+        return parser_from_data_strategy(fine_tune.data_strategy)
+
+    base_model_provider = built_in_provider_from_model_id(
+        fine_tune.base_model_id, fine_tune.provider
+    )

Based on learnings, keeping parser selection provider-specific (as this does) is the right direction.

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte (1)

86-128: Missing-data-as-0 is potentially misleading; consider an explicit “missing” strategy.

Options:

  • Drop any series that has any missing key (stricter but honest), or
  • Show ChartNoData when any selected run config lacks enough keys, or
  • Keep 0, but annotate in tooltip/legend that some values are missing.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae1512c and 811194a.

⛔ Files ignored due to path filters (2)
  • app/web_ui/package-lock.json is excluded by !**/package-lock.json
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (51)
  • .cursor/mcp.json
  • .cursor/rules/.gitignore
  • .github/workflows/build_and_test.yml
  • .github/workflows/build_desktop.yml
  • .github/workflows/format_and_lint.yml
  • .gitignore
  • .tessl/.gitignore
  • AGENTS.md
  • CONTRIBUTING.md
  • README.md
  • app/desktop/studio_server/eval_api.py
  • app/desktop/studio_server/finetune_api.py
  • app/desktop/studio_server/test_eval_api.py
  • app/desktop/studio_server/tool_api.py
  • app/web_ui/src/lib/api_schema.d.ts
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
  • checks.sh
  • hooks_mcp.yaml
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
  • libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py
  • libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
  • libs/core/kiln_ai/datamodel/basemodel.py
  • libs/core/kiln_ai/datamodel/chunk.py
  • libs/core/kiln_ai/datamodel/dataset_filters.py
  • libs/core/kiln_ai/datamodel/dataset_split.py
  • libs/core/kiln_ai/datamodel/vector_store.py
  • libs/core/kiln_ai/tools/built_in_tools/math_tools.py
  • libs/core/kiln_ai/tools/rag_tools.py
  • libs/core/kiln_ai/utils/config.py
  • libs/core/kiln_ai/utils/test_config.py
  • libs/server/kiln_server/document_api.py
  • libs/server/kiln_server/mcp/mcp_server_tool_utils.py
  • pyproject.toml
  • pyrightconfig.json
  • tessl.json
💤 Files with no reviewable changes (2)
  • app/web_ui/src/lib/api_schema.d.ts
  • pyrightconfig.json
🧰 Additional context used
📓 Path-based instructions (10)
libs/core/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Python 3.10+ for the core library (libs/core) and Python 3.13 for the desktop app

Use Python 3.10+ for the core library (libs/core)

Files:

  • libs/core/kiln_ai/datamodel/chunk.py
  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/tools/rag_tools.py
  • libs/core/kiln_ai/utils/test_config.py
  • libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py
  • libs/core/kiln_ai/tools/built_in_tools/math_tools.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/datamodel/basemodel.py
  • libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
  • libs/core/kiln_ai/datamodel/vector_store.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/datamodel/dataset_filters.py
  • libs/core/kiln_ai/datamodel/dataset_split.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/utils/config.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation

**/*.py: Use Pydantic v2, not v1
Use asyncio for asynchronous operations in Python

Files:

  • libs/core/kiln_ai/datamodel/chunk.py
  • app/desktop/studio_server/finetune_api.py
  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/tools/rag_tools.py
  • libs/core/kiln_ai/utils/test_config.py
  • libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py
  • libs/core/kiln_ai/tools/built_in_tools/math_tools.py
  • app/desktop/studio_server/eval_api.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/datamodel/basemodel.py
  • app/desktop/studio_server/tool_api.py
  • libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
  • libs/core/kiln_ai/datamodel/vector_store.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • app/desktop/studio_server/test_eval_api.py
  • libs/core/kiln_ai/datamodel/dataset_filters.py
  • libs/core/kiln_ai/datamodel/dataset_split.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/utils/config.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
  • libs/server/kiln_server/document_api.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
  • libs/server/kiln_server/mcp/mcp_server_tool_utils.py
app/web_ui/**/*.svelte

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

app/web_ui/**/*.svelte: Use Svelte v4 (not v5) for the web frontend UI
Use DaisyUI for UI components in the web frontend

Use Svelte v4, not v5

Files:

  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
app/web_ui/**/*.{svelte,ts}

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Tailwind CSS for styling in the web frontend

Files:

  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
app/web_ui/**/*.{svelte,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Tailwind CSS and DaisyUI for styling the frontend

Files:

  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
app/desktop/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.13 for the desktop application (app/desktop)

Files:

  • app/desktop/studio_server/finetune_api.py
  • app/desktop/studio_server/eval_api.py
  • app/desktop/studio_server/tool_api.py
  • app/desktop/studio_server/test_eval_api.py
{libs/server,app/desktop/studio_server}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI for REST server implementation

Files:

  • app/desktop/studio_server/finetune_api.py
  • app/desktop/studio_server/eval_api.py
  • app/desktop/studio_server/tool_api.py
  • app/desktop/studio_server/test_eval_api.py
  • libs/server/kiln_server/document_api.py
  • libs/server/kiln_server/mcp/mcp_server_tool_utils.py
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use pytest for testing Python code

Files:

  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/utils/test_config.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • app/desktop/studio_server/test_eval_api.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest for testing in Python

Files:

  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/utils/test_config.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • app/desktop/studio_server/test_eval_api.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/cli/commands/test_package_project.py
libs/server/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use FastAPI for creating REST servers

Files:

  • libs/server/kiln_server/document_api.py
  • libs/server/kiln_server/mcp/mcp_server_tool_utils.py
🧠 Learnings (24)
📓 Common learnings
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 546
File: app/web_ui/src/routes/(app)/docs/library/[project_id]/upload_file_dialog.svelte:107-120
Timestamp: 2025-09-10T08:32:18.688Z
Learning: leonardmq prefers to work within the constraints of their SDK codegen for API calls, even when typing is awkward (like casting FormData to match expected types), rather than using alternative approaches like native fetch that would cause compiler errors with their generated types.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 413
File: libs/core/kiln_ai/datamodel/chunk.py:138-160
Timestamp: 2025-08-22T11:17:56.862Z
Learning: leonardmq prefers to avoid redundant validation checks when upstream systems already guarantee preconditions are met. He trusts the attachment system to ensure paths are properly formatted and prefers letting the attachment method (resolve_path) handle any edge cases rather than adding defensive precondition checks.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 390
File: libs/core/kiln_ai/adapters/provider_tools.py:494-499
Timestamp: 2025-07-23T08:58:45.769Z
Learning: leonardmq prefers to keep tightly coupled implementation details (like API versions) hardcoded when they are not user-facing and could break other modules if changed. The shared Config class is reserved for user-facing customizable values, not internal implementation details.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 341
File: libs/server/kiln_server/document_api.py:44-51
Timestamp: 2025-06-18T08:22:58.510Z
Learning: leonardmq prefers to defer fixing blocking I/O in async handlers when: the operation is very fast (milliseconds), user-triggered rather than automated, has no concurrency concerns, and would require additional testing to fix properly. He acknowledges such issues as valid but makes pragmatic decisions about timing the fixes.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/datamodel/rag.py:33-35
Timestamp: 2025-09-04T06:45:44.212Z
Learning: leonardmq requires vector_store_config_id to be a mandatory field in RagConfig (similar to extractor_config_id, chunker_config_id, embedding_config_id) for consistency. He prefers fixing dependent code that breaks due to missing required fields rather than making fields optional to accommodate incomplete data.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 654
File: app/web_ui/src/routes/(app)/docs/rag_configs/[project_id]/create_rag_config/edit_rag_config_form.svelte:141-152
Timestamp: 2025-09-25T06:38:14.854Z
Learning: leonardmq prefers simple onMount initialization patterns over reactive statements when possible, and is cautious about maintaining internal state for idempotency in Svelte components. He values simplicity and safety in component lifecycle management.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 402
File: libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py:0-0
Timestamp: 2025-07-14T03:43:07.283Z
Learning: leonardmq prefers to keep defensive validation checks even when they're technically redundant, viewing them as useful "quick sanity checks" that provide additional safety nets. He values defensive programming over strict DRY (Don't Repeat Yourself) principles when the redundant code serves as a safeguard.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 546
File: app/web_ui/src/lib/api_schema.d.ts:0-0
Timestamp: 2025-09-10T08:27:47.227Z
Learning: leonardmq correctly identifies auto-generated files and prefers not to manually edit them. When suggesting changes to generated TypeScript API schema files, the fix should be made in the source OpenAPI specification in the backend, not in the generated TypeScript file.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/adapters/vector_store/lancedb_adapter.py:136-150
Timestamp: 2025-09-04T06:34:12.318Z
Learning: leonardmq prefers brute force re-insertion of all chunks when partial chunks exist in the LanceDB vector store, rather than selectively inserting only missing chunks. His reasoning is that documents typically have only dozens of chunks (making overwriting cheap), and partial insertion likely indicates data corruption or conflicts that should be resolved by re-inserting all chunks to ensure consistency.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/adapters/vector_store/lancedb_adapter.py:98-103
Timestamp: 2025-09-04T07:08:04.248Z
Learning: leonardmq prefers to let TableNotFoundError bubble up in delete_nodes_by_document_id operations rather than catching it, as this error indicates operations are being done in the wrong order on the caller side and should be surfaced as a programming error rather than handled gracefully.
📚 Learning: 2025-12-15T22:02:37.380Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-15T22:02:37.380Z
Learning: Follow the instructions in .tessl/RULES.md

Applied to files:

  • .tessl/.gitignore
  • .cursor/rules/.gitignore
  • AGENTS.md
📚 Learning: 2025-09-25T07:20:11.459Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 650
File: app/web_ui/src/routes/(app)/docs/library/[project_id]/[document_id]/+page.svelte:19-19
Timestamp: 2025-09-25T07:20:11.459Z
Learning: When analyzing relative import paths in SvelteKit route structures, be more careful to verify the actual directory structure exists before suggesting corrections. The import path ../../../../generate/[project_id]/[task_id]/table_button.svelte from app/web_ui/src/routes/(app)/docs/library/[project_id]/[document_id]/+page.svelte correctly resolves to the existing file at app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/table_button.svelte.

Applied to files:

  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
  • app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
📚 Learning: 2025-10-24T05:01:15.465Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 736
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/qna/qna_ui_store.ts:552-585
Timestamp: 2025-10-24T05:01:15.465Z
Learning: In the Q&A generation workflow (app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/qna/qna_ui_store.ts), chunking is intentionally only ensured for target="all" generation. When users generate Q&A for a specific document or part, they do not alter the chunking strategy; the UI only surfaces document/part-level generation actions after the user has already run the global chunking flow (or chosen to use full documents without chunking). This is by design to separate the chunking decision from per-item generation.

Applied to files:

  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.

Applied to files:

  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/utils/test_config.py
  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-11-24T19:13:16.966Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: .cursor/rules/project.mdc:0-0
Timestamp: 2025-11-24T19:13:16.966Z
Learning: Run linting, formatting, and typechecking tools before completing tasks to ensure code meets standards

Applied to files:

  • checks.sh
  • .github/workflows/format_and_lint.yml
📚 Learning: 2025-11-24T19:13:16.966Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: .cursor/rules/project.mdc:0-0
Timestamp: 2025-11-24T19:13:16.966Z
Learning: Applies to **/*test*.py : Use pytest for testing Python code

Applied to files:

  • hooks_mcp.yaml
📚 Learning: 2025-12-06T07:53:22.155Z
Learnt from: chiang-daniel
Repo: Kiln-AI/Kiln PR: 781
File: app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte:200-222
Timestamp: 2025-12-06T07:53:22.155Z
Learning: In app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte, the process_model_change function intentionally overrides selected_run_config_id when a model has a model_specific_run_config property. This is by design to ensure fine-tuned models automatically use their associated run configuration (the configuration they were trained with), even if triggered by selecting a saved run config that uses that model.

Applied to files:

  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
📚 Learning: 2025-10-22T04:31:20.500Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 739
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte:170-177
Timestamp: 2025-10-22T04:31:20.500Z
Learning: In Svelte components for the Kiln project, leonardmq prefers tracking previous query parameter values and comparing them in reactive blocks (rather than just checking for existence) to avoid duplicate execution on initial load while still reacting to subsequent navigation changes. This pattern supports both current behavior and future feature additions like mode switching.

Applied to files:

  • app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
📚 Learning: 2025-09-17T17:36:30.231Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 614
File: libs/server/kiln_server/document_api.py:1413-1414
Timestamp: 2025-09-17T17:36:30.231Z
Learning: When RagConfig schema changes are made in the Kiln AI system, older RagConfig instances created before the schema change are discarded rather than migrated, eliminating the need for fallback logic or backwards compatibility handling in the API responses. This is acceptable because RagConfig features are still in pre-release state.

Applied to files:

  • libs/core/kiln_ai/tools/rag_tools.py
📚 Learning: 2025-09-17T17:34:34.343Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 614
File: libs/server/kiln_server/document_api.py:205-207
Timestamp: 2025-09-17T17:34:34.343Z
Learning: As of September 2025, RagConfig features in the Kiln AI system are still in pre-release state and have not been released to users yet, so breaking backwards compatibility changes are acceptable for RagConfig-related fields and functionality.

Applied to files:

  • libs/core/kiln_ai/tools/rag_tools.py
📚 Learning: 2025-08-08T16:19:20.074Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1736-1741
Timestamp: 2025-08-08T16:19:20.074Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwq_32b with ModelProviderName.siliconflow_cn (model_id "Qwen/QwQ-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via tests/responses that SiliconFlow returns outputs that parse correctly without an R1 parser. Parser usage remains provider-specific and should only be added when validated.

Applied to files:

  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:14:54.346Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1979-1983
Timestamp: 2025-08-08T16:14:54.346Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py, for ModelName.deepseek_r1_distill_qwen_32b on provider siliconflow_cn (model_id "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via testing that SiliconFlow returns final outputs that parse without an R1 parser, and reasoning may be omitted under structured output; rely on reasoning_optional_for_structured_output=True.

Applied to files:

  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:15:20.796Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:2434-2439
Timestamp: 2025-08-08T16:15:20.796Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwen_3_8b_no_thinking with ModelProviderName.siliconflow_cn (model_id "Qwen/Qwen3-8B"), the qwen3_style_no_think formatter is not required: SiliconFlow’s non‑thinking Qwen3‑8B returns clean output without <answer> wrappers. Formatter/parser usage is provider-specific and should be added only when verified by tests/responses.

Applied to files:

  • libs/core/kiln_ai/adapters/test_provider_tools.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-10-21T00:23:00.770Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 737
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte:287-301
Timestamp: 2025-10-21T00:23:00.770Z
Learning: In Kiln AI, for topic/inputs generation flows (such as in generate_samples_modal.svelte and generated_data_node.svelte), the task input_json_schema is always set. Therefore, hardcoding requires_structured_output={true} in RunConfigComponent for these flows is correct and does not need to be conditional like it is for output generation.

Applied to files:

  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
📚 Learning: 2025-11-24T19:13:16.966Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: .cursor/rules/project.mdc:0-0
Timestamp: 2025-11-24T19:13:16.966Z
Learning: Applies to app/web_ui/**/*.svelte : Use DaisyUI for UI components in the web frontend

Applied to files:

  • app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
📚 Learning: 2025-08-22T11:15:53.584Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 413
File: libs/core/kiln_ai/adapters/embedding/embedding_registry.py:14-20
Timestamp: 2025-08-22T11:15:53.584Z
Learning: In the Kiln AI codebase, model_provider_name fields in datamodels like EmbeddingConfig are typed as ModelProviderName enum (which inherits from str, Enum), not as plain strings. Therefore, accessing .value on these fields is valid and returns the string representation of the enum value.

Applied to files:

  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/provider_tools.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-10-28T05:25:09.283Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 757
File: libs/core/kiln_ai/adapters/reranker_list.py:39-42
Timestamp: 2025-10-28T05:25:09.283Z
Learning: In the Kiln codebase (Python), model family and model name fields in reranker and similar model definition classes (like KilnRerankerModel in libs/core/kiln_ai/adapters/reranker_list.py) should use plain str types rather than Enum types. This is because users receive over-the-air (OTA) updates for new models, and using Enums would break when clients with older builds encounter new model families their Enum definitions don't include. Strings provide forward compatibility for the OTA update mechanism.

Applied to files:

  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-09-10T08:06:53.717Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 585
File: libs/core/kiln_ai/adapters/ml_embedding_model_list.py:32-33
Timestamp: 2025-09-10T08:06:53.717Z
Learning: In the Kiln AI codebase, friendly_name fields in embedding model configurations should maintain the original model naming conventions from providers (e.g., "gemini-embedding-001", "text-embedding-3-small") for human readability, rather than following snake_case conventions like the enum values.

Applied to files:

  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-10-01T21:48:25.854Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 674
File: libs/core/kiln_ai/tools/kiln_task_tool.py:30-34
Timestamp: 2025-10-01T21:48:25.854Z
Learning: In libs/core/kiln_ai/datamodel/external_tool_server.py, KilnTaskServerProperties is a TypedDict, not a Pydantic model. At runtime it's a regular Python dict, so .get() method works correctly and no .model_dump() conversion is needed.

Applied to files:

  • app/desktop/studio_server/tool_api.py
📚 Learning: 2025-10-29T11:19:31.059Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 759
File: app/web_ui/src/lib/api_schema.d.ts:2485-2490
Timestamp: 2025-10-29T11:19:31.059Z
Learning: In libs/server/kiln_server/document_api.py and the generated app/web_ui/src/lib/api_schema.d.ts, CreateVectorStoreConfigRequest.properties is now a union of three distinct public schemas: LanceDBConfigFTSPropertiesPublic, LanceDBConfigVectorPropertiesPublic, and LanceDBConfigHybridPropertiesPublic. These are intentionally split to allow independent evolution; do not suggest merging them.

Applied to files:

  • libs/core/kiln_ai/datamodel/vector_store.py
  • libs/server/kiln_server/document_api.py
📚 Learning: 2025-10-27T09:38:25.562Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 753
File: libs/server/kiln_server/document_api.py:279-299
Timestamp: 2025-10-27T09:38:25.562Z
Learning: In the Kiln codebase, leonardmq intentionally overrides (rather than using setdefault) certain internal config fields like include_metadata and include_prev_next_rel in semantic chunker properties, even if the frontend sends them. These fields are too granular for user control and are set to enforce config stability.

Applied to files:

  • libs/server/kiln_server/document_api.py
📚 Learning: 2025-07-16T09:37:39.816Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:0-0
Timestamp: 2025-07-16T09:37:39.816Z
Learning: The `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due to output formatting issues: output was duplicated in both `output` and `reasoning` fields, and contained random internal JSON in the output. This model should not be re-added without addressing these formatting problems.

Applied to files:

  • libs/core/kiln_ai/adapters/ml_model_list.py
🧬 Code graph analysis (15)
libs/core/kiln_ai/utils/test_config.py (1)
libs/core/kiln_ai/utils/config.py (4)
  • Config (32-293)
  • shared (190-193)
  • save_setting (278-279)
  • settings (256-276)
libs/core/kiln_ai/adapters/test_provider_tools.py (2)
libs/core/kiln_ai/adapters/provider_tools.py (2)
  • built_in_provider_from_model_id (117-136)
  • parser_from_finetune (259-276)
libs/core/kiln_ai/adapters/ml_model_list.py (1)
  • ModelParserID (227-233)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (1)
  • config (47-58)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (1)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • ModelProviderName (84-106)
libs/core/kiln_ai/datamodel/basemodel.py (1)
libs/core/kiln_ai/datamodel/test_basemodel.py (4)
  • relationship_name (118-119)
  • relationship_name (146-147)
  • parent_type (122-123)
  • parent_type (150-151)
app/desktop/studio_server/tool_api.py (2)
app/web_ui/src/lib/types.ts (1)
  • KilnTaskServerProperties (95-96)
libs/core/kiln_ai/datamodel/external_tool_server.py (1)
  • KilnTaskServerProperties (43-48)
libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py (1)
libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (2)
  • finetune_run_config_id (5-10)
  • finetune_from_finetune_run_config_id (13-23)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (1)
libs/core/kiln_ai/adapters/chat/chat_formatter.py (1)
  • messages (71-72)
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py (5)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)
  • BaseAdapter (55-370)
libs/core/kiln_ai/adapters/run_output.py (1)
  • RunOutput (10-14)
libs/core/kiln_ai/datamodel/task_output.py (2)
  • DataSource (191-318)
  • DataSourceType (163-174)
libs/core/kiln_ai/datamodel/task.py (1)
  • Task (102-179)
libs/core/kiln_ai/datamodel/task_run.py (1)
  • Usage (17-73)
app/desktop/studio_server/test_eval_api.py (3)
app/desktop/studio_server/eval_api.py (2)
  • get_all_run_configs (66-88)
  • task_run_config_from_id (91-114)
libs/core/kiln_ai/datamodel/finetune.py (1)
  • Finetune (23-127)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • FineTuneStatusType (52-61)
libs/core/kiln_ai/datamodel/dataset_filters.py (1)
libs/core/kiln_ai/datamodel/test_dataset_split.py (1)
  • task_run (78-93)
libs/core/kiln_ai/adapters/provider_tools.py (3)
app/web_ui/src/lib/types.ts (2)
  • ModelProviderName (65-65)
  • Finetune (18-18)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • ModelProviderName (84-106)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
  • KilnModelProvider (244-316)
  • ModelParserID (227-233)
libs/core/kiln_ai/cli/commands/test_package_project.py (1)
libs/core/kiln_ai/cli/commands/package_project.py (1)
  • package_project (587-729)
libs/server/kiln_server/document_api.py (1)
libs/core/kiln_ai/datamodel/chunk.py (2)
  • ChunkerType (35-37)
  • FixedWindowChunkerProperties (49-52)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
libs/core/kiln_ai/datamodel/datamodel_enums.py (3)
  • ModelProviderName (84-106)
  • StructuredOutputMode (27-49)
  • KilnMimeType (109-133)
libs/core/kiln_ai/datamodel/extraction.py (1)
  • friendly_name (289-291)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)

Comment on lines 66 to 89
def get_all_run_configs(project_id: str, task_id: str) -> list[TaskRunConfig]:
    """
    Returns all run configs for a task, including completed fine-tune run configs.
    Only includes fine-tunes that have a fine_tune_model_id (are completed and usable).
    """
    task = task_from_id(project_id, task_id)
    configs = task.run_configs()

    # Get run configs from finetunes and only include completed fine-tunes
    finetunes = task.finetunes()
    for finetune in finetunes:
        if finetune.run_config is not None and finetune.fine_tune_model_id is not None:
            configs.append(
                TaskRunConfig(
                    id=finetune_run_config_id(project_id, task_id, str(finetune.id)),
                    name=finetune.name,
                    description=finetune.description,
                    run_config_properties=finetune.run_config,
                    parent=task,  # special case, we need to reference the task model
                )
            )

    return configs


⚠️ Potential issue | 🟡 Minor

Don’t mutate the list returned by task.run_configs(); build a new list.

If task.run_configs() returns an internal list (not a copy), appending finetune-derived configs could leak into other call paths.

Proposed diff
 def get_all_run_configs(project_id: str, task_id: str) -> list[TaskRunConfig]:
@@
     task = task_from_id(project_id, task_id)
-    configs = task.run_configs()
+    configs = list(task.run_configs())
🤖 Prompt for AI Agents
In @app/desktop/studio_server/eval_api.py around lines 66 - 89, The
get_all_run_configs function currently appends finetune-derived TaskRunConfig
objects to the list returned by task.run_configs(), which may mutate an internal
list; instead, make a new list copy (e.g., copy or list(...) of
task.run_configs()) into the local variable configs before appending finetune
entries so you never modify the original. Keep the rest of the logic (filtering
finetunes by run_config and fine_tune_model_id and constructing TaskRunConfig
with finetune_run_config_id, name, description, run_config_properties,
parent=task) unchanged.

Comment on lines 73 to 93
  // Get simple display name for the series (used as the internal name/key)
  function getRunConfigDisplayName(config: TaskRunConfig): string {
    return config.name || getDetailedModelName(config, model_info) || "Unknown"
  }

  // Build a map from display name to full legend text (name, model, prompt)
  function buildLegendFormatter(): Record<string, string> {
    const formatter: Record<string, string> = {}
    for (const config of run_configs) {
      if (!config.id) continue

      const displayName = getRunConfigDisplayName(config)
      const modelName = getDetailedModelName(config, model_info) || "Unknown"
      const promptName = getRunConfigPromptDisplayName(config, prompts)

      // Multi-line legend: display name on first line, model and prompt on 2nd/3rd
      formatter[displayName] =
        `${displayName}\n{sub|Model: ${modelName}}\n{sub|Prompt: ${promptName}}`
    }
    return formatter
  }

⚠️ Potential issue | 🟠 Major

Avoid series/legend key collisions: don’t use displayName as the ECharts series identity.

If two run configs share the same config.name (or same fallback), ECharts will treat them as the same series/legend item and your legendFormatter map will overwrite entries. Use config.id as the series/legend key and format it to a multi-line label via formatter.

Proposed diff (key by config.id)
-  // Get simple display name for the series (used as the internal name/key)
+  // Get display name (used only for human-readable labels)
   function getRunConfigDisplayName(config: TaskRunConfig): string {
     return config.name || getDetailedModelName(config, model_info) || "Unknown"
   }

-  function buildLegendFormatter(): Record<string, string> {
+  function buildLegendFormatter(): Record<string, string> {
     const formatter: Record<string, string> = {}
     for (const config of run_configs) {
       if (!config.id) continue
-
-      const displayName = getRunConfigDisplayName(config)
+      const displayName = getRunConfigDisplayName(config)
       const modelName = getDetailedModelName(config, model_info) || "Unknown"
       const promptName = getRunConfigPromptDisplayName(config, prompts)
 
-      formatter[displayName] =
-        `${displayName}\n{sub|Model: ${modelName}}\n{sub|Prompt: ${promptName}}`
+      formatter[config.id] =
+        `${displayName}\n{sub|Model: ${modelName}}\n{sub|Prompt: ${promptName}}`
     }
     return formatter
   }
@@
     run_configs.forEach((config) => {
       const configId = config.id
       if (!configId) return
@@
       if (xValue !== null && yValue !== null) {
-        const name = getRunConfigDisplayName(config)
-        legend.push(name)
+        legend.push(configId)
 
         series.push({
-          name,
+          name: configId,
           type: "scatter",
           data: [[xValue, yValue, configId]],
           symbolSize: 15,
           emphasis: {
             scale: 2,
           },
         })
       }
     })

Also applies to: 137-162, 170-222

🤖 Prompt for AI Agents
In
@app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
around lines 73 - 93, The legend/series keys currently use
getRunConfigDisplayName(config) which can collide when two run_configs share the
same display name; change series/legend identity to use the unique config.id
instead and build the human-readable multi-line label via the
formatter/legendFormatter mapping so ECharts keys are unique while labels remain
formatted (update buildLegendFormatter, any places using getRunConfigDisplayName
as the series key, and where legendFormatter is consumed to map config.id ->
`${display}\n{sub|Model: ...}\n{sub|Prompt: ...}`).

Comment on lines 288 to 302
  // Add legend hover interaction to highlight corresponding chart points
  chartInstance.on("mouseover", "legendItem", (params: { name: string }) => {
    chartInstance?.dispatchAction({
      type: "highlight",
      seriesName: params.name,
    })
  })

  chartInstance.on("mouseout", "legendItem", (params: { name: string }) => {
    chartInstance?.dispatchAction({
      type: "downplay",
      seriesName: params.name,
    })
  })

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Apache ECharts v6 legend hover event mouseover mouseout legendItem

💡 Result:

Yes — in ECharts v6 you can detect hover over legend items by listening to general mouse events (e.g. "mouseover" / "mouseout") and checking params.componentType === 'legend'. You can also react to the highlight/downplay actions the legend triggers.

Example:

  • listen for legend hover:
    var chart = echarts.init(dom);
    chart.on('mouseover', function(params){
      if (params.componentType === 'legend') {
        console.log('legend hover:', params.name);
        // do something, e.g. dispatchAction to highlight series
        chart.dispatchAction({ type: 'highlight', seriesName: params.name });
      }
    });
    chart.on('mouseout', function(params){
      if (params.componentType === 'legend') {
        console.log('legend out:', params.name);
        chart.dispatchAction({ type: 'downplay', seriesName: params.name });
      }
    });

Notes:

  • ECharts supports general mouse events ('click','mouseover','mouseout',etc.). [1]
  • Legend-specific interactions are documented in the Legend handbook; legend events (like select changes) exist, and legend items will produce params where componentType === 'legend'. [2]
  • You can also subscribe to 'highlight' and 'downplay' events (fired by component interactions) if you prefer to react to those actions. [3]
  • If you only need to show info on legend hover, legend.tooltip.formatter is another built-in option. [4]

Sources: [1] ECharts events docs; [2] ECharts legend docs; [3] discussion re: legend mouse/highlight events; [4] legend.tooltip example.


🌐 Web query:

ECharts chart.on legend item click hover event handler parameters

💡 Result:

Short answer:

  • Legend toggle event: listen to "legendselectchanged". Handler receives params like:

    • type: "legendselectchanged"
    • name: string (legend item name)
    • selected: Object (map of legend name → boolean)
      Example: myChart.on('legendselectchanged', params => { /* params.name, params.selected */ }). [1][2]
  • Legend click / hover as mouse events: you can listen to mouse events ('click', 'mouseover', 'mouseout', etc.). The event params follow the general mouse-event shape and for legend-originated events include componentType = "legend", plus name, event type, etc. Typical fields:

    • componentType: string (e.g. "legend")
    • type: string (e.g. "click" or "mouseover")
    • name: string (legend item name)
    • (other mouse-event fields if applicable). You can also filter with chart.on('click', {component: 'legend'}, handler) or scoped selector. [3][4]

References:
[1] ECharts — Event and Action (legend example).
[2] ECharts — events API (legendselectchanged param shape).
[3] ECharts — events (mouse event params structure).
[4] ECharts — echartsInstance.on / chart.on usage examples.


🌐 Web query:

ECharts "legendItem" event target documentation

💡 Result:

ECharts does not have a separate "legendItem" event name — legend interactions are exposed as:

  • Mouse events on legend graphics (e.g. click/mouseover) produce the usual event params with componentType = "legend" and name = the legend item name (see general mouse-event params). [1]

  • Component behavior events for legend changes are: legendselectchanged, legendselected, legendunselected, legendselectall, legendinverseselect, and legendscroll (their params include name and selected table where applicable). [1][2]

References:

  • Events list and legend event payloads (includes mouse-event params and legendselectchanged/legendselected/etc.). [1]
  • Handbook: listening to legend events (example using legendselectchanged). [2]

Sources:
[1] Apache ECharts — events (events.md).
[2] Apache ECharts Handbook — Event and Action.


Fix ECharts legend hover event listener — "legendItem" is not a valid ECharts event target.

In Apache ECharts v6, there is no "legendItem" event query. Legend mouse events require checking params.componentType === 'legend' instead. Update to:

chartInstance.on("mouseover", (params) => {
  if (params.componentType === "legend") {
    chartInstance?.dispatchAction({
      type: "highlight",
      seriesName: params.name,
    })
  }
})

chartInstance.on("mouseout", (params) => {
  if (params.componentType === "legend") {
    chartInstance?.dispatchAction({
      type: "downplay",
      seriesName: params.name,
    })
  }
})
🤖 Prompt for AI Agents
In
@app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
around lines 288 - 302, The current legend hover handlers use an invalid ECharts
target "legendItem"; update the listeners on chartInstance to listen for generic
"mouseover" and "mouseout" events and inside the handler check
params.componentType === "legend" before calling chartInstance.dispatchAction
with type "highlight" / "downplay" and seriesName params.name; adjust the
handlers around chartInstance.on(...) and the dispatchAction(...) calls (refer
to chartInstance, params.componentType, params.name, and dispatchAction) so
legend hovers correctly trigger highlight/downplay.

Comment on lines 50 to 71
  // Get simple display name for the series (used as the internal name/key)
  function getRunConfigDisplayName(config: TaskRunConfig): string {
    return config.name || getDetailedModelName(config, model_info) || "Unknown"
  }

  // Build a map from display name to full legend text (name, model, prompt)
  function buildLegendFormatter(): Record<string, string> {
    const formatter: Record<string, string> = {}
    for (const configId of selectedRunConfigIds) {
      const config = run_configs.find((c) => c.id === configId)
      if (!config) continue

      const displayName = getRunConfigDisplayName(config)
      const modelName = getDetailedModelName(config, model_info) || "Unknown"
      const promptName = getRunConfigPromptDisplayName(config, prompts)

      // Multi-line legend: display name on first line, model and prompt on 2nd/3rd
      formatter[displayName] =
        `${displayName}\n{sub|Model: ${modelName}}\n{sub|Prompt: ${promptName}}`
    }
    return formatter
  }

⚠️ Potential issue | 🟠 Major

Same as scatter: key legend/series by configId to avoid duplicate-name collisions.

Suggested approach (no full diff)
  • Use configId as the name for each radar series entry and as the legend item.
  • Keep getRunConfigDisplayName() for human text.
  • Make buildLegendFormatter() return a { [configId]: formattedLegendText } map.

Also applies to: 73-131, 154-215
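
For illustration, a minimal sketch of the suggested keying, assuming the component's existing run_configs, selectedRunConfigIds, model_info, and prompts variables; scoresForConfig is a hypothetical helper standing in for however the component gathers one score per radar axis:

  // Key the legend/label map by the unique config id; duplicate display names can no longer collide.
  function buildLegendFormatter(): Record<string, string> {
    const formatter: Record<string, string> = {}
    for (const configId of selectedRunConfigIds) {
      const config = run_configs.find((c) => c.id === configId)
      if (!config) continue
      const displayName = getRunConfigDisplayName(config)
      const modelName = getDetailedModelName(config, model_info) || "Unknown"
      const promptName = getRunConfigPromptDisplayName(config, prompts)
      formatter[configId] =
        `${displayName}\n{sub|Model: ${modelName}}\n{sub|Prompt: ${promptName}}`
    }
    return formatter
  }

  // Use the config id as the identity shared by the legend and the radar data entries;
  // the legend formatter turns each id back into the human-readable multi-line label.
  const legendFormatter = buildLegendFormatter()
  const option = {
    legend: {
      data: selectedRunConfigIds,
      formatter: (id: string) => legendFormatter[id] ?? id,
    },
    series: [
      {
        type: "radar" as const,
        data: selectedRunConfigIds.map((configId) => ({
          name: configId, // unique ECharts key, matches the legend entry
          value: scoresForConfig(configId), // hypothetical: one number per radar indicator
        })),
      },
    ],
  }

This keeps getRunConfigDisplayName() purely for human-readable text while ECharts tracks series/legend identity by id.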

🤖 Prompt for AI Agents
In
@app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
around lines 50 - 71, The legend and series currently use the human display name
as the key which can collide; change the series/legend keys to use the unique
config id instead. Keep getRunConfigDisplayName(config) for the human label, but
modify buildLegendFormatter() to produce a map keyed by config.id (i.e., {
[configId]: formattedLegendText }) and update any code that constructs radar
series entries to use config.id as the series name/legend key while using
getRunConfigDisplayName(config) only for visible text; apply the same change to
the other similar blocks referenced (lines ~73-131 and ~154-215) so all
legend/series keys are configId-based.

Comment on lines 629 to 659
async def test_custom_prompt_builder(base_task):
    """Test that custom prompt builder can be injected via AdapterConfig"""

    # Create a custom prompt builder
    class CustomPromptBuilder(BasePromptBuilder):
        def build_base_prompt(self) -> str:
            return "This is a custom prompt from injected builder"

    custom_builder = CustomPromptBuilder(base_task)

    adapter = MockAdapter(
        task=base_task,
        run_config=RunConfigProperties(
            model_name="test_model",
            model_provider_name="openai",
            prompt_id="simple_prompt_builder",
            structured_output_mode="json_schema",
        ),
        config=AdapterConfig(prompt_builder=custom_builder),
    )

    # Mock model provider
    provider = MagicMock()
    provider.reasoning_capable = False
    provider.tuned_chat_strategy = None
    adapter.model_provider = MagicMock(return_value=provider)

    # Test that the custom prompt builder is used
    formatter = adapter.build_chat_formatter(input="test input")
    assert formatter.system_message == "This is a custom prompt from injected builder"
    assert adapter.prompt_builder == custom_builder

⚠️ Potential issue | 🟠 Major

Potential test break: build_chat_formatter(input="...") may not match the method signature.
Elsewhere in this file it’s called positionally (build_chat_formatter("test input")), so using a keyword arg here could raise TypeError if the parameter isn’t literally named input.

Proposed patch
-    formatter = adapter.build_chat_formatter(input="test input")
+    formatter = adapter.build_chat_formatter("test input")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def test_custom_prompt_builder(base_task):
    """Test that custom prompt builder can be injected via AdapterConfig"""

    # Create a custom prompt builder
    class CustomPromptBuilder(BasePromptBuilder):
        def build_base_prompt(self) -> str:
            return "This is a custom prompt from injected builder"

    custom_builder = CustomPromptBuilder(base_task)

    adapter = MockAdapter(
        task=base_task,
        run_config=RunConfigProperties(
            model_name="test_model",
            model_provider_name="openai",
            prompt_id="simple_prompt_builder",
            structured_output_mode="json_schema",
        ),
        config=AdapterConfig(prompt_builder=custom_builder),
    )

    # Mock model provider
    provider = MagicMock()
    provider.reasoning_capable = False
    provider.tuned_chat_strategy = None
    adapter.model_provider = MagicMock(return_value=provider)

    # Test that the custom prompt builder is used
    formatter = adapter.build_chat_formatter("test input")
    assert formatter.system_message == "This is a custom prompt from injected builder"
    assert adapter.prompt_builder == custom_builder
🤖 Prompt for AI Agents
In @libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py around lines
629 - 659, The test calls build_chat_formatter with a keyword arg named input
which may not match the actual method signature; change the call to use a
positional argument instead (e.g., call build_chat_formatter("test input") ) so
it matches the MockAdapter.build_chat_formatter signature and avoids a
TypeError, and keep the rest of the assertions (formatter.system_message and
adapter.prompt_builder) unchanged.

Comment on lines 117 to 136
def built_in_provider_from_model_id(
    model_id: str, provider_name: ModelProviderName | str
) -> KilnModelProvider | None:
    """
    Look up a built-in provider entry by the provider and its provider-specific model ID.
    """
    try:
        provider_enum = (
            provider_name
            if isinstance(provider_name, ModelProviderName)
            else ModelProviderName(provider_name)
        )
    except ValueError:
        return None

    for model in built_in_models:
        for provider in model.providers:
            if provider.name == provider_enum and provider.model_id == model_id:
                return provider
    return None

⚠️ Potential issue | 🟡 Minor

Guard against model.providers is None in built_in_provider_from_model_id.

Even if current built_in_models always populates providers, this helper is generic and will throw if a model entry lacks providers.

Proposed diff
 def built_in_provider_from_model_id(
     model_id: str, provider_name: ModelProviderName | str
 ) -> KilnModelProvider | None:
@@
-    for model in built_in_models:
-        for provider in model.providers:
+    for model in built_in_models:
+        if not model.providers:
+            continue
+        for provider in model.providers:
             if provider.name == provider_enum and provider.model_id == model_id:
                 return provider
     return None
🤖 Prompt for AI Agents
In @libs/core/kiln_ai/adapters/provider_tools.py around lines 117 - 136, The
helper built_in_provider_from_model_id can raise when a model's providers is
None; update the function to guard against missing/None providers on each model
(e.g., skip models with falsy or absent model.providers) before iterating so the
inner loop only runs when model.providers is a populated iterable; keep the rest
of the logic that compares provider.name to provider_enum and provider.model_id
to model_id unchanged (references: built_in_provider_from_model_id,
built_in_models, model.providers, provider.name, provider.model_id,
ModelProviderName).

Comment on lines 35 to 40
    async def run(self, context=None, **kwargs) -> ToolCallResult:
        """Add two numbers and return the result."""
        kwargs = AddParams(**kwargs)  # type: ignore[missing-typed-dict-key]
        a = kwargs["a"]
        b = kwargs["b"]
        return ToolCallResult(output=str(a + b))

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's check the file exists and read the relevant sections
cat -n libs/core/kiln_ai/tools/built_in_tools/math_tools.py | head -160

Repository: Kiln-AI/Kiln

Length of output: 168


Replace the TypedDict construction with cast() instead of suppressing the type error.

The current code calls AddParams(**kwargs), which at runtime only rebuilds kwargs as a plain dict and performs no key validation, so the re-wrapping buys nothing; the # type: ignore comment exists solely to silence the type checker's complaint about constructing the TypedDict from untyped **kwargs.

Replace it with params = cast(AddParams, kwargs) to declare the intended shape to the type checker without the ignore comment or the redundant copy. This change applies to all four tool classes (AddTool, SubtractTool, MultiplyTool, DivideTool).

Proposed diff
-from typing import TypedDict, Union
+from typing import TypedDict, Union, cast
@@
 class AddTool(KilnTool):
@@
     async def run(self, context=None, **kwargs) -> ToolCallResult:
         """Add two numbers and return the result."""
-        kwargs = AddParams(**kwargs)  # type: ignore[missing-typed-dict-key]
-        a = kwargs["a"]
-        b = kwargs["b"]
+        params = cast(AddParams, kwargs)
+        a = params["a"]
+        b = params["b"]
         return ToolCallResult(output=str(a + b))
@@
 class SubtractTool(KilnTool):
@@
     async def run(self, context=None, **kwargs) -> ToolCallResult:
         """Subtract b from a and return the result."""
-        kwargs = SubtractParams(**kwargs)  # type: ignore[missing-typed-dict-key]
-        a = kwargs["a"]
-        b = kwargs["b"]
+        params = cast(SubtractParams, kwargs)
+        a = params["a"]
+        b = params["b"]
         return ToolCallResult(output=str(a - b))
@@
 class MultiplyTool(KilnTool):
@@
     async def run(self, context=None, **kwargs) -> ToolCallResult:
         """Multiply two numbers and return the result."""
-        kwargs = MultiplyParams(**kwargs)  # type: ignore[missing-typed-dict-key]
-        a = kwargs["a"]
-        b = kwargs["b"]
+        params = cast(MultiplyParams, kwargs)
+        a = params["a"]
+        b = params["b"]
         return ToolCallResult(output=str(a * b))
@@
 class DivideTool(KilnTool):
@@
     async def run(self, context=None, **kwargs) -> ToolCallResult:
         """Divide a by b and return the result."""
-        kwargs = DivideParams(**kwargs)  # type: ignore[missing-typed-dict-key]
-        a = kwargs["a"]
-        b = kwargs["b"]
+        params = cast(DivideParams, kwargs)
+        a = params["a"]
+        b = params["b"]
         if b == 0:
             raise ZeroDivisionError("Cannot divide by zero")
         return ToolCallResult(output=str(a / b))
🤖 Prompt for AI Agents
In @libs/core/kiln_ai/tools/built_in_tools/math_tools.py around lines 35 - 40,
The runtime failure is caused by attempting to instantiate TypedDicts (e.g.,
AddParams) in the run methods; replace lines like kwargs = AddParams(**kwargs)
with params = cast(AddParams, kwargs) (import cast from typing) and then use
params["a"]/params["b"] (or params.get) in AddTool.run, SubtractTool.run,
MultiplyTool.run, and DivideTool.run to avoid instantiating TypedDicts at
runtime; update all four tool classes accordingly and ensure the typing.cast
import is added if missing.

Comment on lines 1 to 95
{
  "name": "project",
  "dependencies": {
    "tessl/pypi-pillow": {
      "version": "11.3.0"
    },
    "tessl/pypi-pyinstaller": {
      "version": "6.15.0"
    },
    "tessl/pypi-certifi": {
      "version": "2024.12.0"
    },
    "tessl/pypi-kiln-ai": {
      "version": "0.22.1"
    },
    "tessl/pypi-pyright": {
      "version": "1.1.0"
    },
    "tessl/pypi-pytest": {
      "version": "8.4.0"
    },
    "tessl/pypi-pytest-xdist": {
      "version": "3.8.0"
    },
    "tessl/pypi-python-dotenv": {
      "version": "1.1.0"
    },
    "tessl/pypi-watchfiles": {
      "version": "1.1.0"
    },
    "tessl/npm-floating-ui--dom": {
      "version": "1.7.0"
    },
    "tessl/npm-sveltejs--kit": {
      "version": "2.37.0"
    },
    "tessl/npm-sveltejs--vite-plugin-svelte": {
      "version": "6.1.0"
    },
    "tessl/npm-tailwindcss--typography": {
      "version": "0.5.0"
    },
    "tessl/npm-typescript-eslint--eslint-plugin": {
      "version": "8.42.0"
    },
    "tessl/npm-typescript-eslint--parser": {
      "version": "8.42.0"
    },
    "tessl/npm-daisyui": {
      "version": "4.12.0"
    },
    "tessl/npm-echarts": {
      "version": "6.0.0"
    },
    "tessl/npm-eslint-config-prettier": {
      "version": "10.1.0"
    },
    "tessl/npm-eslint-plugin-svelte": {
      "version": "3.12.0"
    },
    "tessl/npm-eslint": {
      "version": "9.34.0"
    },
    "tessl/npm-highlight.js": {
      "version": "11.11.0"
    },
    "tessl/npm-openapi-typescript": {
      "version": "7.9.0"
    },
    "tessl/npm-postcss": {
      "version": "8.5.0"
    },
    "tessl/npm-prettier": {
      "version": "3.6.0"
    },
    "tessl/npm-svelte-check": {
      "version": "3.8.0"
    },
    "tessl/npm-svelte": {
      "version": "4.2.0"
    },
    "tessl/npm-tailwindcss": {
      "version": "3.4.0"
    },
    "tessl/npm-typescript": {
      "version": "5.9.0"
    },
    "tessl/npm-vite": {
      "version": "7.1.0"
    },
    "tessl/npm-vitest": {
      "version": "4.0.0"
    }
  }
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify PyPI versions exist (maps tessl/pypi-<name> -> <name>)
python - <<'PY'
import json, urllib.request, sys
data=json.load(open("tessl.json"))
deps=data["dependencies"]
pypi=[k.split("tessl/pypi-",1)[1] for k in deps if k.startswith("tessl/pypi-")]
bad=[]
for name in pypi:
    ver=deps[f"tessl/pypi-{name}"]["version"]
    url=f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url) as r:
            j=json.load(r)
        if ver not in j.get("releases", {}):
            bad.append((name, ver))
    except Exception as e:
        bad.append((name, ver, f"(error: {e})"))
if bad:
    print("Missing or errored PyPI versions:")
    for item in bad: print(f"  - {item[0]}=={item[1]} {item[2] if len(item) > 2 else ''}")
    sys.exit(1)
print("PyPI versions: OK")
PY

# Verify npm versions exist (best-effort mapping for common scoped packages)
python - <<'PY'
import json, urllib.request, sys

data=json.load(open("tessl.json"))
deps=data["dependencies"]

mapping = {
  "tessl/npm-floating-ui--dom": "@floating-ui/dom",
  "tessl/npm-sveltejs--kit": "@sveltejs/kit",
  "tessl/npm-sveltejs--vite-plugin-svelte": "@sveltejs/vite-plugin-svelte",
  "tessl/npm-tailwindcss--typography": "@tailwindcss/typography",
  "tessl/npm-typescript-eslint--eslint-plugin": "@typescript-eslint/eslint-plugin",
  "tessl/npm-typescript-eslint--parser": "@typescript-eslint/parser",
}

def npm_name(key: str) -> str:
    if key in mapping: return mapping[key]
    return key.split("tessl/npm-",1)[1].replace("--","/")

npm=[k for k in deps if k.startswith("tessl/npm-")]
bad=[]
for key in npm:
    name=npm_name(key)
    ver=deps[key]["version"]
    url=f"https://registry.npmjs.org/{name}"
    req=urllib.request.Request(url, headers={"Accept":"application/vnd.npm.install-v1+json"})
    try:
        with urllib.request.urlopen(req) as r:
            j=json.load(r)
        if ver not in j.get("versions", {}):
            bad.append((name, ver, key))
    except Exception as e:
        bad.append((name, ver, key, f"(error: {e})"))
if bad:
    print("Missing or errored npm versions:")
    for item in bad: print(f"  - {item[0]}@{item[1]} from {item[2]} {item[3] if len(item) > 3 else ''}")
    sys.exit(1)
print("npm versions: OK")
PY

Repository: Kiln-AI/Kiln

Length of output: 1516


🏁 Script executed:

#!/bin/bash

# Check for related lockfiles and generation scripts
echo "=== Checking for related files ==="
fd -t f "(package-lock|requirements|pyproject|poetry)" . --max-depth 3 2>/dev/null | head -20

echo ""
echo "=== Checking tessl.json in git ==="
git log --oneline -- tessl.json 2>/dev/null | head -10

echo ""
echo "=== Checking for generation/update scripts ==="
rg -l "tessl\.json|tessl" . --type sh --type py --max-count 20 2>/dev/null | head -10

echo ""
echo "=== Checking tessl.json comments/context ==="
head -20 tessl.json

Repository: Kiln-AI/Kiln

Length of output: 828


🌐 Web query:

pillow==11.3.0 release date pypi

💡 Result:

Pillow 11.3.0 was published to PyPI on July 1, 2025 (with some wheel files uploaded July 3, 2025). [1]

Sources
[1] PyPI — Pillow release files and upload dates.


🌐 Web query:

pytest==8.4.0 release pypi

💡 Result:

pytest 8.4.0 was released to PyPI on June 2, 2025. [1][2]

Sources:
[1] pytest changelog (pytest 8.4.0 — 2025-06-02).
[2] pytest GitHub releases — tag 8.4.0 (2025-06-02).


🌐 Web query:

typescript==5.9.0 npm latest version

💡 Result:

No — TypeScript 5.9.0 is not the latest. The current stable release is v5.9.2 (TypeScript 5.9 series). [1][2]

Sources:

  • GitHub releases (TypeScript 5.9 — v5.9.2). [1]
  • npm package page (TypeScript versions listing). [2]

🏁 Script executed:

#!/bin/bash

# Look for README or docs mentioning tessl
echo "=== Files mentioning tessl ==="
rg -l "tessl" --type md --type txt 2>/dev/null | head -10

# Check all references to tessl
echo ""
echo "=== All mentions of 'tessl' in repo ==="
rg "tessl" --type md --type txt -C 2 2>/dev/null | head -50

Repository: Kiln-AI/Kiln

Length of output: 325


🌐 Web query:

pyright version 1.1.0 pypi release

💡 Result:

Short answer: There is no official Microsoft "pyright" 1.1.0 release on PyPI. Microsoft distributes Pyright via npm (current npm version shown on npmjs.com), while PyPI hosts third‑party wrappers/forks (some with 1.1.0 or other 1.1.x tags). See npm (official) and PyPI examples: [1][2].

Sources:

  • npm (official Pyright): pyright version and publish info. [1]
  • PyPI (example wrapper/fork listings and package pages showing PyPI packages named "pyright" / related projects). [2][3]

🌐 Web query:

watchfiles 1.1.0 pypi release

💡 Result:

watchfiles 1.1.0 was released on PyPI on June 15, 2025 — install with pip install watchfiles==1.1.0. [1][2]

Sources:
[1] PyPI release history for watchfiles (shows 1.1.0 — Jun 15, 2025).
[2] Libraries.io / PyPI package page for watchfiles (lists v1.1.0).


🌐 Web query:

kiln-ai 0.22.1 pypi package

💡 Result:

kiln-ai 0.22.1 is available on PyPI (released Oct 7, 2025). Install with:

pip install kiln-ai==0.22.1

Source: PyPI release history. [1]

References
[1] PyPI — kiln-ai release history (shows 0.22.1 released Oct 7, 2025).


Verify and correct the pyright package specification; most versions exist but one is problematic.

The pinned versions for pillow, pytest, watchfiles, and kiln-ai are valid and exist in their registries. However, tessl/pypi-pyright at version 1.1.0 references a non-official PyPI package — Microsoft distributes Pyright via npm, not PyPI. The PyPI "pyright" package appears to be a third-party wrapper or fork. Verify that 1.1.0 is the correct package and version, or update to the official npm package if applicable.

Since tessl.json is managed by the external "tessl" tool (as indicated by the "tessl-managed" marker in AGENTS.md), ensure the tool is configured correctly to generate valid pinned versions and document the regeneration process to prevent future drift.

🤖 Prompt for AI Agents
In @tessl.json around lines 1 - 95, The pyright entry is incorrect:
"tessl/pypi-pyright"@1.1.0 points to a non-official PyPI package; replace it
with the proper distribution or registry and update tessl config. Specifically,
remove or correct the "tessl/pypi-pyright" dependency in tessl.json and either
(a) switch to the official npm package (e.g., add "tessl/npm-pyright" or the
correct npm identifier) or (b) verify that the intended PyPI package name and
version exist and update the version to a valid one; then regenerate pins via
the tessl tool and document the regeneration step so tessl-managed files remain
consistent.

@scosman scosman merged commit 811194a into remote_config Jan 12, 2026
16 checks passed