Update ML Model List #921
Conversation
…der-gets-wiped-after-app-restart
Eval with new fine-tune data model
…n-in-new-task. People are confusing description and prompt. Don't need it.
…_create Remove Task Description field from create flow
Fewer builds: we don't need to be running builds on every push.
Add tessl to Kiln repo
Add hooks-mcp to .cursor/mcp.json
Fix: don't save test files into root. Use tmp
Mike Wants a Chart
* add gemini flash + nemotron 3
* adding gemini 3 flash and pro to vertex, and adding nemotron 3 to ollama
* adding alias for nemotron 3
Walkthrough

Includes finetune-backed run-config handling in eval APIs, adds radar-chart and no-data UI for eval comparisons, implements nested subtopic generation UI and exported modal actions, updates the model catalog and adapter prompt-builder injection, and replaces Pyright with Ty in CI/tooling, plus MCP/tessl config additions.

Changes
Sequence Diagram(s): (omitted)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
Possibly related PRs
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
✨ Finishing touches
📊 Coverage Report

Overall Coverage: 91%
Diff: origin/remote_config...HEAD
Summary
Line-by-line diff coverage: app/desktop/studio_server/eval_api.py, lines 497-505
497 # Load the list of run configs to use. Two options:
498 run_configs: list[TaskRunConfig] = []
499 if all_run_configs:
500 # special case, we cannot directly load task.run_configs(), we need to also get all finetune run configs which live inside the finetune model
! 501 run_configs = get_all_run_configs(project_id, task_id)
502 else:
503 if len(run_config_ids) == 0:
504 raise HTTPException(
505 status_code=400,
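The branch being covered above (the uncovered call on line 501 and the 400 fallback) can be sketched in isolation. This is a simplified stand-in, not the actual Kiln code: `select_run_configs` and `get_all` are hypothetical names mirroring the endpoint's logic and its `get_all_run_configs` helper.

```python
def select_run_configs(all_run_configs: bool, run_config_ids: list, get_all):
    """Pick run configs: either all of them (including finetune-backed
    configs, which live inside the finetune model and need a dedicated
    helper to collect) or an explicit, non-empty list of IDs."""
    if all_run_configs:
        # mirrors get_all_run_configs(project_id, task_id) in eval_api.py
        return get_all()
    if len(run_config_ids) == 0:
        # eval_api raises HTTPException(status_code=400) here instead
        raise ValueError("at least one run config ID is required")
    return run_config_ids

print(select_run_configs(True, [], lambda: ["saved_rc", "finetune_rc"]))  # → ['saved_rc', 'finetune_rc']
print(select_run_configs(False, ["saved_rc"], lambda: []))  # → ['saved_rc']
```

The `all_run_configs` flag takes precedence: the explicit ID list is only consulted (and validated as non-empty) when it is false.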
Actionable comments posted: 1
🧹 Nitpick comments (8)
app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte (1)
48-51: Consider conditionally excluding `description` when creating.

Both `warn_before_unload` (line 50) and `has_edits()` (line 171) check `task.description` even though the field is now hidden during creation. While this doesn't break functionality (description will be empty), you could improve consistency by conditionally excluding the description check when `creating` is true.

Optional refinement
For `warn_before_unload` (around line 50):

```diff
 $: warn_before_unload =
   !saved &&
-  ([task.name, task.description, task.instruction].some((value) => !!value) ||
+  ([task.name, ...(editing ? [task.description] : []), task.instruction].some((value) => !!value) ||
     task.requirements.some((req) => !!req.name || !!req.instruction))
```

For `has_edits()` (around line 171):

```diff
 return (
   !!task.name ||
-  !!task.description ||
+  (editing && !!task.description) ||
   !!task.instruction ||
   !!task.thinking_instruction ||
```

Also applies to: 165-177
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (1)
735-790: Dialog UI looks good with one optional enhancement.

The nested topics dialog is well-structured with proper loading and error states. The conditional rendering of `RunConfigComponent` based on task availability is defensive.
Optional: Add disabled state to button during operation
While the loading spinner at line 740 prevents the button from being visible during operation, adding a disabled state provides extra protection against race conditions:
```diff
 <button
   class="btn mt-2 btn-primary"
+  disabled={adding_nested_topics || !guidance_data.task}
   on:click={add_nested_topics_to_all_leaf_topics}
 >
   Generate Subtopics
 </button>
```

This prevents rapid double-clicks before the loading state renders.
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte (3)
14-31: Type and prop definitions are well-structured.

The `ComparisonFeature` type is duplicated from the parent page. Consider extracting this to a shared types file if it's used in multiple places, but for now this is acceptable.
36-71: Data key filtering and helper functions are correct.

Excluding the cost section (`kiln_cost_section`) from the radar chart is appropriate since cost metrics have vastly different scales than evaluation scores.

Note: `getRunConfigDisplayName` and `buildLegendFormatter` are duplicated from `compare_chart.svelte`. Consider extracting these to a shared utility if the pattern continues to expand.
217-226: Reactive dependency tracking works but is slightly obscure.

The pattern `(model_info || model_info === null)` always evaluates to true; it's a technique to ensure Svelte tracks `model_info` as a dependency. A simpler approach would be to reference the values directly in the block body or use an explicit dependency array pattern.

Alternative: Clearer dependency tracking

```diff
-  $: if (
-    chartInstance &&
-    comparisonFeatures &&
-    selectedRunConfigIds &&
-    (model_info || model_info === null) &&
-    (prompts || prompts === null)
-  ) {
+  $: if (chartInstance && comparisonFeatures && selectedRunConfigIds) {
+    // Track async-loaded dependencies
+    void model_info
+    void prompts
     updateChart()
   }
```

libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (1)
5-10: Consider input validation for ID components.

The function correctly builds the finetune run config ID, but doesn't validate that `project_id`, `task_id`, or `finetune_id` don't contain the `::` separator. While this is consistent with similar functions in the codebase (e.g., `parse_custom_model_id`), it could lead to ambiguous IDs if the inputs contain `::`.

🔎 Optional validation to prevent separator in ID components
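To see why an unvalidated `::` matters, here is a standalone sketch (hypothetical `build_id`/`parse_id` helpers mirroring the documented ID format, not the library code) showing how a component containing the separator makes the ID unparseable:

```python
SEP = "::"

def build_id(project_id: str, task_id: str, finetune_id: str) -> str:
    # Same shape as finetune_run_config_id:
    # finetune_run_config::project_id::task_id::finetune_id
    return SEP.join(["finetune_run_config", project_id, task_id, finetune_id])

def parse_id(id_str: str) -> tuple:
    # Expects exactly four segments; a stray '::' in any component breaks this.
    prefix, project_id, task_id, finetune_id = id_str.split(SEP)
    return project_id, task_id, finetune_id

print(parse_id(build_id("p1", "t1", "ft1")))  # → ('p1', 't1', 'ft1')

# A project_id containing '::' yields five segments, so unpacking fails:
try:
    parse_id(build_id("p::oops", "t1", "ft1"))
except ValueError as err:
    print("ambiguous ID:", err)
```

Rejecting `::` at build time, as the suggested diff below does, moves this failure from the parse side (where the bad ID may already be persisted) to the construction side.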
```diff
 def finetune_run_config_id(project_id: str, task_id: str, finetune_id: str) -> str:
     """
     Build in-memory ID for run-config inside a finetune.
     Format: finetune_run_config::project_id::task_id::finetune_id
     project_id::task_id::finetune_id is Finetune.model_id()
     """
+    for component, name in [
+        (project_id, "project_id"),
+        (task_id, "task_id"),
+        (finetune_id, "finetune_id"),
+    ]:
+        if "::" in component:
+            raise ValueError(f"{name} cannot contain '::' separator: {component}")
     return f"finetune_run_config::{project_id}::{task_id}::{finetune_id}"
```

libs/core/kiln_ai/adapters/provider_tools.py (1)
117-137: LGTM! Correct lookup implementation.

The function correctly looks up a built-in provider by model ID and provider name, with proper enum handling and graceful `None` returns for not-found cases. The linear search is acceptable for the current scale of built-in models.
🔎 Optional: Performance optimization for frequent lookups
If this function is called frequently (e.g., in hot paths or loops), consider adding a module-level cache indexed by (provider_name, model_id):
```python
# At module level
_provider_by_id_cache: dict[tuple[ModelProviderName, str], KilnModelProvider] = {}


def _build_provider_cache():
    """Build lookup cache on first use."""
    if _provider_by_id_cache:
        return
    for model in built_in_models:
        for provider in model.providers:
            if provider.model_id:
                _provider_by_id_cache[(provider.name, provider.model_id)] = provider


def built_in_provider_from_model_id(
    model_id: str, provider_name: ModelProviderName | str
) -> KilnModelProvider | None:
    """Look up a built-in provider entry by the provider and its provider-specific model ID."""
    try:
        provider_enum = (
            provider_name
            if isinstance(provider_name, ModelProviderName)
            else ModelProviderName(provider_name)
        )
    except ValueError:
        return None
    _build_provider_cache()
    return _provider_by_id_cache.get((provider_enum, model_id))
```

app/desktop/studio_server/eval_api.py (1)
99-114: Consider adding an explicit 404 for finetunes without run_config.

When a finetune is found but has no `run_config` set (the line 102 check fails), the code falls through to the generic 404 at lines 111-113. While functional, this could be confusing for debugging since the finetune exists but just lacks a run_config.

Suggested improvement for clarity

```diff
 # special case for finetune run configs, it's inside the finetune model
 if run_config_id.startswith("finetune_run_config::"):
     finetune = finetune_from_finetune_run_config_id(run_config_id)
     if finetune.run_config is not None:
         return TaskRunConfig(
             id=finetune_run_config_id(project_id, task_id, str(finetune.id)),
             name=finetune.name,
             description=finetune.description,
             run_config_properties=finetune.run_config,
             parent=task,  # special case, we need to reference the task model
         )
+    else:
+        raise HTTPException(
+            status_code=404,
+            detail=f"Finetune {finetune.id} exists but has no run_config set.",
+        )
 raise HTTPException(
     status_code=404,
     detail=f"Task run config not found. ID: {run_config_id}",
 )
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
app/web_ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (30)
.cursor/mcp.json (1 hunks)
.cursor/rules/.gitignore (1 hunks)
.github/workflows/build_desktop.yml (1 hunks)
.tessl/.gitignore (1 hunks)
AGENTS.md (1 hunks)
app/desktop/studio_server/eval_api.py (6 hunks)
app/desktop/studio_server/test_eval_api.py (9 hunks)
app/web_ui/src/lib/api_schema.d.ts (0 hunks)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte (5 hunks)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte (1 hunks)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte (9 hunks)
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte (1 hunks)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (6 hunks)
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte (1 hunks)
app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (1 hunks)
app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte (1 hunks)
libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (1 hunks)
libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py (1 hunks)
libs/core/kiln_ai/adapters/ml_model_list.py (3 hunks)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (3 hunks)
libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py (2 hunks)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (2 hunks)
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py (1 hunks)
libs/core/kiln_ai/adapters/provider_tools.py (3 hunks)
libs/core/kiln_ai/adapters/test_prompt_builders.py (2 hunks)
libs/core/kiln_ai/adapters/test_provider_tools.py (11 hunks)
libs/core/kiln_ai/cli/commands/test_package_project.py (2 hunks)
libs/core/kiln_ai/utils/config.py (2 hunks)
libs/core/kiln_ai/utils/test_config.py (1 hunks)
tessl.json (1 hunks)
💤 Files with no reviewable changes (1)
- app/web_ui/src/lib/api_schema.d.ts
🧰 Additional context used
📓 Path-based instructions (9)
app/web_ui/**/*.svelte
📄 CodeRabbit inference engine (.cursor/rules/project.mdc)
app/web_ui/**/*.svelte: Use Svelte v4 (not v5) for the web frontend UI
Use DaisyUI for UI components in the web frontend
Use Svelte v4, not v5
Files:
app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
app/web_ui/**/*.{svelte,ts}
📄 CodeRabbit inference engine (.cursor/rules/project.mdc)
Use Tailwind CSS for styling in the web frontend
Files:
app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
app/web_ui/**/*.{svelte,ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use Tailwind CSS and DaisyUI for styling the frontend
Files:
app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte
libs/core/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project.mdc)
Use Python 3.10+ for the core library (libs/core) and Python 3.13 for the desktop app
Use Python 3.10+ for the core library (libs/core)
Files:
libs/core/kiln_ai/adapters/test_prompt_builders.py
libs/core/kiln_ai/utils/test_config.py
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
libs/core/kiln_ai/adapters/test_provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py
libs/core/kiln_ai/adapters/provider_tools.py
libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
libs/core/kiln_ai/utils/config.py
libs/core/kiln_ai/cli/commands/test_package_project.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
libs/core/kiln_ai/adapters/ml_model_list.py
**/*test*.py
📄 CodeRabbit inference engine (.cursor/rules/project.mdc)
Use pytest for testing Python code
Files:
libs/core/kiln_ai/adapters/test_prompt_builders.py
libs/core/kiln_ai/utils/test_config.py
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
libs/core/kiln_ai/adapters/test_provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
app/desktop/studio_server/test_eval_api.py
libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
libs/core/kiln_ai/cli/commands/test_package_project.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project.mdc)
**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation
**/*.py: Use Pydantic v2, not v1
Use asyncio for asynchronous operations in Python
Files:
libs/core/kiln_ai/adapters/test_prompt_builders.py
libs/core/kiln_ai/utils/test_config.py
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
libs/core/kiln_ai/adapters/test_provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
app/desktop/studio_server/test_eval_api.py
libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py
libs/core/kiln_ai/adapters/provider_tools.py
libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
libs/core/kiln_ai/utils/config.py
app/desktop/studio_server/eval_api.py
libs/core/kiln_ai/cli/commands/test_package_project.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
libs/core/kiln_ai/adapters/ml_model_list.py
**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use pytest for testing in Python
Files:
libs/core/kiln_ai/adapters/test_prompt_builders.py
libs/core/kiln_ai/utils/test_config.py
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
libs/core/kiln_ai/adapters/test_provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
app/desktop/studio_server/test_eval_api.py
libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py
libs/core/kiln_ai/cli/commands/test_package_project.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
app/desktop/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use Python 3.13 for the desktop application (app/desktop)
Files:
app/desktop/studio_server/test_eval_api.py
app/desktop/studio_server/eval_api.py
{libs/server,app/desktop/studio_server}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI for REST server implementation
Files:
app/desktop/studio_server/test_eval_api.py
app/desktop/studio_server/eval_api.py
🧠 Learnings (16)
📓 Common learnings
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
📚 Learning: 2025-12-15T22:02:37.369Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-15T22:02:37.369Z
Learning: Follow the instructions in .tessl/RULES.md
Applied to files:
.cursor/rules/.gitignore
AGENTS.md
.tessl/.gitignore
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
Applied to files:
libs/core/kiln_ai/adapters/test_prompt_builders.py
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
libs/core/kiln_ai/adapters/test_provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
libs/core/kiln_ai/adapters/provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-12-06T07:53:22.155Z
Learnt from: chiang-daniel
Repo: Kiln-AI/Kiln PR: 781
File: app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte:200-222
Timestamp: 2025-12-06T07:53:22.155Z
Learning: In app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte, the process_model_change function intentionally overrides selected_run_config_id when a model has a model_specific_run_config property. This is by design to ensure fine-tuned models automatically use their associated run configuration (the configuration they were trained with), even if triggered by selecting a saved run config that uses that model.
Applied to files:
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
📚 Learning: 2025-11-24T19:13:16.966Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: .cursor/rules/project.mdc:0-0
Timestamp: 2025-11-24T19:13:16.966Z
Learning: Applies to app/web_ui/**/*.svelte : Use DaisyUI for UI components in the web frontend
Applied to files:
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
📚 Learning: 2025-10-22T04:31:20.500Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 739
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte:170-177
Timestamp: 2025-10-22T04:31:20.500Z
Learning: In Svelte components for the Kiln project, leonardmq prefers tracking previous query parameter values and comparing them in reactive blocks (rather than just checking for existence) to avoid duplicate execution on initial load while still reacting to subsequent navigation changes. This pattern supports both current behavior and future feature additions like mode switching.
Applied to files:
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
📚 Learning: 2025-09-25T07:20:11.459Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 650
File: app/web_ui/src/routes/(app)/docs/library/[project_id]/[document_id]/+page.svelte:19-19
Timestamp: 2025-09-25T07:20:11.459Z
Learning: When analyzing relative import paths in SvelteKit route structures, be more careful to verify the actual directory structure exists before suggesting corrections. The import path ../../../../generate/[project_id]/[task_id]/table_button.svelte from app/web_ui/src/routes/(app)/docs/library/[project_id]/[document_id]/+page.svelte correctly resolves to the existing file at app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/table_button.svelte.
Applied to files:
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
📚 Learning: 2025-10-21T00:23:00.770Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 737
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generate_samples_modal.svelte:287-301
Timestamp: 2025-10-21T00:23:00.770Z
Learning: In Kiln AI, for topic/inputs generation flows (such as in generate_samples_modal.svelte and generated_data_node.svelte), the task input_json_schema is always set. Therefore, hardcoding requires_structured_output={true} in RunConfigComponent for these flows is correct and does not need to be conditional like it is for output generation.
Applied to files:
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte
📚 Learning: 2025-08-08T16:19:20.074Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1736-1741
Timestamp: 2025-08-08T16:19:20.074Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwq_32b with ModelProviderName.siliconflow_cn (model_id "Qwen/QwQ-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via tests/responses that SiliconFlow returns outputs that parse correctly without an R1 parser. Parser usage remains provider-specific and should only be added when validated.
Applied to files:
libs/core/kiln_ai/adapters/test_provider_tools.py
libs/core/kiln_ai/adapters/provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.
Applied to files:
libs/core/kiln_ai/adapters/test_provider_tools.py
libs/core/kiln_ai/adapters/provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:14:54.346Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1979-1983
Timestamp: 2025-08-08T16:14:54.346Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py, for ModelName.deepseek_r1_distill_qwen_32b on provider siliconflow_cn (model_id "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via testing that SiliconFlow returns final outputs that parse without an R1 parser, and reasoning may be omitted under structured output; rely on reasoning_optional_for_structured_output=True.
Applied to files:
libs/core/kiln_ai/adapters/test_provider_tools.py
libs/core/kiln_ai/adapters/provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-10-24T05:01:15.465Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 736
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/qna/qna_ui_store.ts:552-585
Timestamp: 2025-10-24T05:01:15.465Z
Learning: In the Q&A generation workflow (app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/qna/qna_ui_store.ts), chunking is intentionally only ensured for target="all" generation. When users generate Q&A for a specific document or part, they do not alter the chunking strategy; the UI only surfaces document/part-level generation actions after the user has already run the global chunking flow (or chosen to use full documents without chunking). This is by design to separate the chunking decision from per-item generation.
Applied to files:
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte
📚 Learning: 2025-08-08T16:15:20.796Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:2434-2439
Timestamp: 2025-08-08T16:15:20.796Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwen_3_8b_no_thinking with ModelProviderName.siliconflow_cn (model_id "Qwen/Qwen3-8B"), the qwen3_style_no_think formatter is not required: SiliconFlow’s non‑thinking Qwen3‑8B returns clean output without <answer> wrappers. Formatter/parser usage is provider-specific and should be added only when verified by tests/responses.
Applied to files:
libs/core/kiln_ai/adapters/provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-22T11:15:53.584Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 413
File: libs/core/kiln_ai/adapters/embedding/embedding_registry.py:14-20
Timestamp: 2025-08-22T11:15:53.584Z
Learning: In the Kiln AI codebase, model_provider_name fields in datamodels like EmbeddingConfig are typed as ModelProviderName enum (which inherits from str, Enum), not as plain strings. Therefore, accessing .value on these fields is valid and returns the string representation of the enum value.
Applied to files:
libs/core/kiln_ai/adapters/provider_tools.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-10-28T05:25:09.283Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 757
File: libs/core/kiln_ai/adapters/reranker_list.py:39-42
Timestamp: 2025-10-28T05:25:09.283Z
Learning: In the Kiln codebase (Python), model family and model name fields in reranker and similar model definition classes (like KilnRerankerModel in libs/core/kiln_ai/adapters/reranker_list.py) should use plain str types rather than Enum types. This is because users receive over-the-air (OTA) updates for new models, and using Enums would break when clients with older builds encounter new model families their Enum definitions don't include. Strings provide forward compatibility for the OTA update mechanism.
Applied to files:
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-09-10T08:06:53.717Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 585
File: libs/core/kiln_ai/adapters/ml_embedding_model_list.py:32-33
Timestamp: 2025-09-10T08:06:53.717Z
Learning: In the Kiln AI codebase, friendly_name fields in embedding model configurations should maintain the original model naming conventions from providers (e.g., "gemini-embedding-001", "text-embedding-3-small") for human readability, rather than following snake_case conventions like the enum values.
Applied to files:
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
libs/core/kiln_ai/adapters/ml_model_list.py
🧬 Code graph analysis (9)
libs/core/kiln_ai/adapters/test_provider_tools.py (2)
libs/core/kiln_ai/adapters/provider_tools.py (2)
built_in_provider_from_model_id(117-136)
parser_from_finetune(259-276)
libs/core/kiln_ai/adapters/ml_model_list.py (1)
ModelParserID(216-222)
libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py (3)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (2)
AdapterConfig(36-52)
model_provider(87-102)
libs/core/kiln_ai/adapters/run_output.py (1)
RunOutput(10-14)libs/core/kiln_ai/datamodel/run_config.py (1)
RunConfigProperties(24-62)
app/desktop/studio_server/test_eval_api.py (2)
app/desktop/studio_server/eval_api.py (2)
get_all_run_configs(66-88)
task_run_config_from_id(91-114)
libs/core/kiln_ai/datamodel/finetune.py (1)
Finetune(23-127)
libs/core/kiln_ai/adapters/provider_tools.py (2)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
ModelProviderName(84-106)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
KilnModelProvider(233-305)
ModelParserID(216-222)
libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py (1)
libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (2)
finetune_run_config_id(5-10)
finetune_from_finetune_run_config_id(13-23)
libs/core/kiln_ai/cli/commands/test_package_project.py (2)
libs/core/kiln_ai/cli/commands/package_project.py (1)
package_project(587-729)
libs/core/kiln_ai/datamodel/project.py (1)
tasks(42-43)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (1)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
ModelProviderName(84-106)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (1)
config(47-58)
libs/core/kiln_ai/adapters/ml_model_list.py (1)
libs/core/kiln_ai/datamodel/datamodel_enums.py (3)
ModelProviderName(84-106)
StructuredOutputMode(27-49)
KilnMimeType(109-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Generate Coverage Report
🔇 Additional comments (58)
libs/core/kiln_ai/cli/commands/test_package_project.py (1)
653-709: LGTM! Excellent test isolation improvement.

The changes properly utilize pytest's `tmp_path` fixture to ensure each test writes to its own isolated temporary directory. This eliminates potential conflicts from hard-coded paths and follows pytest best practices. The updates are consistent with existing patterns in other test classes within this file.

.cursor/mcp.json (1)
8-12: No action needed. The `HooksMCP` configuration is valid without an explicit `type` field. Cursor MCP servers using STDIO transport require only "command" and "args" fields, and stdio is the default transport for command-based servers. The explicit `"type": "stdio"` on the `tessl` server is optional; both configurations are equivalent and correct.

Likely an incorrect or invalid review comment.
.github/workflows/build_desktop.yml (2)
4-6: LGTM! Good additions for workflow flexibility. Adding `workflow_dispatch` for manual triggering and the `release` trigger for automated builds on release creation are both best practices for desktop application CI/CD workflows.
7-9: Verify branch restriction aligns with remote_config workflow needs. The push trigger is now restricted to the `main` branch only. After merging this into `remote_config`, the workflow file on that branch will only trigger builds when pushing to `main`, not when pushing to `remote_config`. If `remote_config` is a long-lived branch where desktop builds are needed independently, consider adding it to the branches list. If builds should only run from `main`, then this restriction is appropriate.

app/web_ui/src/routes/(fullscreen)/setup/(setup)/create_task/edit_task.svelte (1)
284-294: LGTM! Correctly hides Task Description during creation. The conditional rendering logic properly restricts the Task Description field to edit mode only, aligning with the stated objective to simplify the creation UI.
app/web_ui/src/routes/(fullscreen)/setup/(setup)/connect_providers/connect_providers.svelte (1)
133-133: No issues identified. Google and Partner models and generative AI features on Vertex AI are exposed as specific regional endpoints and a global endpoint, and location='global' is valid for the Google Gen AI SDK and Vertex AI SDK for Python. Additionally, all Gemini 3 and Gemini 2.5 preview models released after June 2025 are only available in the global location, which confirms that global location provides access to the newest Gemini models. The instructional text accurately recommends using "global" as a valid Google Cloud location for Vertex AI Gemini models.

libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py (1)
5-6: LGTM! Clean import consolidation. The consolidation of imports improves readability without any functional changes.
libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py (2)
6-11: LGTM! Import additions support new test. The additional imports for `AdapterConfig` and `BasePromptBuilder` are necessary for testing the custom prompt builder injection feature.
629-659: LGTM! Well-structured test for custom prompt builder injection. The test effectively validates that:

- Custom prompt builders can be injected via `AdapterConfig`
- The adapter uses the injected builder instead of the default
- The custom builder's output appears in the formatter

The test follows pytest conventions and properly mocks dependencies.
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (5)
15-15: LGTM! Import addition supports new functionality. The `BasePromptBuilder` import is necessary for the type annotation and prompt builder injection feature.
47-52: LGTM! Well-documented configuration option. The `prompt_builder` field addition to `AdapterConfig` is well-designed:

- Properly typed as optional
- Clear documentation explains the injection mechanism and fallback behavior
- Maintains backward compatibility
61-65: LGTM! Clear documentation of prompt building behavior. The docstring update effectively documents the prompt building mechanism and how users can customize it via `AdapterConfig`.
71-71: LGTM! Backward-compatible signature update. The optional `config` parameter maintains backward compatibility while enabling the new customization feature.
76-81: LGTM! Clean implementation of prompt builder injection. The initialization logic correctly implements the precedence:

- Use custom prompt builder from config if provided
- Otherwise, load from `run_config.prompt_id`

This maintains backward compatibility while enabling the new customization point.
libs/core/kiln_ai/adapters/test_prompt_builders.py (2)
38-38: LGTM: Import addition supports type annotation update. The `InputType` import is necessary for the updated method signature below.
80-80: The BaseAdapter interface correctly uses InputType, and the MockAdapter._run method signature at line 80 properly matches this pattern. InputType is a union type encompassing both structured types (Dict/List) and strings, making the signature change valid and type-safe.

app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte (1)
821-856: LGTM! Clean UI enhancement for nested topics workflow. The conditional rendering based on topic presence is well-structured, and the button styling follows DaisyUI patterns appropriately. The new "Add Subtopics" button integrates smoothly into the existing step flow.
app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/generated_data_node.svelte (5)
17-17: LGTM! Solid defensive pattern. Separating the RunConfigComponent refs prevents null reference issues when components are conditionally mounted/unmounted. The explanatory comment is helpful.
Also applies to: 26-28
160-160: LGTM! Ref usage is consistent. The updated references correctly use `run_config_component_modal` with proper null handling via optional chaining.

Also applies to: 684-684
257-279: LGTM! Clean recursive implementation. The `get_all_leaf_topics` function correctly identifies and collects all leaf nodes (topics without subtopics) using proper recursion with path tracking.
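A Python analog of that recursion, for readers who want the shape of the logic without opening the Svelte component. The dict field names (`name`, `subtopics`) are assumptions for this sketch; the real component's topic type may differ.

```python
def get_all_leaf_topics(topics, path=()):
    """Collect the path to every leaf topic (a topic with no subtopics)."""
    leaves = []
    for topic in topics:
        subtopics = topic.get("subtopics") or []
        if subtopics:
            # Recurse, extending the path with the current topic's name.
            leaves.extend(get_all_leaf_topics(subtopics, path + (topic["name"],)))
        else:
            # No subtopics: this is a leaf, record its full path.
            leaves.append(path + (topic["name"],))
    return leaves
```

Tracking the path at each level is what lets the caller later generate subtopics under the correct parent chain.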
281-289: LGTM! State management is clean. The state variables and modal open function are well-structured. Resetting the error state when opening the modal provides good UX.
291-400: LGTM! Robust nested topics generation logic. The function handles the nested topics workflow well:
- Comprehensive validation upfront
- Sequential processing with partial saves after each leaf topic
- Proper error handling with KilnError
- Analytics tracking
Sequential processing (vs. parallel) is a reasonable design choice here since it provides better error tracking and saves progress incrementally.
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/chart_no_data.svelte (1)
1-21: Clean no-data state component. The implementation is straightforward and reusable. Tailwind CSS styling is consistent with project conventions.
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte (3)
566-579: Well-structured reactive state helpers. The reactive declarations correctly derive loading and data availability states. The type guard `(m): m is string` ensures proper type narrowing for `validSelectedModels`.
921-934: Radar chart integration looks correct. The `CompareRadarChart` is properly wired with the selected run configurations via `validSelectedModels`, while the existing `CompareChart` continues to show all run configurations. This distinction makes sense for the different visualization purposes.
602-609: Loading state logic is correct. The compound condition properly guards the loading spinner: it only appears when models are selected, all are still loading, and no data has arrived yet. This prevents unnecessary spinners and flickering.
app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_chart.svelte (5)
73-93: Legend formatting implementation looks good. The multi-line legend format with rich text styling provides useful context. Note that if multiple run configs share the same display name, the legend entries may collide, but this is a pre-existing constraint of the charting approach rather than a new issue.
154-157: Good interactivity enhancement. The `emphasis.scale: 2` provides clear visual feedback when hovering over data points, complementing the legend hover interactions.
205-227: Legend and layout updates are well-coordinated. The legend positioning (`left: "70%"`) with grid adjustment (`right: "34%"`) provides adequate space for multi-line legend entries while keeping the chart readable.
262-301: Legend hover interactions are a nice UX addition. The highlight/downplay actions on legend hover help users correlate legend entries with chart data points. The `hasDataPoints` reactive check correctly determines data availability.
367-370: No-data state handling is improved. The condition `axisOptions.length <= 1 || !hasDataPoints` correctly covers both insufficient metrics and missing data scenarios, delegating to the reusable `ChartNoData` component.

app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/compare_radar_chart.svelte (3)
73-131: Radar chart data generation is well-implemented. The 10% padding on max values (`max * 1.1`) ensures data points don't touch the radar boundary. The fallback to `max = 1` when all values are 0 prevents potential edge cases. Null values are correctly replaced with 0 for radar visualization.
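The axis-max rule described above is small enough to state directly. This is a sketch of the rule as the review describes it, not the component's actual TypeScript; the function name is invented for illustration.

```python
def radar_axis_max(values):
    """Axis maximum for one radar indicator: observed max padded by 10%,
    with a fallback of 1 when every value is 0, None, or missing."""
    cleaned = [v if v is not None else 0 for v in values]
    max_value = max(cleaned) if cleaned else 0
    if max_value == 0:
        return 1  # avoid a degenerate 0..0 axis
    return max_value * 1.1  # 10% headroom so points don't touch the edge
```

The 10% headroom is purely cosmetic; the fallback to 1 is what prevents a divide-by-zero-style degenerate axis when a metric has no scores yet.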
142-215: Chart update logic is comprehensive. The radar chart configuration is well-thought-out with proper positioning to accommodate the legend. Clearing the chart when there's no data prevents stale visualizations.
249-263: Template rendering logic is correct. The `dataKeys.length >= 3` guard appropriately hides the radar chart when there aren't enough metrics for meaningful visualization. The `ChartNoData` fallback provides consistent UX.

libs/core/kiln_ai/adapters/ml_model_list.py (4)
130-130: LGTM: New model enum entries added correctly. The new `gemini_3_flash` and `nemotron_3_nano` enum members follow the existing naming conventions and are properly integrated with their corresponding model configurations below.

Also applies to: 132-132
1496-1507: Configuration is correct and verified. The Vertex provider configuration for Gemini 3 Pro Preview is accurate. The model_id "gemini-3-pro-preview" matches the official Vertex AI preview model naming, and Gemini 3 Pro is the most advanced reasoning Gemini model. The thinking_level="medium" balances speed and reasoning with some reasoning capability while prioritizing low-latency operations, which is appropriate for the model. The configuration of reasoning_capable and gemini_reasoning_enabled are correct for this model's capabilities on Vertex AI.
1508-1577: Configuration verified for Gemini 3 Flash across all three providers. Gemini 3 Flash is accessible via the Gemini API, Vertex AI, and other platforms, confirming availability on all three providers in the code. The model supports a 1M token context window and multimodal inputs including text, images, audio, video, and PDFs, with structured output and configurable reasoning via thinking levels. The thinking_level parameter controls the amount of internal reasoning the model performs (minimal, low, medium, or high), which aligns with the `thinking_level="medium"` configuration in the Gemini API and Vertex providers.

The model enables complex video analysis, data extraction, and document analysis with near real-time performance, confirming suitability for data generation and evaluation tasks marked in the configuration. All listed multimodal MIME types for images, audio, and video are supported by the model according to documented capabilities.
1971-1991: Configuration is correct as-is. The Nemotron 3 Nano model configuration follows established patterns and is properly verified:

- Model is fully available on OpenRouter with open-weights, datasets, and recipes
- Model is available on Ollama
- Model supports configurable reasoning capabilities, justifying `reasoning_capable=True` for both providers
- No parser configuration required, consistent with learnings that parsers are provider-specific and should only be added when validated
- Model family classification as Llama aligns with the Nemotron product line
.tessl/.gitignore (1)
1-2: LGTM! The ignore patterns for Tessl-related artifacts are appropriate.
.cursor/rules/.gitignore (1)
1-1: LGTM! The ignore pattern correctly prevents generated Tessl files from being committed.
libs/core/kiln_ai/utils/config.py (2)
1-1: LGTM! The `copy` import is correctly added to support the deepcopy operation.
263-263: Good defensive coding with deep copy. Using `copy.deepcopy(v)` prevents unintended mutation of the underlying `_settings` when masking sensitive fields in nested structures. This is particularly important for list-based settings like `openai_compatible_providers` where the masking logic modifies nested dictionaries.

AGENTS.md (1)
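The mutation hazard the deep copy guards against is easy to reproduce. This sketch assumes the key names from the review (`openai_compatible_providers`, `api_key`); the masking helper is a simplified stand-in for Kiln's actual config code.

```python
import copy


def masked_settings(settings, hide_sensitive=True):
    """Return a masked view of settings without mutating the original.

    Without the deepcopy, assigning into the nested provider dicts would
    silently overwrite the real api_key values in the source settings.
    """
    if not hide_sensitive:
        return settings
    masked = copy.deepcopy(settings)
    for provider in masked.get("openai_compatible_providers", []):
        if "api_key" in provider:
            provider["api_key"] = "[hidden]"
    return masked
```

A shallow `dict(settings)` would not be enough here: the outer dict would be new, but the nested provider dicts would still be shared with the original, so masking would mutate both.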
45-47: LGTM! The agent rules reference is clear and aligns with the Tessl workflow integration. Based on learnings, following the instructions in `.tessl/RULES.md` is the expected pattern.

libs/core/kiln_ai/utils/test_config.py (1)
284-317: Excellent test coverage for non-mutation behavior. This test comprehensively verifies that masking sensitive fields via `settings(hide_sensitive=True)` does not mutate the underlying configuration data. The test correctly:

- Sets up providers with sensitive `api_key` fields
- Confirms masking works (`[hidden]` appears in masked view)
- Validates original data remains intact and accessible
libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py (1)
13-23: LGTM! Good error handling and validation. The function correctly validates the ID prefix and extracts the model ID using `removeprefix()` (Python 3.9+, compatible with the project's 3.10+ requirement). The error message is clear and actionable.
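The prefix-validate-then-strip pattern looks roughly like the following. The prefix string and function names here are illustrative assumptions; the real constants live in `finetune_run_config_id.py` and are not shown in this review.

```python
# Hypothetical prefix for illustration only; the real value is defined in
# libs/core/kiln_ai/adapters/fine_tune/finetune_run_config_id.py.
FINETUNE_RUN_CONFIG_PREFIX = "finetune_run_config::"


def build_finetune_run_config_id(finetune_id: str) -> str:
    return f"{FINETUNE_RUN_CONFIG_PREFIX}{finetune_id}"


def finetune_id_from_run_config_id(run_config_id: str) -> str:
    if not run_config_id.startswith(FINETUNE_RUN_CONFIG_PREFIX):
        raise ValueError(f"Not a finetune run config ID: {run_config_id!r}")
    # removeprefix (Python 3.9+) strips the marker, leaving the finetune ID.
    return run_config_id.removeprefix(FINETUNE_RUN_CONFIG_PREFIX)
```

The explicit `startswith` check is what makes the error path actionable: callers get a clear `ValueError` instead of a silently-unmodified string, which is the subtle failure mode of calling `removeprefix` alone.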
259-277: Good prioritization logic for parser selection. The function correctly prioritizes the base model's parser over the data strategy parser, which makes sense for cases like R1 models fine-tuned without thinking data that still output thinking tokens. This aligns with the learnings that parser usage should be based on observed behavior in tests.
Based on learnings, parser selection is provider-specific and should reflect actual model behavior.
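The priority order is a two-step fallback, sketched below with simplified stand-in types (the real function takes a Finetune and consults provider lookups; the dataclass here is an assumption for illustration).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseModelProvider:
    parser: Optional[str] = None


def parser_from_finetune(
    base_provider: Optional[BaseModelProvider],
    data_strategy_parser: Optional[str],
) -> Optional[str]:
    # 1. The base model's parser wins: e.g. an R1-style base model still
    #    emits thinking tokens even when fine-tuned without thinking data.
    if base_provider is not None and base_provider.parser is not None:
        return base_provider.parser
    # 2. Otherwise fall back to the parser implied by the data strategy
    #    (may itself be None).
    return data_strategy_parser
```

Encoding the priority in one small pure function keeps the three test cases from the review (base wins, fallback, neither) trivially checkable.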
296-296: LGTM! Proper integration of enhanced parser logic. The change to use `parser_from_finetune(fine_tune)` correctly integrates the new parser priority logic, ensuring that fine-tuned models inherit appropriate parsers from their base models when applicable.

libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (1)
1173-1173: LGTM - Consistent enum usage for provider names. Using `ModelProviderName.openai` instead of the string literal `"openai"` improves type safety and aligns with the enum-based provider naming pattern used throughout the codebase.

Also applies to: 1243-1243
libs/core/kiln_ai/adapters/fine_tune/test_finetune_run_config_id.py (1)
1-54: Well-structured test coverage for finetune run config ID utilities. The tests appropriately cover:
- ID format construction
- Valid ID parsing with proper mock verification
- Invalid ID handling with parametrized edge cases
libs/core/kiln_ai/adapters/test_provider_tools.py (3)
269-292: Good test coverage for built_in_provider_from_model_id. Tests correctly verify:

- Successful lookup returns the expected provider with correct attributes
- Various invalid lookup scenarios return `None` (nonexistent model, wrong provider, invalid provider string)
1222-1275: Comprehensive tests for parser_from_finetune logic. The three tests effectively cover the priority hierarchy:

- Base model parser takes precedence when available
- Falls back to data_strategy parser when model has no parser
- Returns `None` when neither source provides a parser
74-74: Fixture updates align with new parser_from_finetune requirements. Adding `base_model_id` to mock finetune fixtures ensures the new `parser_from_finetune` function can look up the base model provider to check for parser information.

Also applies to: 87-87, 100-100, 629-629, 965-965
app/desktop/studio_server/test_eval_api.py (4)
552-591: Good test for finetune run config retrieval via task_run_config_from_id. The test properly:

- Creates a Finetune with run_config
- Mocks `finetune_from_finetune_run_config_id`
- Verifies the returned TaskRunConfig has correct id, name, description, and properties
593-648: Thorough test for get_all_run_configs aggregation logic. The test correctly validates that:

- Regular run configs are included
- Completed finetunes with run_config appear with the proper ID format
- Incomplete finetunes (running, with `fine_tune_model_id=None`) are excluded
1990-2078: Comprehensive test covering all finetune status scenarios. This test effectively validates the filtering logic for finetune run configs by testing:
- Completed finetunes with run_config ✓ included
- Running finetunes ✗ excluded (no fine_tune_model_id)
- Unknown status finetunes ✗ excluded
- Failed finetunes ✗ excluded
- Completed without run_config ✗ excluded
737-737: Necessary fixture update to prevent test failures. Adding `mock_task.finetunes.return_value = []` ensures the test doesn't fail when `get_all_run_configs` attempts to iterate over finetunes.

app/desktop/studio_server/eval_api.py (2)
66-88: Well-implemented aggregation of run configs including finetunes. The function correctly:

- Fetches task's native run_configs
- Filters finetunes to only include those with both `run_config` and `fine_tune_model_id` (completed and usable)
- Constructs proper finetune_run_config_id format for each included finetune
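The filtering rule reduces to "include a finetune only when it is actually usable." A simplified sketch, using plain dicts in place of the real Pydantic models and an injected ID builder (both are assumptions for illustration):

```python
def get_all_run_configs(task_run_configs, finetunes, make_finetune_id):
    """Combine native run configs with run configs derived from finetunes.

    A finetune contributes a run config only when it has BOTH a run_config
    (it was created with one) and a fine_tune_model_id (it completed).
    """
    configs = list(task_run_configs)
    for ft in finetunes:
        if ft.get("run_config") and ft.get("fine_tune_model_id"):
            config = dict(ft["run_config"])
            # Give it the finetune-derived ID format so callers can
            # distinguish it from a task-native run config.
            config["id"] = make_finetune_id(ft["id"])
            configs.append(config)
    return configs
```

This single predicate covers every status scenario from the test above: running, failed, and unknown finetunes lack `fine_tune_model_id`, and finetunes created without a `run_config` are skipped regardless of status.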
7-10: Consistent integration of finetune run configs across all endpoints. The imports and updates to `get_run_configs`, `run_eval_config`, and `get_eval_config_score_summary` properly leverage `get_all_run_configs` to ensure finetune-based run configs are included wherever run configs are aggregated.

Also applies to: 361-361, 500-501, 654-655
```json
{
  "name": "project",
  "dependencies": {
    "tessl/pypi-pillow": { "version": "11.3.0" },
    "tessl/pypi-pyinstaller": { "version": "6.15.0" },
    "tessl/pypi-certifi": { "version": "2024.12.0" },
    "tessl/pypi-kiln-ai": { "version": "0.22.1" },
    "tessl/pypi-pyright": { "version": "1.1.0" },
    "tessl/pypi-pytest": { "version": "8.4.0" },
    "tessl/pypi-pytest-xdist": { "version": "3.8.0" },
    "tessl/pypi-python-dotenv": { "version": "1.1.0" },
    "tessl/pypi-watchfiles": { "version": "1.1.0" },
    "tessl/npm-floating-ui--dom": { "version": "1.7.0" },
    "tessl/npm-sveltejs--kit": { "version": "2.37.0" },
    "tessl/npm-sveltejs--vite-plugin-svelte": { "version": "6.1.0" },
    "tessl/npm-tailwindcss--typography": { "version": "0.5.0" },
    "tessl/npm-typescript-eslint--eslint-plugin": { "version": "8.42.0" },
    "tessl/npm-typescript-eslint--parser": { "version": "8.42.0" },
    "tessl/npm-daisyui": { "version": "4.12.0" },
    "tessl/npm-echarts": { "version": "6.0.0" },
    "tessl/npm-eslint-config-prettier": { "version": "10.1.0" },
    "tessl/npm-eslint-plugin-svelte": { "version": "3.12.0" },
    "tessl/npm-eslint": { "version": "9.34.0" },
    "tessl/npm-highlight.js": { "version": "11.11.0" },
    "tessl/npm-openapi-typescript": { "version": "7.9.0" },
    "tessl/npm-postcss": { "version": "8.5.0" },
    "tessl/npm-prettier": { "version": "3.6.0" },
    "tessl/npm-svelte-check": { "version": "3.8.0" },
    "tessl/npm-svelte": { "version": "4.2.0" },
    "tessl/npm-tailwindcss": { "version": "3.4.0" },
    "tessl/npm-typescript": { "version": "5.9.0" },
    "tessl/npm-vite": { "version": "7.1.0" },
    "tessl/npm-vitest": { "version": "4.0.0" }
  }
}
```
Multiple dependencies are outdated and require updates.
The manifest contains several outdated or invalid versions:
- pyright: 1.1.0 is not a valid version; upgrade to 1.1.407
- pillow: 11.3.0 is from July 2025; upgrade to 12.0.0+ (October 2025)
- typescript: 5.9.0 should be updated to 5.9.3
- vite: 7.1.0 should be updated to 7.3.0+ for latest fixes and security patches
Run dependency security audits after updating to ensure no known vulnerabilities exist in the chosen versions.
🤖 Prompt for AI Agents
In tessl.json (lines 1-95) several dependency versions are outdated/invalid:
update "tessl/pypi-pyright" to version "1.1.407", bump "tessl/pypi-pillow" to a
12.0.0+ release (e.g., "12.0.0"), change "tessl/npm-typescript" to "5.9.3", and
update "tessl/npm-vite" to "7.3.0" (or newer 7.3.x); after changing those
entries, run your dependency security/audit tooling and tests to verify
compatibility and resolve any new conflicts.
Ty Typechecking
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/desktop/studio_server/tool_api.py (1)
641-647: Use `KilnTaskServerProperties` constructor for consistency. This endpoint still uses a dict literal while `add_kiln_task_tool` (lines 609-615) now uses the `KilnTaskServerProperties` constructor. For consistency and better static type checking, please use the same pattern here.

🔎 Proposed fix for consistency

```diff
-    existing_tool_server.properties = {
-        "name": tool_data.name,
-        "description": tool_data.description,
-        "task_id": tool_data.task_id,
-        "run_config_id": tool_data.run_config_id,
-        "is_archived": tool_data.is_archived,
-    }
+    existing_tool_server.properties = KilnTaskServerProperties(
+        name=tool_data.name,
+        description=tool_data.description,
+        task_id=tool_data.task_id,
+        run_config_id=tool_data.run_config_id,
+        is_archived=tool_data.is_archived,
+    )
```
🧹 Nitpick comments (7)
libs/core/kiln_ai/datamodel/vector_store.py (1)
96-101: Consider usingtyping.cast()for more explicit type narrowing.The
type: ignore[return-value]comments are valid since the runtime checks guarantee the correct type. However, usingcast()would more explicitly document the type assertion and is generally preferred over type ignore comments.🔎 Proposed refactor using cast()
Add the import at the top of the file:
from typing import TYPE_CHECKING, Literal, Union, castThen update each property:
@property def lancedb_vector_properties(self) -> LanceDBConfigVectorProperties: if self.properties["store_type"] != VectorStoreType.LANCE_DB_VECTOR: raise ValueError( f"Lancedb vector properties are only available for LanceDB vector store type. Got {self.properties.get('store_type')}" ) - return self.properties # type: ignore[return-value] + return cast(LanceDBConfigVectorProperties, self.properties) @property def lancedb_hybrid_properties(self) -> LanceDBConfigHybridProperties: if self.properties["store_type"] != VectorStoreType.LANCE_DB_HYBRID: raise ValueError( f"Lancedb hybrid properties are only available for LanceDB hybrid store type. Got {self.properties.get('store_type')}" ) - return self.properties # type: ignore[return-value] + return cast(LanceDBConfigHybridProperties, self.properties) @property def lancedb_fts_properties(self) -> LanceDBConfigFTSProperties: if self.properties["store_type"] != VectorStoreType.LANCE_DB_FTS: raise ValueError( f"Lancedb FTS properties are only available for LanceDB FTS store type. Got {self.properties.get('store_type')}" ) - return self.properties # type: ignore[return-value] + return cast(LanceDBConfigFTSProperties, self.properties)Also applies to: 104-109, 112-117
libs/core/kiln_ai/tools/rag_tools.py (1)
230-230: Consider using Pydantic v2 for parameter validation instead of suppressing type checks.The
type: ignoresuppresses a legitimate type-safety check, andTypedDictprovides no runtime validation. If the caller omitsquery, line 231 will raise aKeyError. Per the coding guidelines, Pydantic v2 should be used for data validation, and this would provide both type hints and runtime validation.🔎 Proposed refactor using Pydantic v2 BaseModel
Replace the
RagParamsTypedDict definition (lines 61-62):-class RagParams(TypedDict): - query: str +class RagParams(BaseModel): + query: strThen update line 230 to validate and parse the parameters:
- kwargs = RagParams(**kwargs) # type: ignore[missing-typed-dict-key] - query = kwargs["query"] + params = RagParams(**kwargs) + query = params.queryThis approach:
- Removes the need for
type: ignore- Provides runtime validation that
queryis present and is a string- Aligns with the coding guideline to use Pydantic v2 for data validation
- Makes the code consistent with
ChunkContext(line 37), which already usesBaseModelapp/desktop/studio_server/finetune_api.py (1)
284-284: LGTM - Type suppressions are safe with runtime guards.The
type: ignore[invalid-argument-type]annotations on the threefinetune_registrylookups are appropriate. Each access is protected by a prior existence check (lines 279, 359, 431) that raises an HTTPException if the key is missing, ensuring runtime safety.Optional: Consider tightening finetune_registry type definitions
If
finetune_registrytype definitions could be refined to accept the specific key types being used (ModelProviderName,str), these type ignores might become unnecessary. However, this is purely cosmetic and can be deferred.Also applies to: 363-363, 436-436
app/desktop/studio_server/tool_api.py (1)
507-516: Consider applying TypedDict constructors to all property helper functions.For consistency with the new pattern in
add_kiln_task_tool, consider updating_remote_tool_server_propertiesand_local_tool_server_propertiesto use their respective TypedDict constructors (RemoteServerPropertiesandLocalServerProperties). This would improve static type checking across all tool server types.🔎 Proposed refactor
For
_remote_tool_server_properties:def _remote_tool_server_properties( tool_data: ExternalToolServerCreationRequest, ) -> RemoteServerProperties: - # Create the ExternalToolServer with all data for validation - return { - "server_url": tool_data.server_url, - "headers": tool_data.headers, - "secret_header_keys": tool_data.secret_header_keys, - "is_archived": tool_data.is_archived, - } + return RemoteServerProperties( + server_url=tool_data.server_url, + headers=tool_data.headers, + secret_header_keys=tool_data.secret_header_keys, + is_archived=tool_data.is_archived, + )For
_local_tool_server_properties:def _local_tool_server_properties( tool_data: LocalToolServerCreationRequest, ) -> LocalServerProperties: - return { - "command": tool_data.command, - "args": tool_data.args, - "env_vars": tool_data.env_vars, - "secret_env_var_keys": tool_data.secret_env_var_keys, - "is_archived": tool_data.is_archived, - } + return LocalServerProperties( + command=tool_data.command, + args=tool_data.args, + env_vars=tool_data.env_vars, + secret_env_var_keys=tool_data.secret_env_var_keys, + is_archived=tool_data.is_archived, + )Also applies to: 567-576
libs/server/kiln_server/document_api.py (2)
360-380: Type narrowing workaround is acceptable, but consider explicit checks.The
type: ignorecomments work around the type checker's inability to narrow discriminated unions in match statements. While this is pragmatic, it suppresses all type errors for those attribute accesses, potentially hiding future issues.💡 Optional: Use explicit isinstance checks for better type safety
Consider using explicit type narrowing to avoid type: ignore comments:
def get_properties_for_chunker_type( self, ) -> SemanticChunkerProperties | FixedWindowChunkerProperties: properties = self.properties - match properties.chunker_type: - case ChunkerType.SEMANTIC: + if isinstance(properties, SemanticChunkerPropertiesPublic): - return SemanticChunkerProperties( - chunker_type=ChunkerType.SEMANTIC, - embedding_config_id=properties.embedding_config_id, # type: ignore[possibly-missing-attribute] - buffer_size=properties.buffer_size, # type: ignore[possibly-missing-attribute] - breakpoint_percentile_threshold=properties.breakpoint_percentile_threshold, # type: ignore[possibly-missing-attribute] + return SemanticChunkerProperties( + chunker_type=ChunkerType.SEMANTIC, + embedding_config_id=properties.embedding_config_id, + buffer_size=properties.buffer_size, + breakpoint_percentile_threshold=properties.breakpoint_percentile_threshold, - include_metadata=False, - include_prev_next_rel=False, - ) - case ChunkerType.FIXED_WINDOW: + include_metadata=False, + include_prev_next_rel=False, + ) + elif isinstance(properties, FixedWindowChunkerPropertiesPublic): - return FixedWindowChunkerProperties( - chunker_type=ChunkerType.FIXED_WINDOW, - chunk_size=properties.chunk_size, # type: ignore[possibly-missing-attribute] - chunk_overlap=properties.chunk_overlap, # type: ignore[possibly-missing-attribute] + return FixedWindowChunkerProperties( + chunker_type=ChunkerType.FIXED_WINDOW, + chunk_size=properties.chunk_size, + chunk_overlap=properties.chunk_overlap, - ) - raise_exhaustive_enum_error(properties.chunker_type) + ) + raise_exhaustive_enum_error(properties.chunker_type)This eliminates the need for
type: ignorecomments while maintaining type safety.
1633-1642: Type narrowing workaround (same pattern as lines 360-380).Same discriminated union narrowing issue. The
if request.properties.chunker_type == ChunkerType.SEMANTICcheck should narrow the type, but the type checker doesn't recognize it. The validation logic is correct.Consider applying the same explicit isinstance check pattern suggested for
get_properties_for_chunker_typeif you refactor that method.libs/core/kiln_ai/tools/built_in_tools/math_tools.py (1)
37-37: Type suppressions added as part of Ty migration.The
# type: ignore[missing-typed-dict-key]comments successfully suppress type checker warnings for the new Ty tooling. Note that the patternkwargs = AddParams(**kwargs)reassigns kwargs to a TypedDict but provides no runtime validation—TypedDict is just a dict at runtime. The keys are validated by the parameters_schema in the base class, so this is safe.If you want actual runtime type validation in the future, consider using Pydantic models instead of TypedDict:
Optional refactor to add runtime validation
from pydantic import BaseModel class AddParams(BaseModel): a: Union[int, float] b: Union[int, float] async def run(self, context=None, **kwargs) -> ToolCallResult: """Add two numbers and return the result.""" params = AddParams(**kwargs) # Pydantic validates at runtime return ToolCallResult(output=str(params.a + params.b))Also applies to: 75-75, 110-110, 151-151
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (20)
.github/workflows/build_and_test.yml(1 hunks).github/workflows/format_and_lint.yml(2 hunks).gitignore(1 hunks)CONTRIBUTING.md(1 hunks)app/desktop/studio_server/finetune_api.py(3 hunks)app/desktop/studio_server/tool_api.py(1 hunks)checks.sh(2 hunks)hooks_mcp.yaml(1 hunks)libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py(2 hunks)libs/core/kiln_ai/datamodel/basemodel.py(3 hunks)libs/core/kiln_ai/datamodel/chunk.py(2 hunks)libs/core/kiln_ai/datamodel/dataset_filters.py(3 hunks)libs/core/kiln_ai/datamodel/dataset_split.py(1 hunks)libs/core/kiln_ai/datamodel/vector_store.py(1 hunks)libs/core/kiln_ai/tools/built_in_tools/math_tools.py(4 hunks)libs/core/kiln_ai/tools/rag_tools.py(1 hunks)libs/server/kiln_server/document_api.py(3 hunks)libs/server/kiln_server/mcp/mcp_server_tool_utils.py(1 hunks)pyproject.toml(2 hunks)pyrightconfig.json(0 hunks)
💤 Files with no reviewable changes (1)
- pyrightconfig.json
✅ Files skipped from review due to trivial changes (3)
- .gitignore
- .github/workflows/build_and_test.yml
- libs/server/kiln_server/mcp/mcp_server_tool_utils.py
🧰 Additional context used
📓 Path-based instructions (5)
libs/core/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project.mdc)
Use Python 3.10+ for the core library (libs/core) and Python 3.13 for the desktop app
Use Python 3.10+ for the core library (libs/core)
Files:
libs/core/kiln_ai/tools/rag_tools.pylibs/core/kiln_ai/datamodel/dataset_filters.pylibs/core/kiln_ai/datamodel/basemodel.pylibs/core/kiln_ai/adapters/fine_tune/dataset_formatter.pylibs/core/kiln_ai/tools/built_in_tools/math_tools.pylibs/core/kiln_ai/datamodel/chunk.pylibs/core/kiln_ai/datamodel/vector_store.pylibs/core/kiln_ai/datamodel/dataset_split.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project.mdc)
**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation
**/*.py: Use Pydantic v2, not v1
Use asyncio for asynchronous operations in Python
Files:
- libs/core/kiln_ai/tools/rag_tools.py
- libs/core/kiln_ai/datamodel/dataset_filters.py
- libs/core/kiln_ai/datamodel/basemodel.py
- libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
- app/desktop/studio_server/finetune_api.py
- libs/core/kiln_ai/tools/built_in_tools/math_tools.py
- app/desktop/studio_server/tool_api.py
- libs/server/kiln_server/document_api.py
- libs/core/kiln_ai/datamodel/chunk.py
- libs/core/kiln_ai/datamodel/vector_store.py
- libs/core/kiln_ai/datamodel/dataset_split.py
app/desktop/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use Python 3.13 for the desktop application (app/desktop)
Files:
- app/desktop/studio_server/finetune_api.py
- app/desktop/studio_server/tool_api.py
{libs/server,app/desktop/studio_server}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI for REST server implementation
Files:
- app/desktop/studio_server/finetune_api.py
- app/desktop/studio_server/tool_api.py
- libs/server/kiln_server/document_api.py
libs/server/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project.mdc)
Use FastAPI for creating REST servers
Files:
libs/server/kiln_server/document_api.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
📚 Learning: 2025-09-17T17:36:30.231Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 614
File: libs/server/kiln_server/document_api.py:1413-1414
Timestamp: 2025-09-17T17:36:30.231Z
Learning: When RagConfig schema changes are made in the Kiln AI system, older RagConfig instances created before the schema change are discarded rather than migrated, eliminating the need for fallback logic or backwards compatibility handling in the API responses. This is acceptable because RagConfig features are still in pre-release state.
Applied to files:
libs/core/kiln_ai/tools/rag_tools.py
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
Applied to files:
libs/core/kiln_ai/datamodel/basemodel.py
📚 Learning: 2025-11-24T19:13:16.966Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: .cursor/rules/project.mdc:0-0
Timestamp: 2025-11-24T19:13:16.966Z
Learning: Applies to **/*test*.py : Use pytest for testing Python code
Applied to files:
hooks_mcp.yaml
📚 Learning: 2025-11-24T19:13:16.966Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: .cursor/rules/project.mdc:0-0
Timestamp: 2025-11-24T19:13:16.966Z
Learning: Run linting, formatting, and typechecking tools before completing tasks to ensure code meets standards
Applied to files:
checks.sh
📚 Learning: 2025-10-01T21:48:25.854Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 674
File: libs/core/kiln_ai/tools/kiln_task_tool.py:30-34
Timestamp: 2025-10-01T21:48:25.854Z
Learning: In libs/core/kiln_ai/datamodel/external_tool_server.py, KilnTaskServerProperties is a TypedDict, not a Pydantic model. At runtime it's a regular Python dict, so .get() method works correctly and no .model_dump() conversion is needed.
Applied to files:
app/desktop/studio_server/tool_api.py
📚 Learning: 2025-10-27T09:38:25.562Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 753
File: libs/server/kiln_server/document_api.py:279-299
Timestamp: 2025-10-27T09:38:25.562Z
Learning: In the Kiln codebase, leonardmq intentionally overrides (rather than using setdefault) certain internal config fields like include_metadata and include_prev_next_rel in semantic chunker properties, even if the frontend sends them. These fields are too granular for user control and are set to enforce config stability.
Applied to files:
libs/server/kiln_server/document_api.py
📚 Learning: 2025-10-29T11:19:31.059Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 759
File: app/web_ui/src/lib/api_schema.d.ts:2485-2490
Timestamp: 2025-10-29T11:19:31.059Z
Learning: In libs/server/kiln_server/document_api.py and the generated app/web_ui/src/lib/api_schema.d.ts, CreateVectorStoreConfigRequest.properties is now a union of three distinct public schemas: LanceDBConfigFTSPropertiesPublic, LanceDBConfigVectorPropertiesPublic, and LanceDBConfigHybridPropertiesPublic. These are intentionally split to allow independent evolution; do not suggest merging them.
Applied to files:
libs/server/kiln_server/document_api.pylibs/core/kiln_ai/datamodel/vector_store.py
🧬 Code graph analysis (6)
libs/core/kiln_ai/datamodel/dataset_filters.py (1)
libs/core/kiln_ai/datamodel/test_dataset_split.py (1)
task_run(78-93)
libs/core/kiln_ai/datamodel/basemodel.py (1)
libs/core/kiln_ai/datamodel/test_basemodel.py (4)
relationship_name (118-119), relationship_name (146-147), parent_type (122-123), parent_type (150-151)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (1)
libs/core/kiln_ai/adapters/chat/chat_formatter.py (1)
messages(71-72)
app/desktop/studio_server/finetune_api.py (1)
libs/core/kiln_ai/adapters/fine_tune/test_together_finetune.py (1)
finetune(65-77)
app/desktop/studio_server/tool_api.py (2)
app/web_ui/src/lib/types.ts (1)
KilnTaskServerProperties (95-96)
libs/core/kiln_ai/datamodel/external_tool_server.py (1)
KilnTaskServerProperties(43-48)
libs/server/kiln_server/document_api.py (2)
libs/core/kiln_ai/datamodel/chunk.py (2)
ChunkerType (35-37), FixedWindowChunkerProperties (49-52)
app/web_ui/src/lib/types.ts (1)
ChunkerType(62-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Cursor Bugbot
- GitHub Check: Generate Coverage Report
- GitHub Check: Build Desktop Apps (macos-15-intel)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build, Typecheck, and Test Python (3.11)
- GitHub Check: Count Tests
- GitHub Check: Build, Typecheck, and Test Python (3.10)
- GitHub Check: Generate Coverage Report
🔇 Additional comments (17)
libs/core/kiln_ai/datamodel/dataset_split.py (1)
234-234: LGTM! Improved explicitness and type-checking compatibility. The ternary operator makes the None-check explicit and improves readability. This is more type-checker friendly and aligns well with the explicit None-checking pattern used in the subsequent lines (235-238).
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (2)
138-140: LGTM! Type annotation correctly broadened to match actual data. The return type change from `dict[str, str | None]` to `dict[str, Any]` is correct because the function returns dictionaries containing non-string values like "tool_calls" (line 162) and "tool_call_id" (line 171). This aligns with how the function is used throughout the file (lines 236, 272, 285).
187-187: LGTM! Local variable type correctly updated for consistency. The type annotation properly reflects the updated return type of `generate_chat_message_list`.
libs/core/kiln_ai/datamodel/dataset_filters.py (2)
3-3: LGTM! Type annotation improves type safety. The addition of the `Dict` import and explicit type annotation for `static_dataset_filters` enhances type safety and makes the code more self-documenting. The annotation correctly describes the mapping structure.
Also applies to: 131-136
22-23: LGTM! Parameter naming improves consistency. Changing the parameter name from `_` to `task_run` makes the function signature consistent with the `DatasetFilter` protocol definition (line 17) and improves code readability.
libs/core/kiln_ai/datamodel/chunk.py (1)
156-156: LGTM! Type suppressions are well-justified. The `type: ignore[return-value]` annotations are appropriate given Python's type system limitations with TypedDict narrowing. The accompanying comments clearly explain the rationale, runtime checks ensure correctness, and the approach respects the project's linting constraints.
Also applies to: 168-168
libs/core/kiln_ai/datamodel/basemodel.py (1)
329-329: LGTM - Type suppressions are justified for metaprogramming. The four `type: ignore` annotations are appropriate for these dynamic code paths:
- Line 329 (`mutable_copy`): Pydantic's `model_copy(deep=True)` returns the same type, but the type checker can't infer this through the generic `Self` return type.
- Lines 694, 698 (`_create_child_method`): Dynamic method generation using generic `child_class` parameters requires runtime type construction that the type system cannot validate statically.
- Line 790 (`_validate_nested`): Recursive call with `**kwargs` unpacking in a complex inheritance hierarchy where the type checker cannot validate the dynamic arguments.
Also applies to: 694-694, 698-698, 790-790
libs/server/kiln_server/document_api.py (1)
2149-2157: LGTM! Improved type safety with TypedDict. The change from a plain dictionary to a properly typed `FixedWindowChunkerProperties` TypedDict improves type safety. The values passed are correct:
- `chunk_size` is guaranteed to be a PositiveInt (checked at line 2144)
- `chunk_overlap` defaults to 0, which satisfies NonNegativeInt
hooks_mcp.yaml (1)
25-27: LGTM! Typecheck command updated to use Ty. The migration from `uv run pyright .` to `uvx ty check` aligns with the broader tooling migration across the codebase.
CONTRIBUTING.md (1)
77-77: LGTM! Documentation updated for Ty tooling. The IDE extension recommendation now correctly reflects the migration from Pyright to Ty.
pyproject.toml (2)
43-47: LGTM! Ty configuration is well-structured. The Ty source configuration appropriately excludes test files, build directories, and the web UI. The `invalid-key = "ignore"` rule is reasonable during the migration period.
54-57: LGTM! Enhanced linting rules improve code quality. The additional Ruff rules (import sorting, unused imports, RUF rules, FastAPI linting, and tidy imports) and the ban on `typing.cast` are excellent additions that enforce better practices.
checks.sh (3)
34-41: LGTM! Type checking and linting steps are well-structured. The separation of Ruff checks into distinct steps and the addition of the Ty type checking step improve clarity and align with the tooling migration.
43-50: LGTM! Improved misspelling check feedback. The addition of a success message after the misspelling check provides better user feedback when the check passes.
52-55: The script reference is valid. `app/web_ui/src/lib/check_schema.sh` exists and is executable.
Likely an incorrect or invalid review comment.
.github/workflows/format_and_lint.yml (2)
38-39: LGTM! Proper dependency installation. Changing to `uv sync --all-extras --dev` correctly installs all project dependencies including the new ty type checker, rather than just installing ruff separately.
49-51: LGTM! Type checking added to CI pipeline. The new "Typecheck with ty" step ensures type checking runs in CI, aligning with the local development workflow in checks.sh.
| properties=KilnTaskServerProperties( | ||
| name=tool_data.name, | ||
| description=tool_data.description, | ||
| task_id=tool_data.task_id, | ||
| run_config_id=tool_data.run_config_id, | ||
| is_archived=tool_data.is_archived, | ||
| ), |
🛠️ Refactor suggestion | 🟠 Major
Type safety improvement, but inconsistent with edit endpoint.
The use of the `KilnTaskServerProperties` constructor is excellent for static type checking. However, the corresponding `edit_kiln_task_tool` endpoint (lines 641-647) still uses a dict literal, creating an inconsistency.
Please update edit_kiln_task_tool to use the same pattern for consistency.
🤖 Prompt for AI Agents
In app/desktop/studio_server/tool_api.py around lines 609 and 641-647, the
create path uses KilnTaskServerProperties but the edit_kiln_task_tool endpoint
still constructs properties with a dict literal; change edit_kiln_task_tool to
construct and pass a KilnTaskServerProperties(...) instance with the same fields
(name, description, task_id, run_config_id, is_archived) instead of the dict to
keep typing consistent and avoid mixed patterns.
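A minimal sketch of the suggested change, with `KilnTaskServerProperties` stubbed as a TypedDict (the real class lives in `libs/core/kiln_ai/datamodel/external_tool_server.py`; the field names follow the review, while `build_properties` and the demo data are hypothetical):

```python
from types import SimpleNamespace
from typing import TypedDict


class KilnTaskServerProperties(TypedDict):
    # Stub mirroring the fields named in the review.
    name: str
    description: str
    task_id: str
    run_config_id: str
    is_archived: bool


def build_properties(tool_data) -> KilnTaskServerProperties:
    # Constructing the TypedDict (rather than a bare dict literal) lets the
    # type checker verify keys and value types in both create and edit paths.
    return KilnTaskServerProperties(
        name=tool_data.name,
        description=tool_data.description,
        task_id=tool_data.task_id,
        run_config_id=tool_data.run_config_id,
        is_archived=tool_data.is_archived,
    )


demo = SimpleNamespace(
    name="adder",
    description="adds numbers",
    task_id="task_1",
    run_config_id="rc_1",
    is_archived=False,
)
props = build_properties(demo)
print(props["name"])  # -> adder
```

At runtime a TypedDict is a plain dict, so `.get()` and key access keep working unchanged; only the static view improves.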
| "ruff>=0.13.0", | ||
| "watchfiles>=1.1.0", | ||
| "scalar-fastapi>=1.4.3", | ||
| "ty>=0.0.2", |
Update the version constraint for ty: it is currently in beta, with breaking changes expected.
ty is a beta release; Astral recommends it only for motivated early adopters and not yet for production, since it is a preview with missing features and known issues. The current stable version on PyPI is 0.0.4, and a stable release is targeted for 2026. Using `ty>=0.0.2` allows all 0.0.x versions, which during beta development can introduce breaking changes between minor versions. Consider pinning to a specific version (e.g., `ty==0.0.4`) or at minimum constraining to a narrower range to avoid unexpected CI/CD failures during the development phase.
🤖 Prompt for AI Agents
In pyproject.toml around line 26, the dependency "ty>=0.0.2" is too permissive
for a beta package and can pull breaking 0.0.x releases; change the constraint
to a stable pinned or narrowed range such as "ty==0.0.4" or "ty>=0.0.4,<0.1.0"
to prevent unexpected breaking upgrades, update the lockfile if present, and
rerun dependency installs/CI to validate the change.
* adding glm 4.7 * addressing CR feedback * lock typecheck version for now * siliconflow has a Pro prefix for the endpoint for GLM 4.7 --------- Co-authored-by: scosman <[email protected]>
Actionable comments posted: 1
🤖 Fix all issues with AI Agents
In @libs/core/kiln_ai/adapters/ml_model_list.py:
- Around line 1538-1567: The KilnModelProvider entry for model_id
"gemini-3-flash-preview" is missing the gemini_reasoning_enabled flag despite
having reasoning_capable=True and thinking_level="medium"; update that
KilnModelProvider instance to include gemini_reasoning_enabled=True (place it
after multimodal_mime_types or alongside the other capability flags) to match
the pattern used by other Gemini providers.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/build_and_test.yml
- .github/workflows/format_and_lint.yml
- libs/core/kiln_ai/adapters/ml_model_list.py
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/build_and_test.yml
- .github/workflows/format_and_lint.yml
🧰 Additional context used
📓 Path-based instructions (2)
libs/core/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project.mdc)
Use Python 3.10+ for the core library (libs/core) and Python 3.13 for the desktop app
Use Python 3.10+ for the core library (libs/core)
Files:
libs/core/kiln_ai/adapters/ml_model_list.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project.mdc)
**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation
**/*.py: Use Pydantic v2, not v1
Use asyncio for asynchronous operations in Python
Files:
libs/core/kiln_ai/adapters/ml_model_list.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.
Applied to files:
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
Applied to files:
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:19:20.074Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1736-1741
Timestamp: 2025-08-08T16:19:20.074Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwq_32b with ModelProviderName.siliconflow_cn (model_id "Qwen/QwQ-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via tests/responses that SiliconFlow returns outputs that parse correctly without an R1 parser. Parser usage remains provider-specific and should only be added when validated.
Applied to files:
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-10-28T05:25:09.283Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 757
File: libs/core/kiln_ai/adapters/reranker_list.py:39-42
Timestamp: 2025-10-28T05:25:09.283Z
Learning: In the Kiln codebase (Python), model family and model name fields in reranker and similar model definition classes (like KilnRerankerModel in libs/core/kiln_ai/adapters/reranker_list.py) should use plain str types rather than Enum types. This is because users receive over-the-air (OTA) updates for new models, and using Enums would break when clients with older builds encounter new model families their Enum definitions don't include. Strings provide forward compatibility for the OTA update mechanism.
Applied to files:
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-09-10T08:06:53.717Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 585
File: libs/core/kiln_ai/adapters/ml_embedding_model_list.py:32-33
Timestamp: 2025-09-10T08:06:53.717Z
Learning: In the Kiln AI codebase, friendly_name fields in embedding model configurations should maintain the original model naming conventions from providers (e.g., "gemini-embedding-001", "text-embedding-3-small") for human readability, rather than following snake_case conventions like the enum values.
Applied to files:
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:15:20.796Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:2434-2439
Timestamp: 2025-08-08T16:15:20.796Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), for ModelName.qwen_3_8b_no_thinking with ModelProviderName.siliconflow_cn (model_id "Qwen/Qwen3-8B"), the qwen3_style_no_think formatter is not required: SiliconFlow’s non‑thinking Qwen3‑8B returns clean output without <answer> wrappers. Formatter/parser usage is provider-specific and should be added only when verified by tests/responses.
Applied to files:
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-08T16:14:54.346Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1979-1983
Timestamp: 2025-08-08T16:14:54.346Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py, for ModelName.deepseek_r1_distill_qwen_32b on provider siliconflow_cn (model_id "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via testing that SiliconFlow returns final outputs that parse without an R1 parser, and reasoning may be omitted under structured output; rely on reasoning_optional_for_structured_output=True.
Applied to files:
libs/core/kiln_ai/adapters/ml_model_list.py
📚 Learning: 2025-08-22T11:15:53.584Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 413
File: libs/core/kiln_ai/adapters/embedding/embedding_registry.py:14-20
Timestamp: 2025-08-22T11:15:53.584Z
Learning: In the Kiln AI codebase, model_provider_name fields in datamodels like EmbeddingConfig are typed as ModelProviderName enum (which inherits from str, Enum), not as plain strings. Therefore, accessing .value on these fields is valid and returns the string representation of the enum value.
Applied to files:
libs/core/kiln_ai/adapters/ml_model_list.py
🧬 Code graph analysis (1)
libs/core/kiln_ai/adapters/ml_model_list.py (2)
libs/core/kiln_ai/datamodel/datamodel_enums.py (3)
ModelProviderName (84-106), StructuredOutputMode (27-49), KilnMimeType (109-133)
libs/core/kiln_ai/datamodel/extraction.py (1)
friendly_name(289-291)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Generate Coverage Report
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (macos-15-intel)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Generate Coverage Report
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build, Typecheck, and Test Python (3.12)
- GitHub Check: Build, Typecheck, and Test Python (3.10)
- GitHub Check: Count Tests
🔇 Additional comments (4)
libs/core/kiln_ai/adapters/ml_model_list.py (4)
130-130: LGTM: Enum additions follow naming conventions. The new model name enum entries follow the established snake_case pattern and are consistent with existing model naming conventions.
Also applies to: 132-132, 198-198
1497-1507: LGTM: Vertex provider addition is well-configured. The Vertex provider configuration for Gemini 3 Pro Preview is consistent with the existing OpenRouter and Gemini API providers, including appropriate reasoning flags and thinking level settings.
1972-1992: LGTM: Nemotron 3 Nano model is well-configured. The new Nemotron 3 Nano model follows established patterns:
- Correctly uses `ModelFamily.llama` (consistent with Nemotron 70B)
- Has reasoning capabilities properly configured
- No parser specified (appropriate per learnings - parsers should only be added when validated through testing)
5155-5175: LGTM: GLM 4.7 model configuration is consistent. The new GLM 4.7 model is properly configured:
- Consistent with existing GLM model patterns (e.g., GLM 4.6)
- Appropriate use of the `reasoning_optional_for_structured_output` flag for the SiliconFlow provider
- No parser specified (correct per learnings)
- Friendly name follows established conventions
| KilnModelProvider( | ||
| name=ModelProviderName.gemini_api, | ||
| model_id="gemini-3-flash-preview", | ||
| structured_output_mode=StructuredOutputMode.json_schema, | ||
| suggested_for_data_gen=True, | ||
| suggested_for_evals=True, | ||
| supports_doc_extraction=True, | ||
| multimodal_capable=True, | ||
| supports_vision=True, | ||
| multimodal_mime_types=[ | ||
| # documents | ||
| KilnMimeType.PDF, | ||
| KilnMimeType.CSV, | ||
| KilnMimeType.TXT, | ||
| KilnMimeType.HTML, | ||
| KilnMimeType.MD, | ||
| # images | ||
| KilnMimeType.JPG, | ||
| KilnMimeType.PNG, | ||
| # audio | ||
| KilnMimeType.MP3, | ||
| KilnMimeType.WAV, | ||
| KilnMimeType.OGG, | ||
| # video | ||
| KilnMimeType.MP4, | ||
| KilnMimeType.MOV, | ||
| ], | ||
| reasoning_capable=True, | ||
| thinking_level="medium", | ||
| ), |
Missing gemini_reasoning_enabled flag in Gemini API provider.
The Gemini 3 Flash Gemini API provider has reasoning_capable=True (line 1565) and thinking_level="medium" (line 1566), but is missing the gemini_reasoning_enabled=True flag. This is inconsistent with:
- The OpenRouter provider for this same model (line 1536 has `gemini_reasoning_enabled=True`)
- The Vertex provider for this same model (line 1575 has `gemini_reasoning_enabled=True`)
- The general pattern across other Gemini models where `reasoning_capable=True` is paired with `gemini_reasoning_enabled=True` (e.g., Gemini 2.5 Pro, Gemini 3 Pro Preview)
🔎 Proposed fix
Add the missing flag after multimodal_mime_types:
KilnMimeType.MOV,
],
reasoning_capable=True,
+ gemini_reasoning_enabled=True,
thinking_level="medium",
),🤖 Prompt for AI Agents
In @libs/core/kiln_ai/adapters/ml_model_list.py around lines 1538 - 1567, The
KilnModelProvider entry for model_id "gemini-3-flash-preview" is missing the
gemini_reasoning_enabled flag despite having reasoning_capable=True and
thinking_level="medium"; update that KilnModelProvider instance to include
gemini_reasoning_enabled=True (place it after multimodal_mime_types or alongside
the other capability flags) to match the pattern used by other Gemini providers.
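The invariant the review asks for can be expressed as a small check. This sketch models providers as simplified records (hypothetical data, not the real catalog) and flags any Gemini provider where `reasoning_capable` is set without `gemini_reasoning_enabled`:

```python
from dataclasses import dataclass


@dataclass
class ProviderEntry:
    # Simplified stand-in for KilnModelProvider with only the flags at issue.
    name: str
    reasoning_capable: bool = False
    gemini_reasoning_enabled: bool = False


providers = [
    ProviderEntry("openrouter", reasoning_capable=True, gemini_reasoning_enabled=True),
    ProviderEntry("vertex", reasoning_capable=True, gemini_reasoning_enabled=True),
    # The flagged entry: reasoning on, but the Gemini-specific toggle missing.
    ProviderEntry("gemini_api", reasoning_capable=True),
]


def missing_gemini_flag(entries: list[ProviderEntry]) -> list[str]:
    return [
        e.name
        for e in entries
        if e.reasoning_capable and not e.gemini_reasoning_enabled
    ]


print(missing_gemini_flag(providers))  # -> ['gemini_api']
```

A check like this could run in tests to keep provider capability flags consistent as the catalog grows.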
What does this PR do?
Merging main into remote_config. Includes Nemotron, Gemini 3 Pro enablement with Vertex, and Gemini 3 Flash.
Summary by CodeRabbit
New Features
Improvements
Chores
Bug Fixes
Note
Adds a radar comparison chart and UX tweaks, surfaces completed finetunes as run configs in evals, introduces Gemini 3 Flash and Nemotron 3 Nano (incl. Vertex support), and replaces Pyright with Ty across CI and tooling.
- New `compare_radar_chart.svelte` with legend formatting and no-data state; enhanced scatter chart legend/positioning and reactive selection; new `chart_no_data.svelte`.
- `get_all_run_configs` and `task_run_config_from_id` support finetune-derived run configs (completed only); removed legacy `task_run_configs` route; endpoints use the helper.
- New `gemini_3_flash`, Vertex entries for Gemini 3 Pro/Flash, and `nemotron_3_nano`; enum updates.
- `AdapterConfig` updates; `Config.settings(hide_sensitive=True)`; removed `pyrightconfig.json`; builds on `main` and releases.
- Removed legacy `task_run_configs`; small lockfile bumps; tessl/cursor configs; new ignores.
Written by Cursor Bugbot for commit d0a10c7. This will update automatically on new commits. Configure here.
compare_radar_chart.sveltewith legend formatting and no-data state; enhanced scatter chart legend/positioning and reactive selection.chart_no_data.svelte.get_all_run_configsandtask_run_config_from_idsupport finetune-derived run configs (completed only); removed legacytask_run_configsroute; endpoints use helper.gemini_3_flash, Vertex entries for Gemini 3 Pro/Flash, andnemotron_3_nano; enum updates.AdapterConfig.Config.settings(hide_sensitive=True).pyrightconfig.json.mainand releases.task_run_configs); small lockfile bumps; tessl/cursor configs; new ignores.Written by Cursor Bugbot for commit d0a10c7. This will update automatically on new commits. Configure here.