Commit ae1512c

Authored by: tawnymanticore, chiang-daniel, scosman, sfierro, leonardmq

Update ML Model List (#921)

* update eval API to include finetune run-configs
* add test
* add more coverage
* handle parsers for finetune
* remove stale API
* look up model based on model_id instead of name
* coder rabbit
* Fewer builds: we don't need to be running builds on every push.
* added add subtopics button to SDG
* Fix linting errors: remove unused imports
* fix: api key values incorrectly replaced with [hidden]
* update deps
* Fixes https://linear.app/kiln-ai/issue/KIL-206/remove-task-description-in-new-task. People are confusing description and prompt. Don't need it.
* Add Leonard's suggestion to allow manual runs too.
* tessl support
* ui dependencies
* coderabbit feedback
* mcp hooks
* Fix: don't save test files into root. Use tmp
* ty typecheck WIP
* ty typecheck WIP
* ty typecheck WIP
* ty typecheck WIP
* feat: allow system message override in invoke
* Fix reactivity issue on compare view
* Mike wants a chart
* Mike wants a chart part 2
* better subtitles
* don't fill em
* highlight on hover
* refactor: custom prompt builder injection via constructor
* Fixes https://linear.app/kiln-ai/issue/KIL-340/hide-chart-if-were-missing-data
* Don't show radar chart unless we have 3 points to show
* Cleaner value lookup, which fixes type checking. Also cleaner checks.sh
* Fix type error
* Fix CI, ty requires sync
* coderabbit feedback
* suggest global instead of us-central1 in connect providers (#916)
* add gemini flash + nemotron 3 to ml_model_list (#915)
* add gemini flash + nemotron 3
* adding gemini 3 flash and pro to vertex, and adding nemotron 3 to ollama
* adding alias for nemotron 3
* Adding support for GLM 4.7 (#923)
* adding glm 4.7
* addressing CR feedback
* lock typecheck version for now
* siliconflow has a Pro prefix for the endpoint for GLM 4.7

Co-authored-by: aiyakitori <[email protected]>
Co-authored-by: scosman <[email protected]>
Co-authored-by: Sam Fierro <[email protected]>
Co-authored-by: Leonard Q. Marcq <[email protected]>
Parent: cbbdbc8 · Commit: ae1512c

52 files changed: +1,476 additions, −277 deletions

.cursor/mcp.json

Lines changed: 13 additions & 0 deletions
```diff
@@ -0,0 +1,13 @@
+{
+  "mcpServers": {
+    "tessl": {
+      "type": "stdio",
+      "command": "tessl",
+      "args": ["mcp", "start"]
+    },
+    "HooksMCP": {
+      "command": "uvx",
+      "args": ["hooks-mcp", "--working-directory", "."]
+    }
+  }
+}
```
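The new MCP config can be sanity-checked with a short Python sketch; the server names and fields below are reproduced from the diff above:

```python
import json

# The MCP config added in this commit (reproduced from the diff above)
config_text = """
{
  "mcpServers": {
    "tessl": {
      "type": "stdio",
      "command": "tessl",
      "args": ["mcp", "start"]
    },
    "HooksMCP": {
      "command": "uvx",
      "args": ["hooks-mcp", "--working-directory", "."]
    }
  }
}
"""

config = json.loads(config_text)
servers = config["mcpServers"]

# Both servers from the diff should be present, each with a launch command
assert set(servers) == {"tessl", "HooksMCP"}
assert servers["tessl"]["command"] == "tessl"
assert servers["HooksMCP"]["args"][0] == "hooks-mcp"
print("mcp.json structure OK")
```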

.cursor/rules/.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+tessl__*.mdc
```

.github/workflows/build_and_test.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -37,7 +37,7 @@ jobs:
         run: uv run python3 -m pytest --runslow .
 
       - name: Check Python Types
-        run: uv run pyright .
+        run: uv tool install [email protected] && uvx ty check
 
       - name: Build Core
         run: uv build
```

.github/workflows/build_desktop.yml

Lines changed: 5 additions & 0 deletions
```diff
@@ -1,7 +1,12 @@
 name: Build Desktop Apps
 
 on:
+  workflow_dispatch:
+  release:
+    types: [created]
   push:
+    branches:
+      - main
 
 jobs:
   build:
```

.github/workflows/format_and_lint.yml

Lines changed: 5 additions & 1 deletion
```diff
@@ -36,7 +36,7 @@ jobs:
         run: uv python install 3.13
 
       - name: Install the project
-        run: uv tool install ruff
+        run: uv sync --all-extras --dev
 
       - name: Lint with ruff
         run: |
@@ -45,3 +45,7 @@ jobs:
       - name: Format with ruff
         run: |
           uvx ruff format --check .
+
+      - name: Typecheck with ty
+        run: |
+          uv tool install [email protected] && uvx ty check
```

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -11,6 +11,7 @@ __pycache__/
 **/*.egg-info
 node_modules/
 conductor.json
+CLAUDE.md
 
 libs/core/docs
 libs/core/build
```

.tessl/.gitignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -0,0 +1,2 @@
+tiles/
+RULES.md
```

AGENTS.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -41,3 +41,7 @@ These prompts can be accessed from the `get_prompt` tool, and you may request se
 ### Final
 
 To show you read these, call me 'boss'
+
+# Agent Rules <!-- tessl-managed -->
+
+@.tessl/RULES.md follow the [instructions](.tessl/RULES.md)
```

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -74,7 +74,7 @@ We suggest the following extensions for VSCode/Cursor. With them, you'll get com
 - Prettier
 - Python
 - Python Debugger
-- Type checking by pyright via one of: Cursor Python if using Cursor, Pylance if VSCode
+- Ty - language server and type checker for Python
 - Ruff
 - Svelte for VS Code
 - Vitest
```

app/desktop/studio_server/eval_api.py

Lines changed: 46 additions & 27 deletions
```diff
@@ -4,6 +4,10 @@
 from fastapi import FastAPI, HTTPException, Query
 from fastapi.responses import StreamingResponse
 from kiln_ai.adapters.eval.eval_runner import EvalRunner
+from kiln_ai.adapters.fine_tune.finetune_run_config_id import (
+    finetune_from_finetune_run_config_id,
+    finetune_run_config_id,
+)
 from kiln_ai.adapters.ml_model_list import ModelProviderName
 from kiln_ai.adapters.prompt_builders import prompt_builder_from_id
 from kiln_ai.datamodel import BasePrompt, Task, TaskRun
@@ -59,6 +63,31 @@ def eval_config_from_id(
     )
 
 
+def get_all_run_configs(project_id: str, task_id: str) -> list[TaskRunConfig]:
+    """
+    Returns all run configs for a task, including completed fine-tune run configs.
+    Only includes fine-tunes that have a fine_tune_model_id (are completed and usable).
+    """
+    task = task_from_id(project_id, task_id)
+    configs = task.run_configs()
+
+    # Get run configs from fine-tunes; only include completed fine-tunes
+    finetunes = task.finetunes()
+    for finetune in finetunes:
+        if finetune.run_config is not None and finetune.fine_tune_model_id is not None:
+            configs.append(
+                TaskRunConfig(
+                    id=finetune_run_config_id(project_id, task_id, str(finetune.id)),
+                    name=finetune.name,
+                    description=finetune.description,
+                    run_config_properties=finetune.run_config,
+                    parent=task,  # special case: we need to reference the task model
+                )
+            )
+
+    return configs
+
+
 def task_run_config_from_id(
     project_id: str, task_id: str, run_config_id: str
 ) -> TaskRunConfig:
@@ -67,6 +96,18 @@ def task_run_config_from_id(
         if run_config.id == run_config_id:
             return run_config
 
+    # Special case for finetune run configs: they live inside the finetune model
+    if run_config_id.startswith("finetune_run_config::"):
+        finetune = finetune_from_finetune_run_config_id(run_config_id)
+        if finetune.run_config is not None:
+            return TaskRunConfig(
+                id=finetune_run_config_id(project_id, task_id, str(finetune.id)),
+                name=finetune.name,
+                description=finetune.description,
+                run_config_properties=finetune.run_config,
+                parent=task,  # special case: we need to reference the task model
+            )
+
     raise HTTPException(
         status_code=404,
         detail=f"Task run config not found. ID: {run_config_id}",
@@ -315,33 +356,9 @@ async def create_evaluator(
     eval.save_to_file()
     return eval
 
-@app.get("/api/projects/{project_id}/tasks/{task_id}/task_run_configs")
-async def get_task_run_configs(
-    project_id: str, task_id: str
-) -> list[TaskRunConfig]:
-    task = task_from_id(project_id, task_id)
-    return task.run_configs()
-
 @app.get("/api/projects/{project_id}/tasks/{task_id}/run_configs/")
 async def get_run_configs(project_id: str, task_id: str) -> list[TaskRunConfig]:
-    # Returns all run configs of a given task.
-    task = task_from_id(project_id, task_id)
-    configs = task.run_configs()
-
-    # Get run configs from finetunes
-    finetunes = task.finetunes()
-    for finetune in finetunes:
-        if finetune.run_config is not None:
-            configs.append(
-                TaskRunConfig(
-                    id=f"finetune_run_config::{project_id}::{task_id}::{finetune.id}",
-                    name=finetune.name,
-                    description=finetune.description,
-                    run_config_properties=finetune.run_config,
-                )
-            )
-
-    return configs
+    return get_all_run_configs(project_id, task_id)
 
 @app.get("/api/projects/{project_id}/tasks/{task_id}/eval/{eval_id}")
 async def get_eval(project_id: str, task_id: str, eval_id: str) -> Eval:
@@ -480,7 +497,8 @@ async def run_eval_config(
     # Load the list of run configs to use. Two options:
     run_configs: list[TaskRunConfig] = []
     if all_run_configs:
-        run_configs = task_from_id(project_id, task_id).run_configs()
+        # Special case: we cannot directly load task.run_configs(); we also need the finetune run configs, which live inside the finetune model
+        run_configs = get_all_run_configs(project_id, task_id)
     else:
         if len(run_config_ids) == 0:
             raise HTTPException(
@@ -633,7 +651,8 @@ async def get_eval_config_score_summary(
     task = task_from_id(project_id, task_id)
     eval = eval_from_id(project_id, task_id, eval_id)
     eval_config = eval_config_from_id(project_id, task_id, eval_id, eval_config_id)
-    task_runs_configs = task.run_configs()
+    # Special case: we cannot directly load task.run_configs(); we also need the finetune run configs, which live inside the finetune model
+    task_runs_configs = get_all_run_configs(project_id, task_id)
 
     # Build a set of all the dataset items IDs we expect to have scores for
     expected_dataset_ids = dataset_ids_in_filter(
```
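The composite ID format for finetune run configs is visible in the removed inline f-string (`finetune_run_config::{project_id}::{task_id}::{finetune.id}`) and in the `startswith("finetune_run_config::")` check. The real helpers live in `kiln_ai.adapters.fine_tune.finetune_run_config_id`; the sketch below is a hypothetical reconstruction of how building and parsing such an ID might work, not the project's actual implementation:

```python
# Hypothetical sketch of the composite-ID helpers; the real implementations
# live in kiln_ai.adapters.fine_tune.finetune_run_config_id.
PREFIX = "finetune_run_config"


def finetune_run_config_id(project_id: str, task_id: str, finetune_id: str) -> str:
    # Matches the f-string removed from get_run_configs in this diff
    return f"{PREFIX}::{project_id}::{task_id}::{finetune_id}"


def parse_finetune_run_config_id(run_config_id: str) -> tuple[str, str, str]:
    # Inverse operation: split the composite ID back into its three parts
    prefix, project_id, task_id, finetune_id = run_config_id.split("::")
    if prefix != PREFIX:
        raise ValueError(f"Not a finetune run config ID: {run_config_id}")
    return project_id, task_id, finetune_id


rc_id = finetune_run_config_id("proj1", "task1", "ft1")
assert rc_id == "finetune_run_config::proj1::task1::ft1"
assert parse_finetune_run_config_id(rc_id) == ("proj1", "task1", "ft1")
```

This round-trip property is what lets `task_run_config_from_id` recover the finetune from an ID handed back by the API.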
