Conversation
Walkthrough

Adds Kiln Copilot end-to-end support: a new POST /api/projects/{project_id}/tasks/{task_id}/spec_with_copilot endpoint and request model, frontend wiring to choose Copilot vs. standard spec creation, new backend Copilot utilities and models, migration of spec persistence logic into the Copilot utilities, and updated types/schema across client and server.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as Spec Builder UI
    participant FE as Frontend Payload Builder
    participant API as Backend Copilot API
    participant Copilot as Kiln Copilot Service
    participant DB as Database
    UI->>FE: saveSpec(use_kiln_copilot=true)
    FE->>FE: build definition & properties
    FE->>API: POST /spec_with_copilot (CreateSpecWithCopilotRequest)
    API->>API: validate request & create Eval/EvalConfig
    API->>Copilot: generate_batch (target/task infos, spec)
    Copilot-->>API: list[SampleApi]
    API->>DB: create TaskRuns (eval/train/golden) and persist TaskRuns
    API->>DB: create Spec and link Eval
    API-->>FE: Spec response (Spec)
    FE-->>UI: spec_id or error
    Note over API,DB: on any failure => rollback created Eval/TaskRuns
```
```mermaid
sequenceDiagram
    participant UI as Spec Builder UI
    participant FE as Frontend Payload Builder
    participant API as Backend Spec API
    participant DB as Database
    UI->>FE: saveSpec(use_kiln_copilot=false)
    FE->>FE: build definition & properties
    FE->>API: POST /spec (legacy payload)
    API->>DB: create Spec (default priority/status)
    API-->>FE: Spec response
    FE-->>UI: spec_id
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Summary of Changes

Hello @sfierro, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the architecture for creating new specifications within the application. By moving the intricate persistence and example generation logic for Kiln Copilot-assisted spec creation from the frontend to a new backend Python endpoint, the change centralizes critical business logic. This improves maintainability, enhances data integrity by ensuring server-side validation and atomic operations, and streamlines the client-side code by offloading complex orchestration.
📊 Coverage Report

Overall Coverage: 91%
Diff: origin/sfierro/specs...HEAD

Line-by-line diff coverage (`!` marks lines not covered by tests):

app/desktop/studio_server/copilot_api.py

Lines 207-215
```text
  207  @app.post("/api/copilot/question_spec")
  208  async def question_spec(
  209      input: SpecQuestionerInput,
  210  ) -> QuestionSet:
! 211      api_key = get_copilot_api_key()
  212      client = get_authenticated_client(api_key)
  213
  214      questioner_input = SpecQuestionerInputServerApi.from_dict(input.model_dump())
```

Lines 240-248
```text
  240  @app.post("/api/copilot/refine_spec_with_question_answers")
  241  async def submit_question_answers(
  242      request: SubmitAnswersRequest,
  243  ) -> RefineSpecWithQuestionAnswersResponse:
! 244      api_key = get_copilot_api_key()
  245      client = get_authenticated_client(api_key)
  246
  247      submit_input = SubmitAnswersRequestServerApi.from_dict(request.model_dump())
```

Lines 289-321
```text
  289
  290      All models are validated before any saves occur. If validation fails,
  291      no data is persisted.
  292      """
! 293      task = task_from_id(project_id, task_id)
  294
  295      # Generate tag suffixes
! 296      eval_tag_suffix = request.name.lower().replace(" ", "_")
! 297      eval_tag = f"eval_{eval_tag_suffix}"
! 298      train_tag = f"eval_train_{eval_tag_suffix}"
! 299      golden_tag = f"eval_golden_{eval_tag_suffix}"
  300
  301      # Extract spec_type from properties (discriminated union)
! 302      spec_type = request.properties["spec_type"]
  303
  304      # Determine eval properties
! 305      template = spec_eval_template(spec_type)
! 306      output_scores = [spec_eval_output_score(request.name)]
! 307      eval_set_filter_id = f"tag::{eval_tag}"
! 308      eval_configs_filter_id = f"tag::{golden_tag}"
! 309      evaluation_data_type = spec_eval_data_type(
  310          spec_type, request.evaluate_full_trace
  311      )
  312
  313      # Build models but don't save yet, collect all models first
! 314      models_to_save: list[Eval | EvalConfig | TaskRun | Spec] = []
  315
  316      # 1. Create the Eval
! 317      eval_model = Eval(
  318          parent=task,
  319          name=request.name,
  320          description=None,
  321          template=template,
```

Lines 324-335
```text
  324          eval_configs_filter_id=eval_configs_filter_id,
  325          template_properties=None,
  326          evaluation_data_type=evaluation_data_type,
  327      )
! 328      models_to_save.append(eval_model)
  329
  330      # 2. Create judge eval config
! 331      eval_config = EvalConfig(
  332          parent=eval_model,
  333          name=generate_memorable_name(),
  334          config_type=EvalConfigType.llm_as_judge,
  335          model_name=request.judge_info.task_metadata.model_name,
```

Lines 338-359
```text
  338              "eval_steps": [request.judge_info.prompt],
  339              "task_description": request.task_description,
  340          },
  341      )
! 342      models_to_save.append(eval_config)
  343
  344      # Set as default config after ID is assigned
! 345      eval_model.current_config_id = eval_config.id
  346
  347      # 3. Generate examples via copilot API
! 348      api_key = get_copilot_api_key()
! 349      task_input_schema = (
  350          str(task.input_json_schema) if task.input_json_schema else ""
  351      )
! 352      task_output_schema = (
  353          str(task.output_json_schema) if task.output_json_schema else ""
  354      )
! 355      all_examples = await generate_copilot_examples(
  356          api_key=api_key,
  357          target_task_info=TaskInfoApi(
  358              task_prompt=request.task_prompt_with_few_shot,
  359              task_input_schema=task_input_schema,
```

Lines 372-393
```text
  372          spec_definition=request.definition,
  373      )
  374
  375      # 4. Create TaskRuns for eval, train, and golden datasets
! 376      task_runs = create_dataset_task_runs(
  377          all_examples=all_examples,
  378          reviewed_examples=request.reviewed_examples,
  379          eval_tag=eval_tag,
  380          train_tag=train_tag,
  381          golden_tag=golden_tag,
  382          spec_name=request.name,
  383      )
! 384      for run in task_runs:
! 385          run.parent = task
! 386      models_to_save.extend(task_runs)
  387
  388      # 5. Create the Spec using pre-computed definition and properties from client
! 389      spec = Spec(
  390          parent=task,
  391          name=request.name,
  392          definition=request.definition,
  393          properties=request.properties,
```

Lines 395-430
```text
  395          status=SpecStatus.active,
  396          tags=[],
  397          eval_id=eval_model.id,
  398      )
! 399      models_to_save.append(spec)
  400
  401      # All models are now created and validated via Pydantic.
  402      # Save everything, with cleanup on failure.
! 403      saved_models: list[Eval | EvalConfig | TaskRun | Spec] = []
! 404      try:
! 405          eval_model.save_to_file()
! 406          saved_models.append(eval_model)
  407
! 408          eval_config.save_to_file()
! 409          saved_models.append(eval_config)
  410
! 411          for run in task_runs:
! 412              run.save_to_file()
! 413              saved_models.append(run)
  414
! 415          spec.save_to_file()
! 416          saved_models.append(spec)
! 417      except Exception:
  418          # Clean up any models that were successfully saved before the error
! 419          for model in reversed(saved_models):
! 420              try:
! 421                  model.delete()
! 422              except Exception:
  423                  # Log cleanup error but continue, the original error is more important
! 424                  logger.exception(
  425                      f"Failed to delete {type(model).__name__} during cleanup"
  426                  )
! 427          raise
  428
! 429      return spec
```

app/desktop/util/spec_utils.py

Lines 77-87
```text
   77          topic_generation_task_info: Task info for topic generation
   78          input_generation_task_info: Task info for input generation
   79          spec_definition: The rendered spec definition
   80      """
!  81      client = get_authenticated_client(api_key)
   82
!  83      generate_input = GenerateBatchInput.from_dict(
   84          {
   85              "target_task_info": target_task_info.model_dump(),
   86              "topic_generation_task_info": topic_generation_task_info.model_dump(),
   87              "input_generation_task_info": input_generation_task_info.model_dump(),
```

Lines 90-137
```text
   90              "num_topics": NUM_TOPICS,
   91          }
   92      )
   93
!  94      result = await generate_batch_v1_copilot_generate_batch_post.asyncio(
   95          client=client,
   96          body=generate_input,
   97      )
   98
!  99      if result is None:
! 100          raise HTTPException(
  101              status_code=500, detail="Failed to generate batch: No response"
  102          )
  103
! 104      if isinstance(result, HTTPValidationError):
! 105          raise HTTPException(
  106              status_code=422,
  107              detail=f"Validation error: {result.to_dict()}",
  108          )
  109
! 110      if not isinstance(result, GenerateBatchOutput):
! 111          raise HTTPException(
  112              status_code=500,
  113              detail=f"Failed to generate batch: Unexpected response type {type(result)}",
  114          )
  115
  116      # Convert result to flat list of SampleApi
! 117      examples: list[SampleApi] = []
! 118      data_dict = result.to_dict().get("data_by_topic", {})
! 119      for topic_examples in data_dict.values():
! 120          for ex in topic_examples:
! 121              examples.append(
  122                  SampleApi(
  123                      input=ex.get("input", ""),
  124                      output=ex.get("output", ""),
  125                  )
  126              )
  127
! 128      return examples
  129
  130
  131  def spec_eval_output_score(spec_name: str) -> EvalOutputScore:
  132      """Create an EvalOutputScore for a spec."""
! 133      return EvalOutputScore(
  134          name=spec_name,
  135          type=TaskOutputRatingType.pass_fail,
  136          instruction=f"Evaluate if the model's behaviour meets the spec: {spec_name}.",
  137      )
```

Lines 140-178
```text
  140  def spec_eval_data_type(
  141      spec_type: SpecType, evaluate_full_trace: bool = False
  142  ) -> EvalDataType:
  143      """Determine the eval data type for a spec."""
! 144      if spec_type == SpecType.reference_answer_accuracy:
! 145          return EvalDataType.reference_answer
  146
! 147      if evaluate_full_trace:
! 148          return EvalDataType.full_trace
  149      else:
! 150          return EvalDataType.final_answer
  151
  152
  153  def spec_eval_template(spec_type: SpecType) -> EvalTemplateId | None:
  154      """Get the eval template for a spec type."""
! 155      match spec_type:
! 156          case SpecType.appropriate_tool_use:
! 157              return EvalTemplateId.tool_call
! 158          case SpecType.reference_answer_accuracy:
! 159              return EvalTemplateId.rag
! 160          case SpecType.factual_correctness:
! 161              return EvalTemplateId.factual_correctness
! 162          case SpecType.toxicity:
! 163              return EvalTemplateId.toxicity
! 164          case SpecType.bias:
! 165              return EvalTemplateId.bias
! 166          case SpecType.maliciousness:
! 167              return EvalTemplateId.maliciousness
! 168          case SpecType.jailbreak:
! 169              return EvalTemplateId.jailbreak
! 170          case SpecType.issue:
! 171              return EvalTemplateId.issue
! 172          case SpecType.desired_behaviour:
! 173              return EvalTemplateId.desired_behaviour
! 174          case (
  175              SpecType.tone
  176              | SpecType.formatting
  177              | SpecType.localization
  178              | SpecType.hallucinations
```

Lines 180-188
```text
  180              | SpecType.nsfw
  181              | SpecType.taboo
  182              | SpecType.prompt_leakage
  183          ):
! 184              return None
  185
  186
  187  def sample_and_remove(examples: list[SampleApi], n: int) -> list[SampleApi]:
  188      """Randomly sample and remove n items from a list.
```

Lines 189-213
```text
  189
  190      Mutates the input list by removing the sampled elements.
  191      Uses swap-and-pop for O(1) removal.
  192      """
! 193      sampled: list[SampleApi] = []
! 194      count = min(n, len(examples))
  195
! 196      for _ in range(count):
! 197          if not examples:
! 198              break
! 199          random_index = random.randint(0, len(examples) - 1)
  200          # Swap with last element and pop
! 201          examples[random_index], examples[-1] = examples[-1], examples[random_index]
! 202          sampled.append(examples.pop())
  203
! 204      return sampled
  205
  206
  207  def create_task_run_from_sample(sample: SampleApi, tag: str) -> TaskRun:
  208      """Create a TaskRun from a SampleApi (without parent set)."""
! 209      data_source = DataSource(
  210          type=DataSourceType.synthetic,
  211          properties={
  212              "adapter_name": KILN_ADAPTER_NAME,
  213              "model_name": KILN_COPILOT_MODEL_NAME,
```

Lines 215-224
```text
  215          },
  216      )
  217
  218      # Access input using model_dump since SampleApi uses alias
! 219      sample_dict = sample.model_dump(by_alias=True)
! 220      return TaskRun(
  221          input=sample_dict["input"],
  222          input_source=data_source,
  223          output=TaskOutput(
  224              output=sample.output,
```

Lines 231-251
```text
  231  def create_task_run_from_reviewed(
  232      example: ReviewedExample, tag: str, spec_name: str
  233  ) -> TaskRun:
  234      """Create a TaskRun from a reviewed example with rating (without parent set)."""
! 235      data_source = DataSource(
  236          type=DataSourceType.synthetic,
  237          properties={
  238              "adapter_name": KILN_ADAPTER_NAME,
  239              "model_name": KILN_COPILOT_MODEL_NAME,
  240              "model_provider": KILN_COPILOT_MODEL_PROVIDER,
  241          },
  242      )
  243
! 244      rating_key = f"named::{spec_name}"
! 245      rating_value = 1.0 if example.user_says_meets_spec else 0.0
  246
! 247      return TaskRun(
  248          input=example.input,
  249          input_source=data_source,
  250          output=TaskOutput(
  251              output=example.output,
```

Lines 281-311
```text
  281      - Golden dataset (reviewed examples + unrated examples to reach MIN_GOLDEN_EXAMPLES)
  282
  283      Returns TaskRuns without parent set - caller must set parent.
  284      """
! 285      task_runs: list[TaskRun] = []
  286
  287      # Sample examples for eval and train datasets
! 288      eval_examples = sample_and_remove(all_examples, MIN_EVAL_EXAMPLES)
! 289      train_examples = sample_and_remove(all_examples, MIN_TRAIN_EXAMPLES)
  290
  291      # Create TaskRuns for eval examples
! 292      for example in eval_examples:
! 293          task_runs.append(create_task_run_from_sample(example, eval_tag))
  294
  295      # Create TaskRuns for train examples
! 296      for example in train_examples:
! 297          task_runs.append(create_task_run_from_sample(example, train_tag))
  298
  299      # Create unrated golden examples from remaining pool if needed
! 300      unrated_golden_count = max(0, MIN_GOLDEN_EXAMPLES - len(reviewed_examples))
! 301      if unrated_golden_count > 0:
! 302          unrated_golden_examples = sample_and_remove(all_examples, unrated_golden_count)
! 303          for example in unrated_golden_examples:
! 304              task_runs.append(create_task_run_from_sample(example, golden_tag))
  305
  306      # Create TaskRuns for reviewed examples with ratings
! 307      for reviewed in reviewed_examples:
! 308          task_runs.append(create_task_run_from_reviewed(reviewed, golden_tag, spec_name))
  309
! 310      return task_runs
```
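The `sample_and_remove` helper covered above uses swap-and-pop so each removal is O(1). The same technique can be exercised standalone; this sketch uses plain strings instead of `SampleApi` and seeds the RNG for reproducibility:

```python
import random

def sample_and_remove(examples: list[str], n: int) -> list[str]:
    """Randomly sample and remove n items, mutating the input via swap-and-pop."""
    sampled: list[str] = []
    for _ in range(min(n, len(examples))):
        if not examples:
            break
        i = random.randint(0, len(examples) - 1)
        # Swap the chosen item to the end, then pop it off in O(1)
        examples[i], examples[-1] = examples[-1], examples[i]
        sampled.append(examples.pop())
    return sampled

random.seed(0)
pool = [f"ex{i}" for i in range(10)]
eval_set = sample_and_remove(pool, 3)
train_set = sample_and_remove(pool, 4)
print(len(eval_set), len(train_set), len(pool))  # 3 4 3
assert not set(eval_set) & set(train_set)  # successive draws are disjoint
```

Because the pool is mutated in place, successive calls (eval, train, golden) can never hand the same example to two datasets.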
Code Review
This pull request refactors the spec creation logic, moving it from the TypeScript frontend (spec_persistence.ts) to the Python backend (spec_api.py). This is a significant and positive change, centralizing business logic on the server. A new endpoint /api/projects/{project_id}/tasks/{task_id}/spec_with_copilot is introduced to handle spec creation with Kiln Copilot, including eval creation, example generation, and judge configuration.
The review identified a few critical issues in the new backend implementation. Specifically, the persistence logic lacks transactionality, which could lead to an inconsistent state if an error occurs during file saving. There's also a bug in how JSON schemas are serialized before being sent to an external API. Additionally, there are some opportunities to improve code clarity and maintainability in both the frontend and backend code.
Overall, this is a great refactoring, but the identified issues, especially the critical ones, should be addressed before merging.
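The schema-serialization bug called out above is easy to demonstrate in isolation: `str()` on a dict produces a Python repr, not JSON.

```python
import json

schema = {"type": "object", "properties": {"name": {"type": "string"}}}

# str() on a dict yields a repr with single quotes -- not parseable as JSON.
as_repr = str(schema)
# json.dumps() yields a string any JSON consumer can parse.
as_json = json.dumps(schema)

assert json.loads(as_json) == schema  # round-trips cleanly
try:
    json.loads(as_repr)
except json.JSONDecodeError:
    print("str(dict) output is not valid JSON")
```

Any downstream service that calls `json.loads` (or the equivalent in another language) on the `str()` form will reject it, which is why the review flags `str(task.input_json_schema)`.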
Outdated review comment on app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/spec_builder/+page.svelte (resolved).
Quoted from `spec_utils.ts`:

```typescript
 * These examples form the golden dataset for the spec's eval.
 * user_says_meets_spec is optional in the UI (not yet reviewed) but required when sent to backend.
 */
export type ReviewRow = {
```
moved from spec_persistence
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/spec_utils.ts`:
- Around line 22-35: Prettier formatting failed for the file containing the
ReviewRow type; run a code formatter (e.g., run prettier --write on the file) to
fix whitespace/formatting issues in the ReviewRow declaration and surrounding
comments, then stage and re-commit the updated file so the CI pipeline passes.
♻️ Duplicate comments (4)
libs/server/kiln_server/spec_api.py (4)
83-96: Rename `input` to avoid shadowing the built-in

Shadowing `input()` is a frequent lint/source-of-confusion hotspot. Prefer an internal name with an alias so the external API remains unchanged.

♻️ Suggested refactor

```diff
 class ReviewedExample(BaseModel):
@@
-    input: str = Field(alias="input")
+    input_str: str = Field(alias="input")
@@
 def _create_task_run_from_reviewed(
     example: ReviewedExample, tag: str, spec_name: str
 ) -> TaskRun:
@@
-        input=example.input,
+        input=example.input_str,
```

(See Pydantic v2 `Field` alias and `populate_by_name` usage for renaming internal fields while keeping external names.)

Also applies to: 306-324
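A minimal sketch of the rename-with-alias pattern suggested above (the `input_str` name is illustrative, matching the suggested refactor rather than the merged code):

```python
from pydantic import BaseModel, ConfigDict, Field

class ReviewedExample(BaseModel):
    # populate_by_name lets callers construct with either the alias or the field name
    model_config = ConfigDict(populate_by_name=True)

    # Internally named input_str to avoid shadowing the input() built-in;
    # externally the JSON key stays "input" via the alias.
    input_str: str = Field(alias="input")
    output: str

example = ReviewedExample.model_validate({"input": "What is 2+2?", "output": "4"})
print(example.input_str)                  # attribute access uses the internal name
print(example.model_dump(by_alias=True))  # serialization keeps the external key
```

Callers and wire formats are untouched; only attribute access inside the module changes.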
28-33: Keep libs/server free of desktop-only dependencies

Importing `app.desktop...` couples the library to the desktop app and breaks portability/packaging. Please move this endpoint to `app/desktop/studio_server` or extract the copilot client/types into a shared package and depend on that instead.
1-2: Serialize schemas with `json.dumps` (`str()` isn't JSON)

`str(dict)` yields single quotes and is not valid JSON; downstream APIs expecting JSON strings will misparse it. Use `json.dumps` instead.

🐛 Proposed fix

```diff
-import random
+import json
+import random
@@
-    task_input_schema=str(task.input_json_schema)
+    task_input_schema=json.dumps(task.input_json_schema)
     if task.input_json_schema
     else "",
-    task_output_schema=str(task.output_json_schema)
+    task_output_schema=json.dumps(task.output_json_schema)
     if task.output_json_schema
     else "",
```

(Python `str(dict)` vs `json.dumps` — `str(dict)` output is not valid JSON.) Based on learnings, avoid `str()` for schema serialization.

Also applies to: 526-533
563-569: Add rollback to avoid partial persistence

If any save fails, you can leave orphaned Eval/Config/TaskRun files. A rollback cleanup (like the old TS flow) is needed to keep storage consistent.

🧹 Suggested rollback pattern

```diff
-    eval_model.save_to_file()
-    eval_config.save_to_file()
-    for run in task_runs:
-        run.save_to_file()
-    spec.save_to_file()
+    saved_models: list[Eval | EvalConfig | TaskRun | Spec] = []
+    try:
+        eval_model.save_to_file()
+        saved_models.append(eval_model)
+        eval_config.save_to_file()
+        saved_models.append(eval_config)
+        for run in task_runs:
+            run.save_to_file()
+            saved_models.append(run)
+        spec.save_to_file()
+        saved_models.append(spec)
+    except Exception:
+        for model in reversed(saved_models):
+            try:
+                model.delete()
+            except Exception:
+                pass
+        raise
```
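The save-or-rollback pattern suggested above can be exercised standalone. `FakeModel` and `save_all_or_rollback` are illustrative stand-ins, not Kiln APIs:

```python
class FakeModel:
    """Stand-in for a persistable model; tracks a shared 'disk' of saved names."""
    disk: list[str] = []

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    def save_to_file(self) -> None:
        if self.fail:
            raise IOError(f"disk error while saving {self.name}")
        FakeModel.disk.append(self.name)

    def delete(self) -> None:
        FakeModel.disk.remove(self.name)


def save_all_or_rollback(models: list[FakeModel]) -> None:
    """Save every model; on any failure, delete the ones already saved and re-raise."""
    saved: list[FakeModel] = []
    try:
        for model in models:
            model.save_to_file()
            saved.append(model)
    except Exception:
        for model in reversed(saved):
            try:
                model.delete()
            except Exception:
                pass  # the original error matters more than cleanup failures
        raise


# Happy path persists everything; a mid-sequence failure leaves nothing behind.
save_all_or_rollback([FakeModel("eval"), FakeModel("config")])
print(FakeModel.disk)  # ['eval', 'config']
try:
    save_all_or_rollback([FakeModel("spec"), FakeModel("run", fail=True)])
except IOError:
    pass
print(FakeModel.disk)  # ['eval', 'config'] -- the failed batch was rolled back
```

Deleting in `reversed(saved)` order mirrors the dependency order of creation (children are removed before parents).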
Outdated review comment on app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/spec_utils.ts (resolved).
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `app/desktop/studio_server/copilot_api.py`:
- Around line 490-498: The code passes Python repr strings to
_generate_copilot_examples by calling str(task.input_json_schema) and
str(task.output_json_schema); replace these with proper JSON serialization
(e.g., json.dumps(task.input_json_schema) and
json.dumps(task.output_json_schema)) and import json at the top, ensuring you
still pass an empty string when the schema is falsy (keep the existing
conditional). Update the arguments in the _generate_copilot_examples call
(task_input_schema and task_output_schema) to use json.dumps so the Copilot API
receives valid JSON strings.
In `app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/spec_builder/+page.svelte`:
- Around line 364-416: Remove the debug console.log calls in the spec creation
flow: delete the console.log statements referencing "use_kiln_copilot",
"judge_info", "data", and "api_error" inside the if (use_kiln_copilot) branch in
the +page.svelte spec builder (they appear around the POST to
"/spec_with_copilot"); leave error handling (throw api_error) intact and, if you
need persisted logging, replace with a proper logger call rather than
console.log.
- Line 39: Update the import of ReviewRow in +page.svelte to point to the parent
directory where spec_utils.ts actually lives; replace the current relative
import "./spec_utils.ts" with the correct "../spec_utils.ts" so the type
ReviewRow resolves from spec_utils.ts used by the spec_builder page.
🧹 Nitpick comments (7)
app/desktop/util/spec_creation.py (4)
34-38: Redundant `Field(alias="input")` when field name matches alias

The `alias="input"` is unnecessary here since the field is already named `input`. The alias only has effect when the alias differs from the field name. This can be simplified.

Suggested simplification

```diff
 class SampleApi(BaseModel):
     """A sample input/output pair."""

-    input: str = Field(alias="input")
+    input: str
     output: str
```
41-54: Same redundant alias issue and missing `model_config` in `SampleApi`

`ReviewedExample` has `populate_by_name=True` configured, but `SampleApi` doesn't. If these models are used with aliased JSON keys, consider adding `model_config` to `SampleApi` as well for consistency. Also, the `alias="input"` is redundant when the field name is already `input`.

Suggested consistency fix

```diff
 class SampleApi(BaseModel):
     """A sample input/output pair."""

-    input: str = Field(alias="input")
+    input: str
     output: str

+    model_config = {"populate_by_name": True}
+
 class ReviewedExample(BaseModel):
     """A reviewed example from the spec review process.

     Extends SampleApi with review-specific fields for tracking
     model and user judgments on spec compliance.
     """

-    input: str = Field(alias="input")
+    input: str
     output: str
     model_says_meets_spec: bool
     user_says_meets_spec: bool
     feedback: str

     model_config = {"populate_by_name": True}
```
79-110: Missing default case in `match` statement

The `spec_eval_template` function uses a `match` statement without a wildcard `case _:` default. While Python won't raise an error, it will implicitly return `None` for unhandled `SpecType` values. If new `SpecType` variants are added in the future, this could silently return `None` unexpectedly. Consider adding an explicit default case for clarity.

Add explicit default case

```diff
         case (
             SpecType.tone
             | SpecType.formatting
             | SpecType.localization
             | SpecType.hallucinations
             | SpecType.completeness
             | SpecType.nsfw
             | SpecType.taboo
             | SpecType.prompt_leakage
         ):
             return None
+        case _:
+            return None
```
133-154: Unnecessary `model_dump` when field name equals alias

Since the field name is `input` and the alias is also `"input"`, you can access `sample.input` directly without needing `model_dump(by_alias=True)`. The comment suggests uncertainty about alias behavior that doesn't apply here.

Simplify direct field access

```diff
 def create_task_run_from_sample(sample: SampleApi, tag: str) -> TaskRun:
     """Create a TaskRun from a SampleApi (without parent set)."""
     data_source = DataSource(
         type=DataSourceType.synthetic,
         properties={
             "adapter_name": KILN_ADAPTER_NAME,
             "model_name": KILN_COPILOT_MODEL_NAME,
             "model_provider": KILN_COPILOT_MODEL_PROVIDER,
         },
     )

-    # Access input using model_dump since SampleApi uses alias
-    sample_dict = sample.model_dump(by_alias=True)
     return TaskRun(
-        input=sample_dict["input"],
+        input=sample.input,
         input_source=data_source,
         output=TaskOutput(
             output=sample.output,
             source=data_source,
         ),
         tags=[tag],
     )
```

app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/spec_builder/+page.svelte (2)
350-362: Potential runtime error if `value` is not a string

The filter uses `value.trim()` but `value` is typed as `string | null`. While the `!== null` check filters nulls, if `values` contains non-string values at runtime (e.g., numbers or booleans from misconfiguration), this would throw. Consider adding a type guard.

Add type safety

```diff
 // Build properties object with spec_type, filtering out null and empty values
 const filteredValues = Object.fromEntries(
   Object.entries(values).filter(
-    ([_, value]) => value !== null && value.trim() !== "",
+    ([_, value]) => typeof value === "string" && value.trim() !== "",
   ),
 )
```
394-412: Non-copilot path has incomplete implementation (TODO comment).The TODO at line 406 indicates this endpoint doesn't create the eval with eval tags. This may result in incomplete spec creation for non-copilot flows. Consider either implementing the missing functionality or documenting this as a known limitation.
Would you like me to help implement the non-copilot eval creation, or should this be tracked as a separate issue?
app/desktop/studio_server/copilot_api.py (1)
527-553: Cleanup logic re-raises the original exception without context

The cleanup logic properly attempts to delete saved models on failure, but the bare `raise` will propagate the original exception. Consider wrapping in an `HTTPException` with a 500 status to ensure consistent API error responses, or at minimum log the original error before cleanup.

Improve error handling

```diff
     saved_models: list[Eval | EvalConfig | TaskRun | Spec] = []
     try:
         eval_model.save_to_file()
         saved_models.append(eval_model)
         eval_config.save_to_file()
         saved_models.append(eval_config)
         for run in task_runs:
             run.save_to_file()
             saved_models.append(run)
         spec.save_to_file()
         saved_models.append(spec)
-    except Exception:
+    except Exception as e:
+        logger.exception("Failed to save spec with copilot, rolling back")
         # Clean up any models that were successfully saved before the error
         for model in reversed(saved_models):
             try:
                 model.delete()
             except Exception:
                 # Log cleanup error but continue, the original error is more important
                 logger.exception(
                     f"Failed to delete {type(model).__name__} during cleanup"
                 )
-        raise
+        raise HTTPException(
+            status_code=500,
+            detail=f"Failed to create spec: {str(e)}",
+        ) from e
```
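The `raise ... from e` form suggested above preserves the original error as `__cause__`, so API clients get a clean 500 while logs keep the root cause. A stdlib-only sketch, with `ApiError` as a hypothetical stand-in for FastAPI's `HTTPException`:

```python
class ApiError(Exception):
    """Hypothetical stand-in for an HTTP error response type."""
    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

def create_spec() -> None:
    try:
        raise IOError("disk full while saving spec")  # simulated save failure
    except Exception as e:
        # 'from e' chains the low-level error onto the API-facing one
        raise ApiError(500, f"Failed to create spec: {e}") from e

try:
    create_spec()
except ApiError as err:
    print(err.status_code)               # 500
    print(type(err.__cause__).__name__)  # OSError (IOError is an alias for it)
```

Without `from e`, the traceback would show the two exceptions as unrelated ("During handling of the above exception, another exception occurred"), which makes post-mortem debugging harder.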