
Spec schema update #992

Merged
sfierro merged 8 commits into sfierro/specs from sfierro/spec-update-schema-012726
Jan 28, 2026

Conversation

@sfierro
Contributor

@sfierro sfierro commented Jan 27, 2026

Summary by CodeRabbit

  • Refactor

    • Consolidated synthetic data generation into a session-based config (sdg_session_config), replacing separate topic/input generation fields with a unified session+step config.
  • New Features

    • Added structured synthetic data generation types (session & step configs) across API, UI, and data models to standardize generation workflows.
    • Updated refine output model to a new API-specific refine output used consistently in responses and tests.


@coderabbitai
Contributor

coderabbitai bot commented Jan 27, 2026

Walkthrough

Consolidates per-task prompt-generation models into session-level synthetic-data configs, renames several models (e.g., RefineSpecOutput → RefineSpecApiOutput, PromptGenerationResult → SyntheticDataGenerationStepConfig), and replaces separate topic/input generation fields with a unified sdg_session_config across client, server, core datamodel, tests, and web UI types.
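
As a rough illustration of the consolidated shape, the sketch below models a session config that holds one step config per generation stage. It uses stdlib dataclasses rather than the project's actual pydantic/attrs models. Only the three `*_generation_config` field names, `prompt`, and `provider_name` appear in this PR's excerpts; `model_name`, the class names `StepConfig`/`SessionConfig`, and the `step_prompts` helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepConfig:
    """Illustrative stand-in for SyntheticDataGenerationStepConfig."""
    model_name: str     # assumed field, not shown in the PR excerpts
    provider_name: str
    prompt: str         # the prompt used for generation


@dataclass(frozen=True)
class SessionConfig:
    """Illustrative stand-in for SyntheticDataGenerationSessionConfig."""
    topic_generation_config: StepConfig
    input_generation_config: StepConfig
    output_generation_config: StepConfig


def step_prompts(session: SessionConfig) -> dict[str, str]:
    """Hypothetical helper: map each step name to its prompt."""
    return {
        "topic": session.topic_generation_config.prompt,
        "input": session.input_generation_config.prompt,
        "output": session.output_generation_config.prompt,
    }


session = SessionConfig(
    topic_generation_config=StepConfig("example-model", "example-provider", "Generate topics"),
    input_generation_config=StepConfig("example-model", "example-provider", "Generate inputs"),
    output_generation_config=StepConfig("example-model", "example-provider", "Generate outputs"),
)
prompts = step_prompts(session)
```

Passing one session object around, instead of separate topic and input fields, means a new step only touches the session type rather than every call site.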

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| API Client Refine Spec Endpoints | app/desktop/studio_server/api_client/kiln_ai_server_client/api/copilot/refine_spec_v1_copilot_refine_spec_post.py, app/desktop/studio_server/api_client/kiln_ai_server_client/api/copilot/refine_spec_with_answers_v1_copilot_refine_spec_with_answers_post.py | Swapped RefineSpecOutput → RefineSpecApiOutput in imports, parsing, response building, and public function signatures/docstrings. |
| API Client Models: Synthetic Data Generation | app/desktop/studio_server/api_client/kiln_ai_server_client/models/synthetic_data_generation_session_config.py, .../synthetic_data_generation_session_config_input.py, .../synthetic_data_generation_step_config.py, .../synthetic_data_generation_step_config_input.py | Added session/step config models (attrs-based) with mapping-like additional_properties and to_dict/from_dict support. |
| API Client Models: Consolidation & Renames | app/desktop/studio_server/api_client/kiln_ai_server_client/models/generate_batch_input.py, .../clarify_spec_output.py, .../refine_spec_api_output.py, .../new_proposed_spec_edit_api.py, .../__init__.py | Removed topic/input task fields in favor of sdg_session_config; renamed several models (RefineSpecOutput → RefineSpecApiOutput, NewProposedSpecEdit → NewProposedSpecEditApi) and updated exports. |
| Server API Models & Tests | app/desktop/studio_server/api_models/copilot_models.py, app/desktop/studio_server/api_models/test_copilot_models.py | Added SyntheticDataGenerationStepConfigApi and SyntheticDataGenerationSessionConfigApi; updated GenerateBatchApiInput, ClarifySpecApiOutput, and test shapes to use session-level configs. |
| Copilot Implementation & Utilities | app/desktop/studio_server/copilot_api.py, app/desktop/studio_server/utils/copilot_utils.py | Wired sdg_session_config through spec creation and generate_batch flows; updated CreateSpecWithCopilotRequest and utility signatures to accept sdg_session_config; switched client aliasing to RefineSpecApiOutput variants. |
| Server Tests | app/desktop/studio_server/test_copilot_api.py | Updated mocks, imports, and assertions to use RefineSpecApiOutput and consolidated sdg_session_config structures. |
| Web UI Types & Schema | app/web_ui/src/lib/api_schema.d.ts, app/web_ui/src/lib/types.ts | Replaced PromptGeneration* API types with SyntheticDataGenerationStepConfig / SyntheticDataGenerationSessionConfig API variants; removed legacy PromptGeneration types. |
| Web UI Component | app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/spec_builder/+page.svelte | Replaced topic/input generation state with sdg_session_config, updated imports, validation, and payload construction to use session config; minor task schema handling simplification. |
| Core Datamodel & Tests | libs/core/kiln_ai/datamodel/spec.py, libs/core/kiln_ai/datamodel/test_spec.py | Renamed PromptGenerationInfo → SyntheticDataGenerationStepConfig; added SyntheticDataGenerationSessionConfig; replaced per-field topic/input info on Spec with synthetic_data_generation_session_config. |
| UI Presentation / Misc | app/web_ui/src/routes/.../animations/refining_animation.svelte, .../saving_animation.svelte, .../review_examples.svelte | Minor UI sizing and spacing tweaks; review table DOM simplified for content rendering. |

Sequence Diagram(s)

(Skipped — changes are model/shape refactors without new multi-component control flow that requires visualization.)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • leonardmq
  • chiang-daniel

Poem

🐰 I hop through code and change a name,

From prompts to sessions, configs reclaim.
I stitch the pieces, tidy each field,
One sdg_session — cleanly revealed.
🥕 Hop, serialize, and softly yield.

🚥 Pre-merge checks | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The pull request description is entirely missing; no content was provided by the author despite a clear template requiring 'What does this PR do?' and other sections. | Add a comprehensive description explaining the refactoring from PromptGenerationInfo to SyntheticDataGenerationSessionConfig, the consolidation of per-task configs into session-based configs, and the impact on affected files and APIs. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.44%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Spec schema update' is vague and generic, describing the general nature of changes without conveying the specific intent or key modifications made to the specification schema. | Consider using a more descriptive title that highlights the main structural change, such as 'Refactor synthetic data generation config from per-task to session-based structure' or 'Replace PromptGenerationInfo with SyntheticDataGenerationSessionConfig'. |


@gemini-code-assist
Contributor

Summary of Changes

Hello @sfierro, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a significant schema update, primarily aimed at enhancing the clarity and structure of data models related to API interactions and synthetic data generation. It involves renaming existing models to better reflect their API context and introducing dedicated configurations for synthetic data generation steps and sessions. This refactoring improves the expressiveness of the codebase and sets a clearer foundation for future development in data generation capabilities.

Highlights

  • Schema Renaming for Clarity: The models RefineSpecOutput and NewProposedSpecEdit have been consistently renamed to RefineSpecApiOutput and NewProposedSpecEditApi respectively across the codebase. This change improves clarity by explicitly indicating their role as API-specific data structures.
  • Introduction of Synthetic Data Generation Models: New schema models, SyntheticDataGenerationSessionConfig and SyntheticDataGenerationStepConfig, have been introduced. These models provide a more structured and explicit way to define configurations for synthetic data generation processes, replacing the more generic PromptGenerationResult and PromptGenerationInfo.
  • Refactoring of API Endpoints and Data Models: Existing API endpoints, data models, and UI components have been updated to utilize the newly named and introduced synthetic data generation models. This includes changes in ClarifySpecOutput, GenerateBatchInput, and the core Spec data model, streamlining how generation configurations are handled.
  • Comprehensive Codebase Update: The changes span across API client definitions, server-side logic, utility functions, TypeScript schema definitions for the frontend, and Svelte components, ensuring full integration and consistency with the updated schema.


@github-actions

github-actions bot commented Jan 27, 2026

📊 Coverage Report

Overall Coverage: 91%

Diff: origin/sfierro/specs...HEAD

  • app/desktop/studio_server/api_models/copilot_models.py (100%)
  • app/desktop/studio_server/copilot_api.py (55.6%): Missing lines 268,379-381
  • libs/core/kiln_ai/datamodel/spec.py (100%)

Summary

  • Total: 23 lines
  • Missing: 4 lines
  • Coverage: 82%

Line-by-line


app/desktop/studio_server/copilot_api.py

Lines 264-272

  264                 status_code=422,
  265                 detail=f"Validation error: {result.to_dict()}",
  266             )
  267 
! 268         if isinstance(result, RefineSpecApiOutputClient):
  269             return RefineSpecApiOutput.model_validate(result.to_dict())
  270 
  271         raise HTTPException(
  272             status_code=500,

Lines 375-385

  375             run.parent = task
  376         models_to_save.extend(task_runs)
  377 
  378         # 5. Create the Spec using pre-computed definition and properties from client
! 379         topic_generation_config = request.sdg_session_config.topic_generation_config
! 380         input_generation_config = request.sdg_session_config.input_generation_config
! 381         output_generation_config = request.sdg_session_config.output_generation_config
  382 
  383         spec = Spec(
  384             parent=task,
  385             name=request.name,
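
The uncovered line 268 above is the success path of a dispatch on the client's return type: a validation error maps to 422, the expected output type is converted and returned, and anything else falls through to 500. A minimal, dependency-free sketch of that pattern follows. `ApiError`, `ClientValidationError`, `ClientOutput`, `convert_result`, and `status_of` are hypothetical stand-ins (for FastAPI's HTTPException, the generated client types, and the endpoint body), not the project's code, and the 422 condition is assumed since the excerpt does not show it.

```python
class ApiError(Exception):
    """Stand-in for an HTTP exception, kept stdlib-only for this sketch."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


class ClientValidationError:
    """Hypothetical client-side validation error result."""

    def to_dict(self) -> dict:
        return {"detail": "bad input"}


class ClientOutput:
    """Hypothetical client-side success result."""

    def __init__(self, spec: str):
        self.spec = spec

    def to_dict(self) -> dict:
        return {"spec": self.spec}


def convert_result(result: object) -> dict:
    """Dispatch on the result type, mirroring the 422 / success / 500 branches."""
    if isinstance(result, ClientValidationError):  # assumed condition for the 422 branch
        raise ApiError(422, f"Validation error: {result.to_dict()}")
    if isinstance(result, ClientOutput):
        # Success path: hand the dict to the server-side model (here, returned as-is).
        return result.to_dict()
    raise ApiError(500, "Unexpected response type")


def status_of(result: object) -> int:
    """Test convenience: 200 on success, otherwise the raised status code."""
    try:
        convert_result(result)
        return 200
    except ApiError as err:
        return err.status_code
```

A test that feeds a success-type result through this path is the kind of case that would exercise the line flagged as missing.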


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the spec schema by renaming several models related to prompt generation and refining specifications. Specifically, RefineSpecOutput is renamed to RefineSpecApiOutput, and PromptGenerationResult is replaced by new SyntheticDataGenerationStepConfig and SyntheticDataGenerationSessionConfig models. These changes are propagated throughout the API client, API models, and UI components. While the core functionality seems preserved, there are opportunities to enhance the clarity and conciseness of docstrings for the newly introduced or renamed models, particularly in the auto-generated client code.

"""
client = get_authenticated_client(api_key)

generate_input = GenerateBatchInput.from_dict(
Collaborator

nit: doing model_dump and from_dict is just 2 unnecessary transforms? Or more "slightly different models" issue?

Contributor Author

indeed a slightly different models issue
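
The round trip discussed here arises when the server-side API model and the generated client model are separate classes with the same shape, so a plain dict becomes the common currency between them. A minimal sketch under that assumption, with stdlib dataclasses standing in for the pydantic and attrs models; `ServerBatchInput`, `ClientBatchInput`, and their fields are hypothetical:

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class ServerBatchInput:
    """Stand-in for the server-side API model (pydantic in the real code)."""
    target_task_prompt: str  # hypothetical field
    num_samples: int         # hypothetical field

    def model_dump(self) -> dict[str, Any]:
        # Plays the role of pydantic's model_dump(): serialize to a plain dict.
        return asdict(self)


@dataclass
class ClientBatchInput:
    """Stand-in for the generated client model (attrs-based in the real code)."""
    target_task_prompt: str
    num_samples: int

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ClientBatchInput":
        # Rebuild the client-side object from the shared dict representation.
        return cls(
            target_task_prompt=data["target_task_prompt"],
            num_samples=data["num_samples"],
        )


server_input = ServerBatchInput(target_task_prompt="Summarize the text", num_samples=3)
client_input = ClientBatchInput.from_dict(server_input.model_dump())
```

The two transforms look redundant, but they are the seam that lets the two class hierarchies stay independent.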

provider_name=request.input_generation_info.task_metadata.model_provider_name,
prompt=request.input_generation_info.prompt,
synthetic_data_generation_session_config=SyntheticDataGenerationSessionConfig(
topic_generation_config=SyntheticDataGenerationStepConfig(
Collaborator

can't just set "topic_generation_config"?

prompt: str = Field(description="The prompt used for generation.")


class SyntheticDataGenerationSessionConfig(BaseModel):
Collaborator

Can't the studio_server import this? saves us another dupe definition.

Contributor Author

we technically can, but the Api models are supposed to match the server API, whereas we can store stuff on spec however we like. it seemed silly to wrap task model/provider in a TaskMetadata on the spec data model side since it's not used anywhere else (yet?), so I flattened it

Collaborator

I think the idea is that maybe it grows and changes over time. We add a "server_resource_id" or "magic_potion_formula". We just want whatever comes back from clarify to go to generate, and don't really want to own the details in the client.

Contributor Author

but we're in spec.py here, this is just the model object itself. so if server resource id gets added this is exactly why we have this, to NOT store it on the spec model object on disk.

Contributor Author

we do pass through exactly the same thing from clarify to generate, that is the API object in a different file (copilot_models)
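
To make the distinction in this thread concrete, the sketch below maps a nested API-side step config onto a flattened on-disk one, leaving server-only fields behind. Dataclasses stand in for the real models. `task_metadata`, `model_provider_name`, `provider_name`, and `prompt` appear in this PR's diff excerpts; `model_name`, `server_resource_id` (borrowed from the reviewer's hypothetical), the class names, and `to_stored` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskMetadata:
    model_name: str           # assumed field
    model_provider_name: str


@dataclass
class ApiStepConfig:
    """API-side shape: passed through unchanged from clarify to generate."""
    task_metadata: TaskMetadata
    prompt: str
    server_resource_id: Optional[str] = None  # hypothetical future server-only field


@dataclass
class StoredStepConfig:
    """On-disk shape stored on the spec: flattened, no server-only fields."""
    model_name: str
    provider_name: str
    prompt: str


def to_stored(api: ApiStepConfig) -> StoredStepConfig:
    """Hypothetical mapper: copy only what the spec should persist."""
    return StoredStepConfig(
        model_name=api.task_metadata.model_name,
        provider_name=api.task_metadata.model_provider_name,
        prompt=api.prompt,
    )


api_step = ApiStepConfig(
    task_metadata=TaskMetadata("example-model", "example-provider"),
    prompt="Generate inputs",
    server_resource_id="res-123",
)
stored = to_stored(api_step)
```

With an explicit mapper like this, a field added to the API object does not reach the stored spec unless someone adds it to the mapping.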


class PromptGenerationInfo(BaseModel):
"""Information about a prompt generation step during copilot spec creation."""
class SyntheticDataGenerationStepConfig(BaseModel):
Collaborator

Can't the studio_server import this? saves us another dupe definition.

@sfierro sfierro merged commit d2237b7 into sfierro/specs Jan 28, 2026
10 checks passed
@sfierro sfierro deleted the sfierro/spec-update-schema-012726 branch January 28, 2026 05:01
This was referenced Feb 20, 2026

3 participants