refactor: prune ZIP for prompt optimization job #1031

Merged
leonardmq merged 7 commits into leonard/kil-348-gepa-billing-payment-credit-2 from leonard/kil-403-gepa-exclude-docs-from-zip
Feb 14, 2026

Conversation

@leonardmq
Collaborator

@leonardmq leonardmq commented Feb 14, 2026

What does this PR do?

PR going into #1029

Changes:

  • refactor: filter out artefacts we do not need in ZIP that goes for prompt optimization
  • refactor: error handling in start job endpoint

Specifically, we exclude the following artefacts:

  • documents
  • other tasks
  • eval runs (we need task runs tagged with the eval tags)
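
The exclusions above amount to a filter applied while the archive is written. A minimal sketch of that idea, assuming a hypothetical on-disk layout with `documents/` and `tasks/<task_dir>/` under the project root; `zip_pruned` and its parameters are illustrative names, not the PR's `package_project_for_training`, and eval-run filtering is left out:

```python
import zipfile
from pathlib import Path


def zip_pruned(project_dir: Path, output: Path, keep_task_dir: str) -> list[str]:
    """Zip project_dir into output, skipping documents/ and every task but one.

    Assumed (hypothetical) layout: documents/ and tasks/<task_dir>/ under
    project_dir. `output` should live outside project_dir.
    Returns the archive member names that were written.
    """
    written: list[str] = []
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(project_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(project_dir)
            parts = rel.parts
            if parts[0] == "documents":
                continue  # documents are not needed for prompt optimization
            if parts[0] == "tasks" and len(parts) > 1 and parts[1] != keep_task_dir:
                continue  # tasks other than the selected one are excluded
            zf.write(path, rel.as_posix())
            written.append(rel.as_posix())
    return written
```

Filtering at write time keeps the source project untouched and means the pruned files never reach the upload.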

Checklists

  • Tests have been run locally and passed
  • New tests have been added to any work in /lib

Summary by CodeRabbit

  • New Features

    • Programmatic packaging API for training packages with options to include documents and exclude task/eval-config runs.
  • Improvements

    • Unified packaging flow using temporary artifacts and clearer server-response parsing; more specific submission/validation error messages.
    • Non-CLI validation and prompt-building interfaces for library use.
  • Refactor

    • Removed legacy in-process ZIP creation; packaging delegated to new helper.
  • Tests

    • Expanded tests for packaging, exports, and error scenarios; removed token_budget from APIs/tests.

@leonardmq leonardmq changed the title from "refactor:" to "refactor: ZIP for prompt optimization job" Feb 14, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 14, 2026

Walkthrough

Replaces in-file ZIP creation in the GEPA job flow with library packaging via package_project_for_training (temporary-file ZIP), updates GEPA start to call SDK asyncio_detailed with check_response_error, removes token_budget from models and requests, and adds non-CLI packaging/validation helpers and tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GEPA Job API: app/desktop/studio_server/gepa_job_api.py | Removed internal zip_project; call package_project_for_training to write a ZIP in a temp dir, read bytes to build SDK File payload, switched start path to .asyncio_detailed, and added uniform response validation via check_response_error. |
| GEPA Job Tests: app/desktop/studio_server/test_gepa_job_api.py | Updated tests to mock package_project_for_training, added helpers _mock_package_for_training and _make_sdk_response, targeted .asyncio_detailed SDK calls, and adjusted expectations for upstream validation and response parsing; removed token_budget assertions. |
| Packaging Library: libs/core/kiln_ai/cli/commands/package_project.py | Added non-CLI validation/build helpers (validate_tasks_noncli, validate_and_build_prompts_noncli), PackageForTrainingConfig dataclass, export helpers (export_documents, export_task_runs, export_evals), and library-facing package_project_for_training that writes a project ZIP to a provided output path. |
| Packaging Tests: libs/core/kiln_ai/cli/commands/test_package_project.py | Added tests and fixtures for non-CLI validators and packaging/export flows; cover documents, evals, task runs, and package_project_for_training. |
| API client models: app/desktop/studio_server/api_client/kiln_ai_server_client/models/__init__.py, .../body_start_gepa_job_v1_jobs_gepa_job_start_post.py, .../body_start_gepa_job_v1_jobs_gepa_job_start_post_token_budget.py | Removed token_budget enum model and its import/export; removed token_budget field from BodyStartGepaJobV1JobsGepaJobStartPost serialization/deserialization and constructor. |
| Web UI: app/web_ui/src/lib/api_schema.d.ts, app/web_ui/src/routes/(app)/gepa/[project_id]/[task_id]/create_gepa/+page.svelte | Removed token_budget from TypeScript API schema and from the start-job POST payload; UI no longer sends token_budget. |
| Datamodel & tests: libs/core/kiln_ai/datamodel/gepa_job.py, libs/core/kiln_ai/datamodel/test_gepa_job.py, libs/core/kiln_ai/datamodel/test_task.py | Removed token_budget field from GepaJob datamodel and updated tests to stop constructing or asserting token_budget. |

Sequence Diagram(s)

sequenceDiagram
  participant Studio as Studio Server
  participant Packager as package_project_for_training
  participant FS as Temp File (tmp dir)
  participant SDK as Kiln SDK (asyncio_detailed)
  participant Checker as check_response_error

  Studio->>Packager: package_project_for_training(project, task_ids, run_config_id, eval_ids, output, config)
  Packager->>FS: write ZIP to output path
  Packager-->>Studio: returns path/to/zip
  Studio->>FS: read ZIP bytes -> build File payload
  Studio->>SDK: start_gepa_job.asyncio_detailed(File payload, other params)
  SDK-->>Checker: raw SDK response
  Checker-->>Studio: validated/parsed response or raise error
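
The first four steps of the diagram can be sketched as follows. `fake_packager` is a hypothetical stand-in for `package_project_for_training` (whose real signature is not reproduced here), and `build_zip_bytes` is an illustrative name; only the temporary-directory pattern is the point.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def build_zip_bytes(packager: Callable[[Path], None]) -> bytes:
    """Let `packager` write a ZIP to a path in a temp dir, then return its bytes."""
    with tempfile.TemporaryDirectory() as tmpdir:
        output = Path(tmpdir) / "kiln_gepa_project.zip"
        packager(output)  # nothing else holds a handle on `output` while it is written
        return output.read_bytes()  # read before the directory is cleaned up


def fake_packager(output: Path) -> None:
    """Hypothetical stand-in for the real packaging call."""
    with zipfile.ZipFile(output, "w") as zf:
        zf.writestr("project.kiln", "{}")
```

Handing the packager a path inside a temporary directory, rather than an already open temporary file, means the output is never opened twice, and the directory is removed when the `with` block exits.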

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • scosman
  • chiang-daniel
  • sfierro

Poem

🐰 I hopped through code with nimble feet,

swapped zips for packages tidy and neat,
Temp dirs hum as bytes are read,
SDK called, responses checked with thread,
Hop—pack—start—training's on repeat! 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 73.81%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: refactoring to prune ZIP artifacts for prompt optimization jobs, which matches the core objective of filtering out unnecessary artifacts. |
| Description check | ✅ Passed | The description covers key objectives (artifact exclusion and error handling) and marks both required checklists as complete, though it lacks a dedicated Related Issues section per the template. |
| Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into leonard/kil-348-gepa-billing-payment-credit-2 |

@gemini-code-assist
Contributor

Summary of Changes

Hello @leonardmq, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the project packaging mechanism for GEPA (Prompt Optimization) jobs and strengthens the API's error handling. By integrating a new, configurable packaging utility, the system can now precisely control the inclusion of project artifacts in the generated ZIP files, optimizing them for specific training scenarios. Concurrently, the API's robustness is improved through standardized error checking, leading to clearer feedback on server interactions. The frontend has also been updated to reflect these changes by removing reliance on static dummy data.

Highlights

  • Refactored Project Packaging for Prompt Optimization: The process of creating project ZIP files for prompt optimization has been refactored to use a new package_project_for_training utility. This new function allows for fine-grained control over which artifacts are included, specifically excluding documents, other tasks, and eval runs by default for prompt optimization purposes.
  • Improved Error Handling in GEPA Job API: Error handling in the start_gepa_job endpoint has been enhanced. It now uses asyncio_detailed and a new check_response_error utility to better surface upstream validation errors and handle unexpected responses from the server, providing more informative error messages.
  • Removed Dummy Data from GEPA UI: The GEPA Svelte page (+page.svelte) no longer uses dummy data for displaying GEPA jobs. It now consistently relies on actual API calls to fetch and display real job data.
  • New Project Packaging Utility: A new library function package_project_for_training has been introduced in kiln_ai.cli.commands.package_project to provide a configurable way to package projects for training, including options to selectively export evals, documents, and task runs.
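
A configurable packager of this kind is often driven by a small options object. The sketch below is hypothetical throughout: `TrainingPackageOptions` and its field names are illustrative and are not the PR's `PackageForTrainingConfig`, whose actual fields are not shown in this conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPackageOptions:
    """Hypothetical options object; field names are illustrative only."""

    include_documents: bool = False
    include_eval_config_runs: bool = False
    include_task_runs: bool = True


def artefacts_to_export(options: TrainingPackageOptions) -> list[str]:
    """List the artefact kinds a packager would copy for these options."""
    kinds = ["project", "task", "evals"]  # always packaged in this sketch
    if options.include_task_runs:
        kinds.append("task_runs")
    if options.include_eval_config_runs:
        kinds.append("eval_config_runs")
    if options.include_documents:
        kinds.append("documents")
    return kinds
```

With defaults chosen for the prompt-optimization case, a caller that wants the pruned archive passes no arguments, while other training scenarios opt in explicitly.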

Changelog
  • app/desktop/studio_server/gepa_job_api.py
    • Replaced zipfile import with tempfile and removed cast import.
    • Added imports for check_response_error, PackageForTrainingConfig, and package_project_for_training.
    • Removed the zip_project function.
    • Modified start_gepa_job to use task.parent_project() for project retrieval.
    • Replaced custom ZIP creation logic with a call to package_project_for_training, configured to exclude documents, other tasks, and eval config runs.
    • Updated error handling in start_gepa_job to use asyncio_detailed and check_response_error for server response validation.
  • app/desktop/studio_server/test_gepa_job_api.py
    • Added imports for json, HTTPStatus, and SdkResponse.
    • Introduced helper functions _mock_package_project_for_training and _make_sdk_response for testing.
    • Removed zip_project from imports and associated tests.
    • Updated test_start_gepa_job_creates_datamodel to use package_project_for_training mock and asyncio_detailed.
    • Added test_start_gepa_job_calls_package_with_correct_params to verify arguments passed to the new packaging function.
    • Modified error handling tests (test_start_gepa_job_server_validation_error, test_start_gepa_job_server_none_response) to use asyncio_detailed and _make_sdk_response, and updated expected error messages.
    • Updated test_start_gepa_job_connection_error, test_start_gepa_job_timeout_error, and test_start_gepa_job_general_exception to mock package_project_for_training and use asyncio_detailed.
  • app/web_ui/src/routes/(app)/gepa/[project_id]/[task_id]/+page.svelte
    • Removed imports for DUMMY_GEPA_JOBS and the USE_DUMMY_DATA constant.
    • Eliminated the conditional block in onMount that previously used dummy data, ensuring real API calls are always made.
  • libs/core/kiln_ai/cli/commands/package_project.py
    • Added dataclass import and defined PackageForTrainingConfig for packaging options.
    • Implemented _ignore_eval_config_runs for selective file copying.
    • Added export_evals, export_documents, and export_task_runs functions for specific artifact export.
    • Introduced _get_run_config_by_id helper function.
    • Implemented package_project_for_training as a new library function for training-specific project packaging with configurable options.
  • libs/core/kiln_ai/cli/commands/test_package_project.py
    • Added imports for various datamodel components related to evals and task runs.
    • Imported new packaging-related functions and classes.
    • Introduced temp_project_with_evals fixture for comprehensive testing scenarios.
    • Added TestExportEvals, TestExportDocuments, TestPackageProjectForTraining, and TestExportTaskRuns classes with extensive tests for the new packaging logic and configurations.
Activity
  • A new utility function package_project_for_training has been introduced to centralize project packaging logic, replacing a custom ZIP creation method.
  • This new utility has been integrated into the GEPA job API, allowing for more controlled inclusion of project artifacts.
  • Error handling for API calls to the server has been standardized using a new check_response_error utility, improving error visibility.
  • Extensive unit tests have been added and updated across multiple files to ensure the correctness and robustness of the new packaging logic and improved error handling.
  • The frontend for GEPA jobs has been updated to remove reliance on dummy data, ensuring it fetches and displays real job data from the API.
@leonardmq
Collaborator Author

/gemini review

@leonardmq
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Feb 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions

github-actions bot commented Feb 14, 2026

📊 Coverage Report

Overall Coverage: 91%

Diff: origin/leonard/kil-348-gepa-billing-payment-credit-2...HEAD

  • app/desktop/studio_server/gepa_job_api.py (100%)
  • libs/core/kiln_ai/cli/commands/package_project.py (92.9%): Missing lines 445,817,829,848,850,868,870,889

Summary

  • Total: 129 lines
  • Missing: 8 lines
  • Coverage: 93%

Line-by-line

libs/core/kiln_ai/cli/commands/package_project.py

Lines 441-449

  441     task_prompts: dict[str, Prompt] = {}
  442 
  443     for task in tasks:
  444         if task.id is None:
! 445             raise ValueError(f"Task '{task.name}' ID is not set")
  446 
  447         run_config = run_configs[task.id]
  448         prompt_id = run_config.run_config_properties.prompt_id

Lines 813-821

  813     Returns:
  814         List of eval IDs that were actually exported (intersection of eval_ids and task's evals)
  815     """
  816     if exported_task.path is None:
! 817         raise ValueError("Exported task path is not set")
  818 
  819     exported_eval_ids: list[str] = []
  820     evals = task.evals(readonly=True)
  821     eval_ids_set = set(eval_ids)

Lines 825-833

  825     for eval_obj in evals:
  826         if eval_obj.id not in eval_ids_set:
  827             continue
  828         if eval_obj.path is None:
! 829             raise ValueError(f"Eval '{eval_obj.name}' path is not set")
  830 
  831         eval_dir = eval_obj.path.parent
  832         dest_dir = exported_task.path.parent / "evals" / eval_dir.name

Lines 844-854

  844         project: The source project
  845         exported_project: The exported project to copy documents into
  846     """
  847     if project.path is None:
! 848         raise ValueError("Project path is not set")
  849     if exported_project.path is None:
! 850         raise ValueError("Exported project path is not set")
  851 
  852     source_docs_dir = project.path.parent / "documents"
  853     if not source_docs_dir.exists() or not source_docs_dir.is_dir():
  854         return

Lines 864-874

  864         task: The source task containing the runs
  865         exported_task: The exported task to copy runs into
  866     """
  867     if task.path is None:
! 868         raise ValueError(f"Task '{task.name}' path is not set")
  869     if exported_task.path is None:
! 870         raise ValueError("Exported task path is not set")
  871 
  872     source_runs_dir = task.path.parent / "runs"
  873     if not source_runs_dir.exists() or not source_runs_dir.is_dir():
  874         return

Lines 885-893

  885     """
  886     run_configs = task.run_configs()
  887     run_config = next((rc for rc in run_configs if rc.id == run_config_id), None)
  888     if not run_config:
! 889         raise ValueError(
  890             f"Run config '{run_config_id}' not found for task '{task.name}' (ID: {task.id})"
  891         )
  892     return run_config
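
All eight missing lines are `raise ValueError` guard branches. A test along the following lines would cover the last of them. This is a sketch only: `get_run_config_by_id` is a simplified local re-creation of the helper shown in lines 885-893, and `FakeTask` and `FakeRunConfig` are hypothetical stand-ins for the Kiln datamodel.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRunConfig:
    """Hypothetical stand-in for a run config."""

    id: str


@dataclass
class FakeTask:
    """Hypothetical stand-in for a task with run configs."""

    id: str
    name: str
    configs: list[FakeRunConfig] = field(default_factory=list)

    def run_configs(self) -> list[FakeRunConfig]:
        return self.configs


def get_run_config_by_id(task: FakeTask, run_config_id: str) -> FakeRunConfig:
    """Return the matching run config or raise ValueError (the uncovered branch)."""
    run_config = next((rc for rc in task.run_configs() if rc.id == run_config_id), None)
    if not run_config:
        raise ValueError(
            f"Run config '{run_config_id}' not found for task '{task.name}' (ID: {task.id})"
        )
    return run_config


def test_missing_run_config_raises() -> None:
    task = FakeTask(id="t1", name="demo", configs=[FakeRunConfig(id="rc1")])
    try:
        get_run_config_by_id(task, "missing")
    except ValueError as exc:
        assert "Run config 'missing' not found" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```

Under pytest the `try`/`except` would normally be written with `pytest.raises(ValueError)`; the plain form is used here so the sketch runs without third-party packages.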


…b.com:Kiln-AI/Kiln into leonard/kil-403-gepa-exclude-docs-from-zip
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the project packaging mechanism for prompt optimization jobs, replacing a manual zipping function with the more robust and configurable package_project_for_training function and improving error handling in the job start endpoint. However, it introduces potential security risks related to memory management and insecure file handling. Specifically, reading large ZIP files entirely into memory could lead to Denial of Service via OOM crashes, and the use of shutil.copytree without explicit symlink handling might accidentally include sensitive local files. Additionally, there are opportunities to reduce code duplication between packaging functions and simplify a redundant check in the API error handling.
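
One way to address the symlink concern is to skip symbolic links while copying. The sketch below uses the standard `ignore` callback of `shutil.copytree`; `ignore_symlinks` and `copy_without_symlinks` are illustrative names, and this is not necessarily the fix adopted in the PR.

```python
import os
import shutil
from pathlib import Path


def ignore_symlinks(directory: str, names: list[str]) -> set[str]:
    """copytree `ignore` callback: skip every entry that is a symbolic link."""
    return {n for n in names if os.path.islink(os.path.join(directory, n))}


def copy_without_symlinks(src: Path, dst: Path) -> None:
    """Copy src to dst; symlinked files and directories are left out entirely."""
    shutil.copytree(src, dst, ignore=ignore_symlinks)
```

An alternative is `shutil.copytree(src, dst, symlinks=True)`, which copies links as links instead of following them; skipping them altogether is the more conservative choice when the result is uploaded.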

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@app/desktop/studio_server/gepa_job_api.py`:
- Around line 501-527: The current use of tempfile.NamedTemporaryFile keeps an
open handle (tmp) while package_project_for_training writes to the same path,
causing PermissionError on Windows; change to tempfile.TemporaryDirectory (or
create a path inside it) so you produce an output path (e.g., Path(tmpdir) /
"kiln_gepa_project.zip") and pass that to package_project_for_training, then
read the bytes from that path into zip_bytes and construct the File (keep
references to package_project_for_training, tmp_file/Path, and File in your
edits); this ensures no open handle blocks writing and the temporary directory
is cleaned up after use.

In `@libs/core/kiln_ai/cli/commands/package_project.py`:
- Around line 873-883: The code calls CLI-oriented helpers (validate_tasks and
validate_and_build_prompts) that use typer.Exit/typer.confirm; replace those
interactive flows with non-CLI variants and surface hard errors in
package_project_for_training. Add or call new non-interactive helpers (e.g.,
validate_tasks_noncli and validate_and_build_prompts_noncli) that raise plain
exceptions (ValueError/ValidationError) instead of invoking typer.Exit or
prompting, and update package_project_for_training to use these non-CLI
validators (removing any typer.confirm usage) so it never blocks on stdin or
exits the process. Ensure the new validators accept the same inputs (task_ids,
project, run_config_id/run_configs) and return the same validated_tasks and
task_prompts shapes so the rest of the logic (collect_required_tool_servers,
run_configs population) remains unchanged.
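
The second suggestion can be sketched as a validator that raises instead of prompting or exiting. The function below is a hypothetical simplification: the PR's `validate_tasks_noncli` works on Kiln projects and tasks, whereas this stand-in takes a plain dict so that it is self-contained.

```python
def validate_task_ids(requested_ids: list[str], available: dict[str, str]) -> list[str]:
    """Return the requested task IDs if all exist; raise ValueError otherwise.

    Never prompts and never exits the process, so it is safe to call from a server.
    `available` maps task ID to task name (a stand-in for a real project lookup).
    """
    if not requested_ids:
        raise ValueError("No task IDs provided")
    missing = [tid for tid in requested_ids if tid not in available]
    if missing:
        raise ValueError(f"Task IDs not found in project: {', '.join(missing)}")
    return list(requested_ids)
```

A CLI wrapper could still catch the `ValueError`, print it, and exit, which keeps the interactive behaviour in the command layer and the library function free of stdin and process-exit side effects.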

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the project packaging logic for prompt optimization jobs by replacing the local zip_project function with a more robust and configurable package_project_for_training from the core library, allowing for more granular control over packaged artifacts. It also significantly improves error handling in the start_gepa_job endpoint using asyncio_detailed and a centralized check_response_error utility. However, a critical security vulnerability has been introduced related to symbolic link following during the packaging process, specifically with shutil.copytree without proper symlink handling, which could lead to arbitrary file exfiltration. The start_gepa_job endpoint is also vulnerable to memory exhaustion when processing large project archives. Furthermore, a critical issue was identified where a non-interactive library function calls an interactive prompt, potentially blocking the server, and a minor redundancy exists in the new error handling logic.
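
For the memory-exhaustion concern, one simple mitigation is to check the archive size before loading it. A minimal sketch, assuming a size cap is acceptable for the endpoint; `read_zip_with_limit` and the 100 MiB limit are illustrative and do not come from the PR.

```python
from pathlib import Path

MAX_ZIP_BYTES = 100 * 1024 * 1024  # illustrative limit, not a value from the PR


def read_zip_with_limit(path: Path, limit: int = MAX_ZIP_BYTES) -> bytes:
    """Return the file's bytes, refusing files larger than `limit`."""
    size = path.stat().st_size
    if size > limit:
        raise ValueError(f"Archive is {size} bytes, over the {limit}-byte limit")
    return path.read_bytes()
```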

@leonardmq leonardmq changed the title from "refactor: ZIP for prompt optimization job" to "refactor: prune ZIP for prompt optimization job" Feb 14, 2026
…ht-medium-high

refactor: remove token budget from app
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@app/desktop/studio_server/gepa_job_api.py`:
- Around line 529-543: The current flow calls
check_response_error(detailed_response, ...) which raises an HTTPException for
any non-200 status (including 422), so the subsequent isinstance(response,
HTTPValidationError) branch in the start_gepa_job_v1_jobs_gepa_job_start_post
handling is unreachable; fix by handling validation errors before calling
check_response_error: inspect detailed_response.status_code (or
detailed_response.parsed) immediately after awaiting
start_gepa_job_v1_jobs_gepa_job_start_post.asyncio_detailed and if status_code
== 422 and isinstance(detailed_response.parsed, HTTPValidationError) raise a 422
HTTPException with the parsed validation error, otherwise call
check_response_error(detailed_response, ...) and proceed to use response as
before.
🧹 Nitpick comments (1)
app/desktop/studio_server/gepa_job_api.py (1)

496-519: Blocking I/O in async handler — acceptable per past pragmatism, but worth noting.

package_project_for_training and tmp_file.read_bytes() are synchronous/blocking calls inside an async def endpoint. For large projects, this could block the event loop noticeably. Consider wrapping in asyncio.to_thread(...) if ZIP sizes grow.

That said, per your past preference for deferring such fixes for user-triggered, low-concurrency operations, this is fine for now.
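
If ZIP sizes do grow, the `asyncio.to_thread` approach mentioned above could look like the sketch below. `package_blocking` is a hypothetical stand-in for the synchronous packaging call plus `read_bytes()`, and `start_job` is an illustrative name for the handler.

```python
import asyncio
import time


def package_blocking(delay: float) -> bytes:
    """Hypothetical stand-in for synchronous packaging plus read_bytes()."""
    time.sleep(delay)  # simulates blocking disk I/O
    return b"zip-bytes"


async def start_job(delay: float = 0.05) -> bytes:
    """Run the blocking step in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(package_blocking, delay)
```

`asyncio.to_thread` requires Python 3.9 or later; while the worker thread runs, other requests on the same event loop continue to be served.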

Comment on lines +529 to +543
+    detailed_response = (
+        await start_gepa_job_v1_jobs_gepa_job_start_post.asyncio_detailed(
+            client=server_client, body=body
+        )
-        raise HTTPException(status_code=422, detail=error_detail)
+    )
+    check_response_error(
+        detailed_response,
+        default_detail="Failed to start GEPA job: unexpected error from server",
+    )

-    if response is None:
+    response = detailed_response.parsed
+    if response is None or isinstance(response, HTTPValidationError):
         raise HTTPException(
             status_code=500,
-            detail="Failed to start GEPA job: No response from server",
+            detail="Failed to start GEPA job: unexpected response from server",
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find check_response_error function definition
rg -n -A 30 'def check_response_error' --type py

Repository: Kiln-AI/Kiln

Length of output: 1720


HTTPValidationError isinstance check is unreachable for 422 responses

check_response_error raises HTTPException immediately on any non-200 status code (including 422). The isinstance(response, HTTPValidationError) check at line 540 cannot be reached when a 422 status is returned, since the exception is raised first. Remove the unreachable check or restructure the error handling to distinguish between different failure modes.
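
The restructuring option can be sketched by testing for the validation error before the generic status check. Everything below is a hypothetical stand-in so the example is self-contained: `ApiError` plays the role of `HTTPException`, `ValidationProblem` of `HTTPValidationError`, `DetailedResponse` of the SDK's detailed response, and the inline status check of `check_response_error`.

```python
from dataclasses import dataclass
from typing import Any


class ApiError(Exception):
    """Hypothetical stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: Any) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass
class ValidationProblem:
    """Hypothetical stand-in for the SDK's HTTPValidationError."""

    detail: list[str]


@dataclass
class DetailedResponse:
    """Hypothetical stand-in for the SDK's detailed response object."""

    status_code: int
    parsed: Any


def unwrap(resp: DetailedResponse) -> Any:
    """Return the parsed body, raising ApiError for every failure mode."""
    # Validation errors are handled first, so this branch is reachable.
    if resp.status_code == 422 and isinstance(resp.parsed, ValidationProblem):
        raise ApiError(422, resp.parsed.detail)
    # Generic check: any other non-200 status is an upstream failure.
    if resp.status_code != 200:
        raise ApiError(resp.status_code, "unexpected error from server")
    if resp.parsed is None:
        raise ApiError(500, "unexpected response from server")
    return resp.parsed
```

Ordering the checks this way keeps each failure mode distinguishable: a 422 carries the upstream validation detail, other non-200 statuses get the generic message, and a 200 with no parsed body is reported as an unexpected response.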


@leonardmq leonardmq merged commit 6df2011 into leonard/kil-348-gepa-billing-payment-credit-2 Feb 14, 2026
10 checks passed
@leonardmq leonardmq deleted the leonard/kil-403-gepa-exclude-docs-from-zip branch February 14, 2026 10:58