dev #138

Merged
pelikhan merged 166 commits into main from dev
Jul 1, 2025

@pelikhan pelikhan commented Jun 2, 2025

No description provided.

pelikhan and others added 30 commits June 2, 2025 23:45
* 🎛️ feat: switch evalModel to evalModelSet for test evaluation

Replaces evalModel with evalModelSet, allowing multiple evaluation models.

* ✨ feat: add multi-model evaluation to metrics and compliance

Support evaluating metrics and compliance with multiple models via evalModelSet.

* ✨ Refine evaluation model handling and debug logging

Improved evalModelSet defaults, header levels, and debugging output.

* ✨ Enhance evalModelSet sourcing and logging in promptpex

Now supports sourcing evalModelSet from env var, adds validation, and logging.

* ✨ refactor test metric evaluation and overview model handling

Refined evalModelSet parsing and updated test metric iteration logic.

* ✨ feat: Add combined avg metric across eval models

Compute and store average metric score for all evaluation models used.

* ✨ Enhance promptpex test evaluation and script logic

Added separate eval-only/test-run modes and improved metric evaluations.

* ♻️ Rename evalModelSet to evalModel throughout codebase

Standardizes config and variable naming from evalModelSet to evalModel.
…#141)
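
The "combined avg metric across eval models" commit above can be illustrated with a minimal sketch. The type `ModelScores` and the function `combinedAverage` are hypothetical names chosen for illustration, not identifiers taken from the PromptPex codebase, and the sketch assumes each evaluation model yields one numeric score per metric (or none, if the evaluation failed).

```typescript
// Hypothetical shape: one score per evaluation model for a single metric.
// A model whose evaluation failed is assumed to have an undefined score.
type ModelScores = Record<string, number | undefined>;

// Average the scores of all models that produced a valid number.
// Returns undefined when no model produced a usable score.
function combinedAverage(scores: ModelScores): number | undefined {
  const values = Object.values(scores).filter(
    (s): s is number => typeof s === "number" && !Number.isNaN(s)
  );
  if (values.length === 0) return undefined;
  return values.reduce((sum, s) => sum + s, 0) / values.length;
}

// Example with placeholder model names.
const scores: ModelScores = {
  "model-a": 80,
  "model-b": 90,
  "model-c": undefined, // skipped in the average
};
const avg = combinedAverage(scores);
```

Skipping missing scores rather than counting them as zero is a design choice made for this sketch; the actual implementation may treat failed evaluations differently.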

* ✨ Enhance test results saving and eval metrics workflow

Improved control of results file writing and evaluation metrics assignment.

* ✨ Add evals config flag to control evaluation execution

Introduces evals boolean for toggling evaluation of test results.

* ✨: Enable direct context-loading from JSON files

Refactored CLI to load PromptPexContext from JSON, updating file flow.

* ✨ Add scripts and logic for multi-stage sample evaluations

Introduces zx scripts for gen/run/eval sample tests and conditional test executions.

* 🔀 rename: Samples scripts renamed to .zx.mjs extensions

All run-samples-*.mjs scripts updated to .zx.mjs for zx compatibility.

* ♻️ refactor: Rename sample scripts to .zx.mjs extensions

Updated script names in package.json and renamed a sample file for zx compatibility.

Introduces groundtruth model option, result tracking, and output storage.
Extended PromptPexTest and PromptPexTestResult with groundtruth support.
Add lmstudio to settings, expand UI model suggestions, tidy runTests.
✨ Add support for groundtruth model and outputs
pelikhan and others added 24 commits June 19, 2025 14:45
Expanded glossary and updated diagrams to standardize GTM terms.
* feat: add groundtruth option and related parameters for test generation

* feat: add model alias for groundtruth evaluation

* feat: add model_under_test alias and update related logic in prompt generation

* feat: update groundtruth model handling and rename constants for clarity

* Update src/genaisrc/src/testrun.mts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: add implementation plan documentation for PromptPex framework

* docs: enhance implementation plan with validation steps for test generation

* new plan

* docs: update implementation plan phases and add additional features
* ✨ Label tests with unique IDs and propagate testuid

Added unique testuid to each test and test result; updated logic to use it.

* ✨ add testuid to test run output and update indexing logic

Test run data now includes testuid; testuid index starts from 0.

* ✨: Unleash Unique IDs in PromptPex Tests with nanoid

Integrated nanoid for generating unique, consistent test UIDs.

* ✨ Fix testuid template and ensure strict equality in search

Corrected testuid generation format and used strict equality for lookup.

* Update src/genaisrc/src/testevalmetric.mts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
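
The testuid commits above describe labeling each test with a unique ID, starting the index at 0, and matching results by strict equality. A minimal sketch of that idea follows. All names here (`TestCase`, `TestResult`, `assignTestUids`, `findResult`) and the `test-<index>-<suffix>` format are assumptions made for illustration; the PR uses nanoid, which is replaced here by a small `Math.random`-based stand-in so the example has no dependencies.

```typescript
// Hypothetical minimal shapes; the real PromptPexTest types carry more fields.
interface TestCase {
  testuid: string;
  input: string;
}
interface TestResult {
  testuid: string;
  output: string;
}

// Stand-in for nanoid: a short random base-36 suffix.
// Not collision-safe for large sets; illustration only.
function randomSuffix(length = 8): string {
  let out = "";
  while (out.length < length) {
    out += Math.random().toString(36).slice(2);
  }
  return out.slice(0, length);
}

// Assign each test a uid whose index part starts from 0.
function assignTestUids(inputs: string[]): TestCase[] {
  return inputs.map((input, index) => ({
    testuid: `test-${index}-${randomSuffix()}`,
    input,
  }));
}

// Look up a result by testuid using strict equality (===).
function findResult(
  results: TestResult[],
  testuid: string
): TestResult | undefined {
  return results.find((r) => r.testuid === testuid);
}

// Usage: generate tests, fake some results, then look one up by its uid.
const tests = assignTestUids(["hello", "world"]);
const results: TestResult[] = tests.map((t) => ({
  testuid: t.testuid,
  output: t.input.toUpperCase(),
}));
const match = findResult(results, tests[1].testuid);
```

Carrying the same `testuid` on both the test and its result lets results be joined back to tests without relying on array position.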
* ✨ add groundtruth test output support to promptpex

Introduce groundtruth test results file loading and parsing support.

* ✏️ fix typo in PromptPexContext groundtruth comment

Corrected 'Groudtruth' to 'Groundtruth' in the documentation comment.

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Peli de Halleux <pelikhan@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Upgraded genaiscript from version 1.142.15 to 2.0.8
- Updated openai from version 5.5.1 to 5.7.0
- Bumped @types/node from 24.0.3 to 24.0.4
- Upgraded prettier from 3.5.3 to 3.6.1
- Updated zx from 8.5.5 to 8.6.0
- Modified promptpex:demo:github script to include .github.env for environment variables
@pelikhan pelikhan requested review from bzorn and shraddhabarke July 1, 2025 19:50
@pelikhan pelikhan merged commit 3431021 into main Jul 1, 2025
7 of 8 checks passed
@pelikhan pelikhan deleted the dev branch July 1, 2025 19:54
@pelikhan pelikhan restored the dev branch July 1, 2025 19:56
3 participants