dev by pelikhan · Pull Request #138 · microsoft/promptpex

pelikhan · 2025-06-02T23:46:11Z

No description provided.

* 🎛️ feat: switch evalModel to evalModelSet for test evaluation
  Replaces evalModel with evalModelSet, allowing multiple evaluation models.
* ✨ feat: add multi-model evaluation to metrics and compliance
  Support evaluating metrics and compliance with multiple models via evalModelSet.
* ✨ Refine evaluation model handling and debug logging
  Improved evalModelSet defaults, header levels, and debugging output.
* ✨ Enhance evalModelSet sourcing and logging in promptpex
  Now supports sourcing evalModelSet from env var, adds validation, and logging.
* ✨ refactor test metric evaluation and overview model handling
  Refined evalModelSet parsing and updated test metric iteration logic.
* ✨ feat: Add combined avg metric across eval models
  Compute and store average metric score for all evaluation models used.
* ✨ Enhance promptpex test evaluation and script logic
  Added separate eval-only/test-run modes, improved metric evaluations
* ♻️ Rename evalModelSet to evalModel throughout codebase
  Standardizes config and variable naming from evalModelSet to evalModel.
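As a rough illustration of the multi-model evaluation and combined average described in these commits, here is a minimal sketch. The function names, the `;` separator, and the shape of the score map are assumptions made for this example; they are not taken from the PromptPex source.

```typescript
// Hypothetical sketch: the names, the ";" separator, and the score shape
// are assumptions for illustration, not PromptPex's actual implementation.

/** Split a raw setting such as "modelA;modelB" into a validated model list. */
function parseEvalModels(raw: string | undefined, separator = ";"): string[] {
  const models = (raw ?? "")
    .split(separator)
    .map((m) => m.trim())
    .filter((m) => m.length > 0);
  if (models.length === 0) {
    throw new Error("no evaluation models configured");
  }
  // Drop duplicates while preserving the original order.
  return models.filter((m, i) => models.indexOf(m) === i);
}

/** Average the per-model scores; models without a finite score are skipped. */
function combinedAverage(
  scores: Record<string, number | undefined>
): number | undefined {
  const values: number[] = [];
  for (const model of Object.keys(scores)) {
    const score = scores[model];
    if (typeof score === "number" && isFinite(score)) {
      values.push(score);
    }
  }
  if (values.length === 0) return undefined;
  return values.reduce((sum, s) => sum + s, 0) / values.length;
}
```

Skipping missing scores rather than counting them as zero is a design choice of this sketch; it keeps one failed evaluation model from dragging the average down, at the cost of averaging over fewer models.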

…#141)

* ✨ Enhance test results saving and eval metrics workflow
  Improved control of results file writing and evaluation metrics assignment.
* ✨ Add evals config flag to control evaluation execution
  Introduces evals boolean for toggling evaluation of test results.
* ✨: Enable direct context-loading from JSON files
  Refactored CLI to load PromptPexContext from JSON, updating file flow.

* ✨ Add scripts and logic for multi-stage sample evaluations
  Introduces zx scripts for gen/run/eval sample tests and conditional test executions.
* 🔀 rename: Samples scripts renamed to .zx.mjs extensions
  All run-samples-*.mjs scripts updated to .zx.mjs for zx compatibility.
* ♻️ refactor: Rename sample scripts to .zx.mjs extensions
  Updated script names in package.json and renamed a sample file for zx compatibility

* feat: add groundtruth option and related parameters for test generation
* feat: add model alias for groundtruth evaluation
* feat: add model_under_test alias and update related logic in prompt generation
* feat: update groundtruth model handling and rename constants for clarity
* Update src/genaisrc/src/testrun.mts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* ✨ Label tests with unique IDs and propagate testuid
  Added unique testuid to each test and test result; updated logic to use it.
* ✨ add testuid to test run output and update indexing logic
  Test run data now includes testuid; testuid index starts from 0.
* ✨: Unleash Unique IDs in PromptPex Tests with nanoid
  Integrated nanoid for generating unique, consistent test UIDs.
* ✨ Fix testuid template and ensure strict equality in search
  Corrected testuid generation format and used strict equality for lookup.
* Update src/genaisrc/src/testevalmetric.mts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
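The testuid commits can be pictured with a small sketch along these lines. The PR integrates nanoid; to stay dependency-free, this example swaps in a caller-supplied prefix plus a zero-based index. The ID format, the interfaces, and the function names are all assumptions for illustration, not the project's actual template or types.

```typescript
// Hypothetical sketch: the "<prefix>-<index>" format and these types are
// assumptions. The PR itself uses nanoid, which is replaced here by a
// caller-supplied prefix so the example has no dependencies.

interface TestCase {
  testuid?: string;
  input: string;
}

type LabeledTest = TestCase & { testuid: string };

interface TestResult {
  testuid: string;
  output: string;
}

/** Give every test a testuid, keeping existing ones; the index starts at 0. */
function assignTestUids(tests: TestCase[], prefix: string): LabeledTest[] {
  return tests.map((t, index) => ({
    ...t,
    testuid: t.testuid ?? `${prefix}-${index}`,
  }));
}

/** Look up a result by testuid using strict equality (no type coercion). */
function findResult(
  results: TestResult[],
  testuid: string
): TestResult | undefined {
  return results.filter((r) => r.testuid === testuid)[0];
}
```

Strict equality matters once IDs embed numeric indexes: with `==`, a numeric `0` and the string `"0"` would compare equal, which can silently match the wrong record if an untyped value slips in.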

* ✨ add groundtruth test output support to promptpex
  Introduce groundtruth test results file loading and parsing support.
* ✏️ fix typo in PromptPexContext groundtruth comment
  Corrected 'Groudtruth' to 'Groundtruth' in the documentation comment.
* Apply suggestions from code review

Co-authored-by: Peli de Halleux <pelikhan@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

- Upgraded genaiscript from version 1.142.15 to 2.0.8
- Updated openai from version 5.5.1 to 5.7.0
- Bumped @types/node from 24.0.3 to 24.0.4
- Upgraded prettier from 3.5.3 to 3.6.1
- Updated zx from 8.5.5 to 8.6.0
- Modified promptpex:demo:github script to include .github.env for environment variables

pelikhan and others added 30 commits June 2, 2025 23:45

model names

e543499

removed pull

e5fad83

getting started on github models support

796b27c

passing test data

0057f5b

✨ Add support for groundtruth model and outputs

eb31c1d

Introduces groundtruth model option, result tracking, and output storage.

upgrade deps

44cf117

migrate to node v22

2152903

wiring up action

bb597eb

add files argument

dc5c4e4

✨ feat: add groundtruth fields to test data pipeline

7d5ab36

Extended PromptPexTest and PromptPexTestResult with groundtruth support.

use promt for noe

f425a16

define action

db6d3ea

Merge remote-tracking branch 'origin/main' into action

3e3d469

fid build

a60970b

fix test

9d4c99d

Merge pull request #147 from microsoft/action

3c4c7c8

Action

fix build

e0aedb2

Merge remote-tracking branch 'origin/dev' into add-ground-truth

3af89be

cleanup

0b4440b

✨ Enhance model suggestions and test config options

1ebed36

Add lmstudio to settings, expand UI model suggestions, tidy runTests.

integrate groundtruth in run test

2d400f7

Merge pull request #146 from microsoft/add-ground-truth

1c18cd0

✨ Add support for groundtruth model and outputs

Merge remote-tracking branch 'origin/dev' into githubmodels

5a1437d

Create genai-issue-labeller

1b8ce71

Merge pull request #150 from microsoft/pelikhan-patch-1

cfe006f

Create genai-issue-labeller

Merge remote-tracking branch 'origin/dev' into githubmodels

4e57915

updated typenames

fa1daad

pelikhan and others added 24 commits June 19, 2025 14:45

feat: add front matter schema for prompty definition and update groundtruth documentation links

72ddf67

refactor: remove npm test step from CI workflow

f014ad8

chore: bump version to 0.0.12

77f3afa

📝 docs: Add and clarify Ground Truth test terminology (#172)

5714aaf

Expanded glossary and updated diagrams to standardize GTM terms.

feat: add GenAI Pull Request Descriptor workflow

89916ad

refactor: update groundtruth handling and improve code clarity

3fe8cd7

refactor: improve code formatting and enhance groundtruth handling

dfe7942

more options work

b252280

fix: update groundtruth assignment in evaluateTestMetric function

109a56a

Planner (#176)

e1f4af3

* feat: add implementation plan documentation for PromptPex framework
* docs: enhance implementation plan with validation steps for test generation
* new plan
* docs: update implementation plan phases and add additional features

refactor: update documentation structure and add implementation plan

7cbe0b3

fix: update implementation plan link in documentation

46d6571

Merge branch 'dev' of https://github.com/microsoft/promptpex into dev

548319b

chore: update genaiscript dependency to version 2.0.9

3373562

upgrade to genaiscript 2.0

fac638b

chore: bump version to 0.0.13

51db6a0

new docker

9c555ca

chore: add Dockerfile for serving promptpex

e7fa425

chore: add Dockerfile for serving genaiscript action

3b23128

chore: update Dockerfile and package.json to expose port 8003 for serving

7bc4c54

pelikhan requested review from bzorn and shraddhabarke July 1, 2025 19:50

shraddhabarke approved these changes Jul 1, 2025

pelikhan merged commit 3431021 into main Jul 1, 2025
7 of 8 checks passed

pelikhan deleted the dev branch July 1, 2025 19:54

pelikhan restored the dev branch July 1, 2025 19:56

pelikhan merged 166 commits into main from dev

3 participants
