✨ add groundtruth test output support to promptpex#180

Merged
bzorn merged 3 commits into dev from update-regression-tests
Jun 20, 2025

@bzorn bzorn commented Jun 20, 2025

Introduce groundtruth test results file loading and parsing support.

@bzorn bzorn requested review from Copilot and pelikhan June 20, 2025 21:21
Copilot AI left a comment

Pull Request Overview

This PR adds support for handling groundtruth test outputs in promptpex by introducing new file-loading, parsing, and processing logic.

  • Adds a new property (groundtruthOutputs) to the PromptPexContext type.
  • Adjusts promptpex generation and test result parsing to utilize the new groundtruth outputs.
  • Updates loaders and package scripts to incorporate and reference groundtruth test result files.
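The parsing and validation step described above can be sketched roughly as follows. This is only an illustration: the PR page does not show the actual file format or types, so the JSON layout, the `GroundtruthResult` fields, and the `parseGroundtruthResults` helper are all assumptions. Only the `groundtruthOutputs` property name comes from the overview.

```typescript
// Hypothetical result shape; the real types in types.mts are not shown here.
interface GroundtruthResult {
  testId: string;
  model: string;
  output: string;
}

// Hypothetical slice of a context object carrying the new property.
// The element type of groundtruthOutputs is an assumption.
interface SketchContext {
  groundtruthOutputs?: GroundtruthResult[];
}

// Parse the text of a groundtruth results file (assumed to be a JSON array)
// and reject entries that lack any of the expected string fields.
function parseGroundtruthResults(text: string): GroundtruthResult[] {
  const data: unknown = JSON.parse(text);
  if (!Array.isArray(data)) {
    throw new Error("expected a JSON array of groundtruth results");
  }
  return data.map((item, i) => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`result ${i} is not an object`);
    }
    const r = item as Partial<GroundtruthResult>;
    if (
      typeof r.testId !== "string" ||
      typeof r.model !== "string" ||
      typeof r.output !== "string"
    ) {
      throw new Error(`result ${i} is missing testId, model, or output`);
    }
    return { testId: r.testId, model: r.model, output: r.output };
  });
}

// Usage: attach the validated results to the (sketched) context.
const ctx: SketchContext = {};
ctx.groundtruthOutputs = parseGroundtruthResults(
  '[{"testId":"t1","model":"example-model","output":"NN"}]'
);
```

Validating at load time means a malformed results file fails early with a clear message instead of surfacing later during evaluation.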

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/genaisrc/src/types.mts | Introduces the groundtruthOutputs field in the context type |
| src/genaisrc/src/promptpex.mts | Updates output assignment to reference groundtruthOutputs |
| src/genaisrc/src/parsers.mts | Parses and validates groundtruth test results |
| src/genaisrc/src/loaders.mts | Reads the groundtruth test results file from disk |
| package.json | Updates scripts to include groundtruth model inputs and file paths |

"promptpex:test-st-min-eval1": "genaiscript run promptpex \"evals/test-st-min-run/speech-tag/promptpex_context.json\" --vars \"evals=true\" --vars \"compliance=true\" --vars \"baselineTests=false\" --vars \"evalModel=azure:gpt-4.1-mini_2025-04-14;ollama:llama3.3\" --vars \"out=evals/test-st-min-eval\"",
"promptpex:test-st-min-runeval": "genaiscript run promptpex \"evals/test-st-min-gen/speech-tag/promptpex_context.json\" --vars \"evals=true\" --vars \"compliance=true\" --vars \"baselineTests=false\" --vars \"modelsUnderTest=ollama:qwen2.5:3b;ollama:llama3.2:1b\" --vars \"evalModel=ollama:llama3.3\" --vars \"out=evals/test-st-min-runeval\"",
"promptpex:test-st-min-gen:ollama": "genaiscript run promptpex \"samples/speech-tag/speech-tag.prompty\" --vars \"effort=min\" --vars \"groundtruthModel=azure:gpt-4.1-mini_2025-04-14\" --vars \"evals=false\" --vars \"compliance=false\" --vars \"baselineTests=false\" --vars \"evalModelGroundtruth=azure:gpt-4.1-mini_2025-04-14;ollama:llama3.3\" --vars \"out=evals/test-st-min-gen\" --env .env.ollama",
"promptpex:test-st-min-run:ollama": "genaiscript run promptpex \"evals/test-st-min-gen/speech-tag/promptpex_context.json\" --vars \"evals=false\" --vars \"compliance=false\" --vars \"baselineTests=false\" --vars \"modelsUnderTest=ollama:qwen2.5:3b;ollama:llama3.2:1b;ollama:llama3.3\" --vars \"out=evals/test-st-min-run --env .env.ollama\"",
Copilot AI Jun 20, 2025

It appears that the --env argument is included inside the --vars quoted string. Adjust the script so that --env .env.ollama is placed outside the --vars value.

Suggested change
- "promptpex:test-st-min-run:ollama": "genaiscript run promptpex \"evals/test-st-min-gen/speech-tag/promptpex_context.json\" --vars \"evals=false\" --vars \"compliance=false\" --vars \"baselineTests=false\" --vars \"modelsUnderTest=ollama:qwen2.5:3b;ollama:llama3.2:1b;ollama:llama3.3\" --vars \"out=evals/test-st-min-run --env .env.ollama\"",
+ "promptpex:test-st-min-run:ollama": "genaiscript run promptpex \"evals/test-st-min-gen/speech-tag/promptpex_context.json\" --vars \"evals=false\" --vars \"compliance=false\" --vars \"baselineTests=false\" --vars \"modelsUnderTest=ollama:qwen2.5:3b;ollama:llama3.2:1b;ollama:llama3.3\" --vars \"out=evals/test-st-min-run\" --env .env.ollama",
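
This kind of misplaced closing quote is easy to miss by eye, so a small check over the `scripts` section of package.json could catch it. The sketch below is an assumption-laden illustration, not part of the PR: `findSwallowedFlags` is a hypothetical helper, and it only applies the simple heuristic that a quoted `--vars` value should not itself contain a ` --flag`.

```typescript
// Hypothetical lint helper: returns the names of npm scripts in which a CLI
// flag (such as --env) appears inside a quoted --vars value.
function findSwallowedFlags(scripts: Record<string, string>): string[] {
  const offenders: string[] = [];
  for (const name of Object.keys(scripts)) {
    // Fresh regex per script so lastIndex never carries over between commands.
    const varsPattern = /--vars\s+"([^"]*)"/g;
    let match: RegExpExecArray | null;
    while ((match = varsPattern.exec(scripts[name])) !== null) {
      // Whitespace followed by "--x" inside the value suggests a closing
      // quote that was placed after the next flag instead of before it.
      if (/\s--[a-z]/i.test(match[1])) {
        offenders.push(name);
        break;
      }
    }
  }
  return offenders;
}

// Usage with shortened versions of the two script variants from the review.
const flagged = findSwallowedFlags({
  broken: 'genaiscript run promptpex "ctx.json" --vars "out=evals/test-st-min-run --env .env.ollama"',
  fixed: 'genaiscript run promptpex "ctx.json" --vars "out=evals/test-st-min-run" --env .env.ollama',
});
console.log(flagged);
```

The heuristic would also flag a `--vars` value that legitimately contains a space followed by two dashes, so it is better suited to a warning than a hard failure.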
bzorn and others added 2 commits June 20, 2025 21:28
Corrected 'Groudtruth' to 'Groundtruth' in the documentation comment.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@bzorn bzorn merged commit 3537e2e into dev Jun 20, 2025
1 check passed
@bzorn bzorn deleted the update-regression-tests branch June 20, 2025 22:29
3 participants