✨ add groundtruth test output support to promptpex by bzorn · Pull Request #180 · microsoft/promptpex

bzorn · 2025-06-20T21:21:17Z

Introduce groundtruth test results file loading and parsing support.

Copilot

Pull Request Overview

This PR adds support for handling groundtruth test outputs in promptpex by introducing new file-loading, parsing, and processing logic.

Adds a new property (groundtruthOutputs) to the PromptPexContext type.
Adjusts promptpex generation and test result parsing to utilize the new groundtruth outputs.
Updates loaders and package scripts to incorporate and reference groundtruth test result files.
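
As a rough illustration of the parse-and-validate step described above, here is a minimal sketch. It is not the code from this PR: only the property name `groundtruthOutputs` comes from the overview, while the `GroundtruthResult` fields (`testinput`, `output`, `model`), the `ContextSketch` type, and the function names are assumptions made up for this example.

```typescript
// Hypothetical shapes: the PR overview does not show the real types, so every
// name here except `groundtruthOutputs` is an assumption for illustration.
interface GroundtruthResult {
    testinput: string
    output: string
    model: string
}

interface ContextSketch {
    groundtruthOutputs?: GroundtruthResult[]
}

// Type guard: accept only objects carrying the three assumed string fields.
function isGroundtruthResult(value: unknown): value is GroundtruthResult {
    if (typeof value !== "object" || value === null) return false
    const v = value as Record<string, unknown>
    return (
        typeof v.testinput === "string" &&
        typeof v.output === "string" &&
        typeof v.model === "string"
    )
}

// Parse the text of a results file; malformed JSON or a non-array yields [].
function parseGroundtruthOutputs(text: string): GroundtruthResult[] {
    let data: unknown
    try {
        data = JSON.parse(text)
    } catch {
        return []
    }
    if (!Array.isArray(data)) return []
    return data.filter(isGroundtruthResult)
}

// Usage: one well-formed entry and one entry missing fields.
const sample = JSON.stringify([
    { testinput: "the cat; cat", output: "NN", model: "example-model" },
    { testinput: "missing fields" },
])
const ctx: ContextSketch = { groundtruthOutputs: parseGroundtruthOutputs(sample) }
console.log(ctx.groundtruthOutputs?.length)
```

Dropping invalid entries rather than throwing is a design choice of this sketch; a real loader might prefer to report them.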

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

File	Description
src/genaisrc/src/types.mts	Introduces the groundtruthOutputs field in the context type
src/genaisrc/src/promptpex.mts	Updates output assignment to reference groundtruthOutputs
src/genaisrc/src/parsers.mts	Parses and validates groundtruth test results
src/genaisrc/src/loaders.mts	Reads the groundtruth test results file from disk
package.json	Updates scripts to include groundtruth model inputs and file paths

src/genaisrc/src/types.mts

src/genaisrc/src/parsers.mts

Copilot · 2025-06-20T21:22:02Z

package.json

-        "promptpex:test-st-min-eval1": "genaiscript run promptpex \"evals/test-st-min-run/speech-tag/promptpex_context.json\" --vars \"evals=true\" --vars \"compliance=true\" --vars \"baselineTests=false\" --vars \"evalModel=azure:gpt-4.1-mini_2025-04-14;ollama:llama3.3\" --vars \"out=evals/test-st-min-eval\"",
-        "promptpex:test-st-min-runeval": "genaiscript run promptpex \"evals/test-st-min-gen/speech-tag/promptpex_context.json\" --vars \"evals=true\" --vars \"compliance=true\" --vars \"baselineTests=false\" --vars \"modelsUnderTest=ollama:qwen2.5:3b;ollama:llama3.2:1b\" --vars \"evalModel=ollama:llama3.3\" --vars \"out=evals/test-st-min-runeval\"",
+        "promptpex:test-st-min-gen:ollama": "genaiscript run promptpex \"samples/speech-tag/speech-tag.prompty\"  --vars \"effort=min\" --vars \"groundtruthModel=azure:gpt-4.1-mini_2025-04-14\" --vars \"evals=false\" --vars \"compliance=false\" --vars \"baselineTests=false\" --vars \"evalModelGroundtruth=azure:gpt-4.1-mini_2025-04-14;ollama:llama3.3\" --vars \"out=evals/test-st-min-gen\" --env .env.ollama",
+        "promptpex:test-st-min-run:ollama": "genaiscript run promptpex \"evals/test-st-min-gen/speech-tag/promptpex_context.json\" --vars \"evals=false\" --vars \"compliance=false\" --vars \"baselineTests=false\" --vars \"modelsUnderTest=ollama:qwen2.5:3b;ollama:llama3.2:1b;ollama:llama3.3\" --vars \"out=evals/test-st-min-run --env .env.ollama\"",


The --env .env.ollama argument sits inside the quoted value of the final --vars "out=..." argument, so it would be passed as part of the out value instead of as a separate option. Move --env .env.ollama outside the closing quote of that --vars value.

Suggested change

-        "promptpex:test-st-min-run:ollama": "genaiscript run promptpex \"evals/test-st-min-gen/speech-tag/promptpex_context.json\" --vars \"evals=false\" --vars \"compliance=false\" --vars \"baselineTests=false\" --vars \"modelsUnderTest=ollama:qwen2.5:3b;ollama:llama3.2:1b;ollama:llama3.3\" --vars \"out=evals/test-st-min-run --env .env.ollama\"",
+        "promptpex:test-st-min-run:ollama": "genaiscript run promptpex \"evals/test-st-min-gen/speech-tag/promptpex_context.json\" --vars \"evals=false\" --vars \"compliance=false\" --vars \"baselineTests=false\" --vars \"modelsUnderTest=ollama:qwen2.5:3b;ollama:llama3.2:1b;ollama:llama3.3\" --vars \"out=evals/test-st-min-run\" --env .env.ollama",
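
A quoting slip like this is easy to miss in a long script line. The sketch below shows one way such a check could be automated; `findEmbeddedFlags` and `EmbeddedFlag` are hypothetical names invented here and are not part of promptpex or genaiscript. It assumes script strings as they appear after JSON parsing (plain double quotes, no escaped quotes inside a value).

```typescript
// Hypothetical lint helper, not part of promptpex or genaiscript.
// Reports `--vars "key=value"` arguments whose quoted value contains
// something that looks like another CLI flag (for example ` --env`).
interface EmbeddedFlag {
    varName: string // key on the left of `=`
    flag: string // the flag found inside the quoted value
}

function findEmbeddedFlags(script: string): EmbeddedFlag[] {
    const found: EmbeddedFlag[] = []
    // Match --vars followed by a double-quoted value.
    const varsPattern = /--vars\s+"([^"]*)"/g
    let match: RegExpExecArray | null
    while ((match = varsPattern.exec(script)) !== null) {
        const value = match[1]
        const eq = value.indexOf("=")
        const varName = eq >= 0 ? value.slice(0, eq) : value
        // A flag inside a value: whitespace followed by `--name`.
        const flagMatch = /\s(--[A-Za-z][\w-]*)/.exec(value)
        if (flagMatch) found.push({ varName, flag: flagMatch[1] })
    }
    return found
}

// Shortened versions of the two script lines from the suggested change.
const broken =
    'genaiscript run promptpex "ctx.json" --vars "evals=false" --vars "out=evals/test-st-min-run --env .env.ollama"'
const fixed =
    'genaiscript run promptpex "ctx.json" --vars "evals=false" --vars "out=evals/test-st-min-run" --env .env.ollama'

console.log(findEmbeddedFlags(broken)) // one hit: varName "out", flag "--env"
console.log(findEmbeddedFlags(fixed)) // no hits
```

The check is purely textual, so a value that legitimately contains ` --something` would be flagged as well; it is meant as a review aid, not a parser for the CLI.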

✨ add groundtruth test output support to promptpex

0d5bc68

Introduce groundtruth test results file loading and parsing support.

bzorn requested review from Copilot and pelikhan June 20, 2025 21:21

Copilot AI reviewed Jun 20, 2025

bzorn and others added 2 commits June 20, 2025 21:28

✏️ fix typo in PromptPexContext groundtruth comment

cfc8a22

Corrected 'Groudtruth' to 'Groundtruth' in the documentation comment.

Apply suggestions from code review

a409405

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

pelikhan approved these changes Jun 20, 2025

bzorn merged commit 3537e2e into dev Jun 20, 2025
1 check passed

bzorn deleted the update-regression-tests branch June 20, 2025 22:29

bzorn merged 3 commits into dev from update-regression-tests
