pamelafox
Owner

Purpose

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[ ] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial,
which includes deployment, settings, and usage instructions. If text or screenshots need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[ ] No

Type of change

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff and black manually on my code.
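
The checklist above can be run in one pass. What follows is a minimal sketch, not the repository's own tooling: the `run_checks` helper and the `CHECKS` list are hypothetical names, and the arguments passed to ruff and black (`check .`, `--check .`) are assumptions, since the checklist names only the tools.

```python
import subprocess
import sys

# Commands taken from the checklist above. `pytest --cov` also runs the tests,
# so it covers the plain `python -m pytest` item as well.
# ASSUMPTION: the arguments to ruff and black are illustrative; the checklist
# does not say how they are invoked.
CHECKS = [
    [sys.executable, "-m", "pytest", "--cov"],
    [sys.executable, "-m", "mypy"],
    ["ruff", "check", "."],
    ["black", "--check", "."],
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each command in order and return the ones that exited non-zero."""
    failed = []
    for cmd in checks:
        result = runner(cmd)
        if result.returncode != 0:
            failed.append(cmd)
    return failed
```

Calling `run_checks()` from the repository root returns an empty list when every check passes. The `runner` parameter exists only so the helper can be exercised without launching the real tools.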

@pamelafox
Owner Author

/evaluate

Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.

Evaluation results

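| metric | stat | baseline | gpt-4o-mini | pr5 |
| --- | --- | --- | --- | --- |
| gpt_groundedness | mean_rating | 4.94 | 4.9 | 4.82 |
| gpt_groundedness | pass_rate | 0.98 | 0.98 | 0.98 |
| gpt_relevance | mean_rating | 4.42 | 4.54 | 4.26 |
| gpt_relevance | pass_rate | 0.98 | 0.96 | 0.96 |
| answer_length | mean | 667.7 | 934.36 | 618.3 |
| latency | mean | 2.96 | 3.8 | 3.0 |
| citations_matched | rate | 0.45 | 0.53 | 0.43 |
| any_citation | rate | 1.0 | 1.0 | 1.0 |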

Check the workflow run for more details.

@pamelafox
Owner Author

/evaluate

Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.

Evaluation results

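| metric | stat | baseline | gpt-4o-mini | pr5 |
| --- | --- | --- | --- | --- |
| gpt_groundedness | mean_rating | 4.94 | 4.9 | 4.9 |
| gpt_groundedness | pass_rate | 0.98 | 0.98 | 0.98 |
| gpt_relevance | mean_rating | 4.42 | 4.54 | 4.3 |
| gpt_relevance | pass_rate | 0.98 | 0.96 | 0.98 |
| answer_length | mean | 667.7 | 934.36 | 461.2 |
| latency | mean | 2.96 | 3.8 | 2.48 |
| citations_matched | rate | 0.45 | 0.53 | 0.45 |
| any_citation | rate | 1.0 | 1.0 | 1.0 |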

Check the workflow run for more details.
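
To read the second run against the baseline, the per-metric differences can be computed directly. This is a small sketch that assumes the `pr5` column is this PR's run; the numbers are copied from the results above, and the `deltas` helper is a hypothetical name, not part of the evaluation workflow.

```python
# (baseline, pr5) pairs copied from the second evaluation run above.
# ASSUMPTION: the "pr5" column holds the results for this pull request.
results = {
    ("gpt_groundedness", "mean_rating"): (4.94, 4.9),
    ("gpt_groundedness", "pass_rate"): (0.98, 0.98),
    ("gpt_relevance", "mean_rating"): (4.42, 4.3),
    ("gpt_relevance", "pass_rate"): (0.98, 0.98),
    ("answer_length", "mean"): (667.7, 461.2),
    ("latency", "mean"): (2.96, 2.48),
    ("citations_matched", "rate"): (0.45, 0.45),
    ("any_citation", "rate"): (1.0, 1.0),
}


def deltas(table):
    """Return {(metric, stat): pr5 - baseline}, rounded to two decimals."""
    return {key: round(pr - base, 2) for key, (base, pr) in table.items()}


for (metric, stat), delta in deltas(results).items():
    print(f"{metric:18} {stat:12} {delta:+.2f}")
```

A negative value means `pr5` scored lower than the baseline on that row. For `answer_length` and `latency`, lower is not necessarily worse.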

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed.

github-actions bot added the Stale label on Sep 12, 2025