test LLM output for semantic similarity using vector embeddings #61
Merged
paulz merged 36 commits into thisisartium:main on Mar 24, 2025
- update cosine similarity tests
…ignment computation tests
… assertion from less than to higher than for log message
…on with snapshot assertions
…testing for embedding object is not reliable
…plement snapshot loading for embedding equivalence tests
…t and implement snapshot loading for embedding equivalence tests" This reverts commit d72fc74.
…ood_fit_for_project.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ood_fit_for_project.py add tolerance_margin = 0.05 Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…values with enough precision
…n still fails on values close to 0
Signed-off-by: Paul Zabelin <paulzabelin@artium.ai>
🐻 approve
Add an example of how to test LLM output for semantic similarity using vector embeddings.
Snapshot testing lets us capture an embedding vector and notice when it changes.
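To illustrate the idea of asserting semantic similarity, here is a minimal sketch of a cosine-similarity check. It is not the code from this pull request: the function names, the toy vectors, and the 0.8 threshold are all assumptions chosen for illustration, and in a real test the vectors would come from an embeddings API call rather than being hard-coded.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def is_semantically_similar(
    expected: Sequence[float],
    actual: Sequence[float],
    threshold: float = 0.8,  # hypothetical cutoff; tune it per project
) -> bool:
    """Treat two embeddings as 'similar enough' above the threshold."""
    return cosine_similarity(expected, actual) >= threshold


# Toy vectors standing in for real embeddings (assumption for the example).
reference = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]
orthogonal = [0.0, 0.0, 3.0]

assert is_semantically_similar(reference, same_direction)
assert not is_semantically_similar(reference, orthogonal)
```

A threshold-based assertion like this tolerates rewording in the LLM output, which an exact string comparison would not.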
This pull request includes significant updates to the examples/team_recommender/tests/example_1_text_response module, focusing on enhancing the functionality and improving the accuracy of the embeddings and similarity computations. The most important changes include the addition of new functions for embedding stabilization, new test cases, and updates to existing test cases to ensure robustness.

Enhancements to embeddings and similarity computations:
- examples/team_recommender/tests/example_1_text_response/openai_embeddings.py: Added functions stabilize_embedding, stabilize_embedding_object, and stabilize_float to stabilize embeddings and floating-point numbers.
- examples/team_recommender/tests/example_1_text_response/cosine_similarity.py: Added a new function compute_alignment to calculate the alignment vector between two lists.

Updates to test cases:
- examples/team_recommender/tests/example_1_text_response/test_compute_alignment.py: Added a new test case test_compute_alignment to verify the functionality of the compute_alignment function.
- examples/team_recommender/tests/example_1_text_response/test_compute_cosine_similarity.py: Added multiple test cases to verify the correctness of cosine similarity computations, including tests for aligned vectors, random vectors, and saved responses.
- examples/team_recommender/tests/example_1_text_response/test_openai_embeddings.py: Added test cases to verify the stabilization functions, ensuring they work correctly with various inputs.

Removal of outdated test data:
- examples/team_recommender/tests/example_1_text_response/snapshots/test_good_fit_for_project/test_llm_will_hallucinate_given_no_data/hallucination_response.txt: Removed outdated test snapshot data.
- examples/team_recommender/tests/example_1_text_response/snapshots/test_good_fit_for_project/test_llm_will_hallucinate_given_no_data/please_provide_missing_information_response.txt: Removed outdated test snapshot data.
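
The description names stabilize_float and stabilize_embedding but does not show their bodies, so the following is only a guess at what such helpers might look like. It assumes that "stabilizing" means rounding to a fixed number of decimal places so that tiny floating-point differences do not change a stored snapshot; the `digits` parameter and the handling of negative zero are assumptions, not the merged implementation.

```python
from typing import List, Sequence


def stabilize_float(value: float, digits: int = 6) -> float:
    """Round a float so that insignificant noise does not alter a snapshot.

    Hypothetical body: the real helper in the pull request may differ.
    """
    rounded = round(value, digits)
    # round() can yield -0.0 for tiny negative inputs; normalize it to 0.0
    # so the serialized snapshot text stays identical between runs.
    return 0.0 if rounded == 0 else rounded


def stabilize_embedding(
    embedding: Sequence[float], digits: int = 6
) -> List[float]:
    """Apply stabilize_float to every component of an embedding vector."""
    return [stabilize_float(v, digits) for v in embedding]


# Two runs that differ only by numerical noise produce the same snapshot.
run_1 = [0.12345678, -1e-9, 1.0]
run_2 = [0.12345681, 2e-9, 1.0]
assert stabilize_embedding(run_1) == stabilize_embedding(run_2)
```

Rounding trades precision for repeatability, so the number of digits should stay fine enough that a genuine change in the embedding still shows up as a snapshot difference.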