test LLM output for semantic similarity using vector embeddings by paulz · Pull Request #61 · thisisartium/continuous-alignment-testing

paulz · 2025-03-24T18:41:16Z

Add an example of how to test LLM output for semantic similarity using vector embeddings.

Snapshot testing captures the embedding vector and flags when it changes.
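As a rough illustration of the snapshot idea, the sketch below stores an embedding on the first run and later counts how many components drift beyond a tolerance. The helper names `count_outliers` and `matches_snapshot` and the JSON file layout are assumptions made for this example, not the code in this PR; the default tolerance of 0.0001 mirrors the value mentioned in one of the commit messages.

```python
import json
from pathlib import Path


def count_outliers(current, saved, tolerance=1e-4):
    """Count components whose absolute difference exceeds the tolerance."""
    if len(current) != len(saved):
        raise ValueError("embeddings must have the same length")
    return sum(1 for a, b in zip(current, saved) if abs(a - b) > tolerance)


def matches_snapshot(embedding, snapshot_path, tolerance=1e-4):
    """Write the snapshot on the first run; afterwards compare against it."""
    path = Path(snapshot_path)
    if not path.exists():
        # First run: record the embedding as the reference snapshot.
        path.write_text(json.dumps(embedding))
        return True
    saved = json.loads(path.read_text())
    return count_outliers(embedding, saved, tolerance) == 0
```

Comparing with a tolerance rather than exact equality is one way to keep a snapshot from failing on small numeric noise between otherwise equivalent embedding responses.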

This pull request includes significant updates to the examples/team_recommender/tests/example_1_text_response module, focusing on enhancing the functionality and improving the accuracy of the embeddings and similarity computations. The most important changes include the addition of new functions for embedding stabilization, new test cases, and updates to existing test cases to ensure robustness.

Enhancements to embeddings and similarity computations:

examples/team_recommender/tests/example_1_text_response/openai_embeddings.py: Added functions stabilize_embedding, stabilize_embedding_object, and stabilize_float to stabilize embeddings and floating-point numbers.
examples/team_recommender/tests/example_1_text_response/cosine_similarity.py: Added a new function compute_alignment to calculate the alignment vector between two lists.
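For readers unfamiliar with the similarity measure, here is a minimal sketch of cosine similarity between two vectors. The second function, `alignment_contributions`, is a hypothetical interpretation of an "alignment vector" (the per-component terms whose sum equals the cosine similarity); the actual definition of `compute_alignment` in this PR may differ.

```python
import math


def _norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a, norm_b = _norm(a), _norm(b)
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


def alignment_contributions(a, b):
    """Hypothetical 'alignment vector': products of the normalized components.

    The sum of the returned list equals cosine_similarity(a, b) up to
    floating-point rounding.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a, norm_b = _norm(a), _norm(b)
    if norm_a == 0 or norm_b == 0:
        raise ValueError("alignment is undefined for zero vectors")
    return [(x / norm_a) * (y / norm_b) for x, y in zip(a, b)]
```

A per-component view like this can show which dimensions pull two embeddings together or apart, which a single similarity score hides.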

Updates to test cases:

examples/team_recommender/tests/example_1_text_response/test_compute_alignment.py: Added a new test case test_compute_alignment to verify the functionality of the compute_alignment function.
examples/team_recommender/tests/example_1_text_response/test_compute_cosine_similarity.py: Added multiple test cases to verify the correctness of cosine similarity computations, including tests for aligned vectors, random vectors, and saved responses.
examples/team_recommender/tests/example_1_text_response/test_openai_embeddings.py: Added test cases to verify the stabilization functions, ensuring they work correctly with various inputs.
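The commit messages mention stabilizing floats through bit manipulation and a 32-bit shift. One possible reading, sketched below, is to clear the low bits of each double's binary representation so that tiny numeric noise is discarded. The names `truncate_mantissa` and `truncate_embedding` and the default of 32 dropped bits are assumptions for illustration only; this is not the PR's `stabilize_float`.

```python
import struct


def truncate_mantissa(value: float, dropped_bits: int = 32) -> float:
    """Clear the lowest `dropped_bits` bits of a 64-bit float's mantissa."""
    if not 0 <= dropped_bits <= 52:
        raise ValueError("dropped_bits must be between 0 and 52")
    # Reinterpret the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    # Keep the sign, exponent, and high mantissa bits; zero the rest.
    mask = ~((1 << dropped_bits) - 1) & 0xFFFFFFFFFFFFFFFF
    (result,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
    return result


def truncate_embedding(embedding, dropped_bits: int = 32):
    """Apply truncate_mantissa to every component of an embedding."""
    return [truncate_mantissa(v, dropped_bits) for v in embedding]
```

Truncation of this kind does not guarantee that two nearby values map to the same result: values on either side of a truncation boundary still differ, so a tolerance-based comparison may still be needed alongside it.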

Removal of outdated test data:

examples/team_recommender/tests/example_1_text_response/snapshots/test_good_fit_for_project/test_llm_will_hallucinate_given_no_data/hallucination_response.txt: Removed outdated test snapshot data.
examples/team_recommender/tests/example_1_text_response/snapshots/test_good_fit_for_project/test_llm_will_hallucinate_given_no_data/please_provide_missing_information_response.txt: Removed outdated test snapshot data.

Signed-off-by: Paul Zabelin <paulzabelin@artium.ai>

carl · 2025-03-24T18:55:45Z

🐻 approve

carl and others added 30 commits March 19, 2025 16:57

adjust expectation for hallucination example to 64%

4b864e0

- fix pycharm warning about file type object

5dac65a

- implement compute_alignment function

f34c043

- update cosine similarity tests

Refactor: update embedding creation and similarity computation in tests

a6327d1

Enhance: modify cosine similarity function to return lists and add alignment computation tests

09bc6c1

Refactor: moved tests to test_helpers.py, and fixed language for test assertion from less than to higher than for log message

7739f1c

Refactor: update tests for embedding creation and alignment computation with snapshot assertions

89596fc

Clearly separate fixture naming from snapshot naming, wip - snapshot testing for embedding object is not reliable

1e099d5

Enhance: switch to base64 encoding on OpenAI embedding request and implement snapshot loading for embedding equivalence tests

d72fc74

Revert "Enhance: switch to base64 encoding on OpenAI embedding request and implement snapshot loading for embedding equivalence tests" This reverts commit d72fc74.

4cca7a7

Reproduce comparison of the embedding failures, snapshots unstable

66a44bb

Add assertion to validate embedding differences in cosine similarity test

7d2221a

Add assertion to check count of elements outside tolerance in cosine similarity test

fee7f63

Update examples/team_recommender/tests/example_1_text_response/test_good_fit_for_project.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

07b22e7

Refactor: remove unused import of compute_alignment from cosine_similarity test

143f012

Refactor: remove unused imports from test_good_fit_for_project.py

d452a37

- add variant embeddings

5a3d300

Update examples/team_recommender/tests/example_1_text_response/test_good_fit_for_project.py add tolerance_margin = 0.05 Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

4f4fda1

- lint changes and add assertion for 900 outliers with 0.0001 tolerance

375b569

- snapshot up to 4 digits precision

6d0058f

- clearly see the difference with 4 digits precision

31f7a4b

- optimize imports

7d8c95e

update alignment vector snapshot and refactor rounding to stable_embedding function

695c5b8

use 2 digit precision

07bed50

use 1 digit precision

faf86ed

refactor stable_embedding to use 3 digit precision for non-negligible values

c2ff171

fix stability of a snapshot for alignment_vector using bit massage, needs cleanup

9b26f58

test_stabilize_float shows that stabilize_float creates stable float values with enough precision

583e592

add stabilize_embedding_object to ensure stable float values in embeddings

09cfb11

fix stabilize_float to improve precision by adjusting bit manipulation still fails on values close to 0

be0e207

paulz and others added 6 commits March 20, 2025 18:14

fix: 32 bit shift to align floats

f5b27e3

add tests for confidence ranges and success rate calculations

f2af6c1

add test for next_success_rate with additional case

0bdfa7e

refactor: remove redundant test case

1eed439

refactor: extract tests for openai embeddings

77db665

Signed-off-by: Paul Zabelin <paulzabelin@artium.ai>

Merge branch 'main' into fix_example_1

2d1d789

paulz marked this pull request as ready for review March 24, 2025 18:42

paulz changed the title ~~Fix example 1~~ test LLM output for semantic similarity using vector embeddings Mar 24, 2025

paulz mentioned this pull request Mar 24, 2025

test LLM output for semantic similarity using vector embeddings #59

Closed

paulz merged commit e4ca299 into thisisartium:main Mar 24, 2025
1 check passed

paulz merged 36 commits into thisisartium:main from paulz:fix_example_1

3 participants
