test LLM output for semantic similarity using vector embeddings#59

Closed
carl wants to merge 36 commits into thisisartium:main from carl:fix_example_1

@carl
Contributor

@carl carl commented Mar 19, 2025

Add an example of how to test LLM output for semantic similarity using vector embeddings.

Snapshot testing lets us capture the embedding vector and notice when it changes.
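
A minimal sketch of the snapshot idea, not the code in this PR: store an embedding vector on the first run, then compare later runs against it using cosine similarity. The helper names (`cosine_similarity`, `matches_snapshot`), the JSON snapshot file, and the `min_similarity=0.99` default are assumptions made for illustration. The embedding itself would come from whichever embedding model the test calls.

```python
import json
import math
from pathlib import Path


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def matches_snapshot(embedding, snapshot_path, min_similarity=0.99):
    """Hypothetical helper: record the embedding on the first run,
    then compare later runs against the recorded vector."""
    path = Path(snapshot_path)
    if not path.exists():
        # First run: write the snapshot and treat the test as passing.
        path.write_text(json.dumps(embedding))
        return True
    stored = json.loads(path.read_text())
    return cosine_similarity(embedding, stored) >= min_similarity
```

Comparing with a similarity floor rather than exact equality is a design choice in this sketch: it tolerates small numeric drift between runs while still flagging a real change in meaning.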

This pull request includes multiple changes to enhance the functionality of the team_recommender module, particularly focusing on embedding stabilization, alignment computation, and testing improvements. The most important changes include adding new functions for embedding stabilization, implementing alignment computation, and updating tests to reflect these new functionalities.

Embedding Stabilization and Alignment Computation:

Testing Enhancements:

Snapshot Updates:

@tkersey tkersey requested a review from Copilot March 20, 2025 00:21
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adjusts the similarity threshold in the hallucination test example from 70% to 64% to better align with updated expectations. Key changes include:

  • Lowering the cosine similarity threshold in the if condition and assert statement.
  • Renaming file handle variables from "f" to "fp" and adding noinspection comments.
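
Since the threshold appears in both an `if` condition and an `assert`, one way to keep the two from drifting apart is a single named constant. The sketch below is illustrative only: `SIMILARITY_THRESHOLD`, `is_below_threshold`, and `check_similarity` are hypothetical names, and the direction of the assertion (score must be at least the threshold) is an assumption, since the test body is not shown in this conversation.

```python
SIMILARITY_THRESHOLD = 0.64  # value quoted in this PR; it was 0.70 before


def is_below_threshold(similarity: float, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True when the score falls under the threshold."""
    return similarity < threshold


def check_similarity(similarity: float, threshold: float = SIMILARITY_THRESHOLD) -> None:
    """The log message and the assertion read the same threshold value."""
    if is_below_threshold(similarity, threshold):
        print(f"similarity {similarity:.2f} is below {threshold:.2f}")
    # Assumed direction: the score must reach the threshold to pass.
    assert similarity >= threshold, (
        f"expected at least {threshold:.2f}, got {similarity:.2f}"
    )
```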
Files not reviewed (2)
  • examples/team_recommender/tests/example_1_text_response/snapshots/test_good_fit_for_project/test_llm_will_hallucinate_given_no_data/hallucination_response.txt: Language not supported
  • examples/team_recommender/tests/fixtures/hallucination_response.json: Language not supported
Comments suppressed due to low confidence (2)

examples/team_recommender/tests/example_1_text_response/test_good_fit_for_project.py:124

  • Verify that lowering the threshold to 0.64 accurately reflects the intended test behavior and does not inadvertently allow borderline cases to pass.
if cosine_similarity < 0.64:

examples/team_recommender/tests/example_1_text_response/test_good_fit_for_project.py:146

  • [nitpick] Consider using a more descriptive variable name instead of 'fp' for the file handle to enhance readability.
) as fp:

@tkersey tkersey requested a review from Copilot March 20, 2025 17:39
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adjusts the expectation for hallucination detection in the team recommender tests by updating the similarity comparison logic and adding a new helper function.

  • Updated the test to create an embedding object using a specified model ("text-embedding-3-large").
  • Replaced a cosine similarity threshold check with comparison of semantic similarity scores.
  • Added a new function, compute_alignment, to the cosine_similarity module.

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| examples/team_recommender/tests/example_1_text_response/test_good_fit_for_project.py | Modified test to use semantic similarity scores for hallucination detection and updated variable names and embedding model. |
| examples/team_recommender/tests/example_1_text_response/cosine_similarity.py | Added a compute_alignment function to normalize the difference vector. |
Files not reviewed (1)
  • examples/team_recommender/tests/example_1_text_response/snapshots/test_good_fit_for_project/test_llm_will_hallucinate_given_no_data/hallucination_response.txt: Language not supported

@paulz paulz changed the title from "adjust expectation for hallucination example to 64%" to "test LLM output for semantic similarity using vector embeddings" Mar 24, 2025
carl and others added 24 commits March 24, 2025 11:30
- update cosine similarity tests
… assertion from less than to higher than for log message

# Conflicts:
#	examples/team_recommender/tests/helpers.py
…plement snapshot loading for embedding equivalence tests
…t and implement snapshot loading for embedding equivalence tests"

This reverts commit d72fc74.
…ood_fit_for_project.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ood_fit_for_project.py


add tolerance_margin = 0.05

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@paulz
Contributor

paulz commented Mar 24, 2025

closing in favor of: #61

@paulz paulz closed this Mar 24, 2025
3 participants