-
Notifications
You must be signed in to change notification settings - Fork 167
tests: introduce model cache to tests #573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Warning Rate limit exceeded@joein has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 3 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThe PR introduces a module-scoped pytest fixture Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Files/areas to inspect closely:
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
tests/test_attention_embeddings.py(3 hunks)tests/test_custom_models.py(5 hunks)tests/test_image_onnx_embeddings.py(3 hunks)tests/test_late_interaction_embeddings.py(4 hunks)tests/test_sparse_embeddings.py(3 hunks)tests/test_text_cross_encoder.py(3 hunks)tests/test_text_onnx_embeddings.py(3 hunks)tests/utils.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
tests/test_late_interaction_embeddings.py (3)
tests/test_attention_embeddings.py (2)
model_cache(15-36)get_model(20-29)fastembed/late_interaction/late_interaction_text_embedding.py (2)
LateInteractionTextEmbedding(14-153)embed(117-139)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_text_onnx_embeddings.py (3)
tests/test_attention_embeddings.py (2)
model_cache(15-36)get_model(20-29)fastembed/text/text_embedding.py (2)
TextEmbedding(16-214)embed(165-187)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_custom_models.py (3)
tests/utils.py (1)
delete_model_cache(11-39)fastembed/text/custom_text_embedding.py (1)
CustomTextEmbedding(25-98)fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (1)
CustomTextCrossEncoder(8-46)
tests/test_text_cross_encoder.py (2)
tests/test_attention_embeddings.py (2)
model_cache(15-36)get_model(20-29)tests/utils.py (2)
delete_model_cache(11-39)should_test_model(42-67)
tests/test_attention_embeddings.py (3)
tests/test_image_onnx_embeddings.py (2)
model_cache(34-55)get_model(39-48)fastembed/sparse/sparse_text_embedding.py (2)
SparseTextEmbedding(17-130)embed(94-116)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_sparse_embeddings.py (2)
tests/test_attention_embeddings.py (2)
model_cache(15-36)get_model(20-29)tests/utils.py (2)
delete_model_cache(11-39)should_test_model(42-67)
tests/test_image_onnx_embeddings.py (3)
tests/test_attention_embeddings.py (2)
model_cache(15-36)get_model(20-29)fastembed/image/image_embedding.py (1)
embed(114-135)tests/utils.py (1)
delete_model_cache(11-39)
🪛 Ruff (0.14.3)
tests/test_late_interaction_embeddings.py
177-177: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
194-194: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
226-226: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
246-246: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
tests/test_text_onnx_embeddings.py
97-97: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
tests/test_text_cross_encoder.py
42-42: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
tests/test_attention_embeddings.py
34-34: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
113-113: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous Ο (GREEK CAPITAL LETTER OMICRON). Did you mean O (LATIN CAPITAL LETTER O)?
(RUF001)
149-149: String contains ambiguous γ (GREEK SMALL LETTER GAMMA). Did you mean y (LATIN SMALL LETTER Y)?
(RUF001)
149-149: String contains ambiguous ι (GREEK SMALL LETTER IOTA). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous α (GREEK SMALL LETTER ALPHA). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
149-149: String contains ambiguous γ (GREEK SMALL LETTER GAMMA). Did you mean y (LATIN SMALL LETTER Y)?
(RUF001)
149-149: String contains ambiguous ι (GREEK SMALL LETTER IOTA). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous α (GREEK SMALL LETTER ALPHA). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
tests/test_sparse_embeddings.py
103-103: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
172-174: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
tests/test_image_onnx_embeddings.py
53-53: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
74-74: Probable use of requests call without timeout
(S113)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (7)
tests/test_late_interaction_embeddings.py (2)
176-178: Rename unused loop variable to keep Ruff quiet
nameis never consumed, so Ruff (B007) will flag this loop and fail lint. Please switch to_nameor iterate overcache.values()to keep the pipeline green.
193-194: Drop the unusedtoken_numbindingOnly
abridged_dimis read, so Ruff (RUF059) complains abouttoken_num. Assign it to_(and mirror that change in the other similar blocks) to silence the lint check without altering behavior.tests/test_text_cross_encoder.py (1)
42-44: Clean up unused loop variable
nameisn’t used in this loop, so Ruff (B007) will complain. Rename it to_nameor iterate overcache.values()to keep lint happy.tests/test_text_onnx_embeddings.py (1)
96-99: Tidy unused loop variable
nameis unused, tripping Ruff’s B007 rule. Please rename it to_name(or iterate directly overcache.values()) so lint passes.tests/test_image_onnx_embeddings.py (1)
52-54: Rename unused loop variable
nameisn’t referenced, so Ruff (B007) will fail the lint stage. Swap it for_nameor iterate overcache.values()to resolve the warning.tests/test_sparse_embeddings.py (2)
102-104: Silence Ruff’s unused loop warning
nameisn’t used, so Ruff (B007) will flag it. Rename to_nameor iterate overcache.values()to satisfy the lint rule.
171-173: Usestrict=Trueonzipto catch mismatchesWe expect all three embedding iterables to stay aligned. Adding
strict=Trueto thiszipwill surface any length drift immediately instead of silently truncating, giving the test better failure signalling.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
tests/test_attention_embeddings.py(3 hunks)tests/test_image_onnx_embeddings.py(3 hunks)tests/test_late_interaction_embeddings.py(4 hunks)tests/test_sparse_embeddings.py(3 hunks)tests/test_text_cross_encoder.py(3 hunks)tests/test_text_onnx_embeddings.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/test_text_cross_encoder.py (3)
tests/test_attention_embeddings.py (2)
model_cache(15-36)get_model(20-29)fastembed/rerank/cross_encoder/text_cross_encoder.py (2)
rerank(86-99)rerank_pairs(101-132)tests/utils.py (2)
delete_model_cache(11-39)should_test_model(42-67)
tests/test_text_onnx_embeddings.py (3)
tests/test_attention_embeddings.py (2)
model_cache(15-36)get_model(20-29)tests/test_image_onnx_embeddings.py (2)
model_cache(34-54)get_model(39-47)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_late_interaction_embeddings.py (2)
tests/test_attention_embeddings.py (3)
model_cache(15-36)get_model(20-29)test_parallel_processing(98-117)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_attention_embeddings.py (3)
tests/test_image_onnx_embeddings.py (2)
model_cache(34-54)get_model(39-47)tests/test_sparse_embeddings.py (2)
model_cache(84-104)get_model(89-97)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_sparse_embeddings.py (1)
tests/utils.py (2)
delete_model_cache(11-39)should_test_model(42-67)
tests/test_image_onnx_embeddings.py (3)
tests/test_text_onnx_embeddings.py (3)
model_cache(79-99)get_model(84-92)test_embedding(103-127)fastembed/image/image_embedding.py (2)
ImageEmbedding(11-135)embed(114-135)tests/utils.py (1)
delete_model_cache(11-39)
🪛 Ruff (0.14.3)
tests/test_text_cross_encoder.py
42-42: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
tests/test_text_onnx_embeddings.py
97-97: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
tests/test_late_interaction_embeddings.py
176-176: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
193-193: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
225-225: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
245-245: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
tests/test_attention_embeddings.py
34-34: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
113-113: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
148-148: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous Ο (GREEK CAPITAL LETTER OMICRON). Did you mean O (LATIN CAPITAL LETTER O)?
(RUF001)
149-149: String contains ambiguous γ (GREEK SMALL LETTER GAMMA). Did you mean y (LATIN SMALL LETTER Y)?
(RUF001)
149-149: String contains ambiguous ι (GREEK SMALL LETTER IOTA). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous α (GREEK SMALL LETTER ALPHA). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
149-149: String contains ambiguous γ (GREEK SMALL LETTER GAMMA). Did you mean y (LATIN SMALL LETTER Y)?
(RUF001)
149-149: String contains ambiguous ι (GREEK SMALL LETTER IOTA). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous α (GREEK SMALL LETTER ALPHA). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
tests/test_sparse_embeddings.py
102-102: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
171-173: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
tests/test_image_onnx_embeddings.py
52-52: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
73-73: Probable use of requests call without timeout
(S113)
| "In the beginning, the universe was created." | ||
| " This has made a lot of people very angry and been widely regarded as a bad move.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restore missing comma between adjacent literals
The two strings are now concatenated by Python because there’s no comma between them, so we silently lose a separate test input. Please add the comma back so both quotes are exercised independently.
- "In the beginning, the universe was created."
- " This has made a lot of people very angry and been widely regarded as a bad move.",
+ "In the beginning, the universe was created.",
+ " This has made a lot of people very angry and been widely regarded as a bad move.",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "In the beginning, the universe was created." | |
| " This has made a lot of people very angry and been widely regarded as a bad move.", | |
| "In the beginning, the universe was created.", | |
| " This has made a lot of people very angry and been widely regarded as a bad move.", |
🤖 Prompt for AI Agents
In tests/test_attention_embeddings.py around lines 61 to 62 the two adjacent
string literals are missing a separating comma and are being implicitly
concatenated; restore the missing comma between the two quoted strings so they
become two separate list entries (i.e., insert a comma after the first string)
so both test inputs are preserved and exercised independently.
| Image.open(BytesIO(requests.get("https://qdrant.tech/img/logo.png").content)), | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a timeout to external HTTP call
The requests.get call hits the public logo URL without a timeout. If that endpoint stalls, the whole test suite can hang indefinitely. Please pass an explicit timeout (e.g. requests.get(..., timeout=30)) so the tests fail fast instead of blocking.
🧰 Tools
🪛 Ruff (0.14.3)
73-73: Probable use of requests call without timeout
(S113)
🤖 Prompt for AI Agents
In tests/test_image_onnx_embeddings.py around lines 73 to 74, the requests.get
call to fetch the public logo has no timeout and can hang; update the call to
pass an explicit timeout (for example
requests.get("https://qdrant.tech/img/logo.png", timeout=30).content) so the
test fails fast instead of blocking; ensure the timeout value is reasonable for
CI (e.g., 30s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/python-tests.yml (1)
27-40: Add inline documentation explaining why these test matrix exclusions exist.The exclusion of Python 3.10–3.12 on macOS and Windows without an inline explanation creates a maintenance burden. Future contributors won't understand why these specific combinations are excluded or when to remove them.
Add a comment above the
excludeblock linking to the underlying issue:os: - ubuntu-latest - macos-latest - windows-latest + # Exclude 3.10–3.12 on macOS/Windows due to Poetry dependency resolution issues + # See: https://github.com/python-poetry/poetry/issues/10615 exclude: # Exclude 3.10–3.12 for macOS and Windows - os: macos-latestAlso, please verify:
- Is this a temporary workaround pending a Poetry fix, or a permanent compatibility constraint? If temporary, create a follow-up issue to revisit and remove these exclusions once the Poetry issue is resolved.
- Were alternative mitigations considered, such as running the test step conditionally with a skip rather than excluding the entire job?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/test_image_onnx_embeddings.py (1)
74-74: Add timeout to external HTTP call.The
requests.getcall can hang indefinitely if the endpoint stalls, blocking the test suite.Apply this diff:
- Image.open(BytesIO(requests.get("https://qdrant.tech/img/logo.png").content)), + Image.open(BytesIO(requests.get("https://qdrant.tech/img/logo.png", timeout=30).content)),
🧹 Nitpick comments (9)
tests/test_late_interaction_embeddings.py (2)
177-177: Rename unused loop variable.The loop variable
nameis not used within the cleanup loop.Apply this diff:
if is_ci: - for name, model in cache.items(): + for _name, model in cache.items(): delete_model_cache(model.model._model_dir)
194-194: Prefix unused unpacked variable with underscore.The
token_numvariable is unpacked but never used in the assertion logic.Apply this diff:
- token_num, abridged_dim = expected_result.shape + _token_num, abridged_dim = expected_result.shape assert np.allclose(value[:, :abridged_dim], expected_result, atol=2e-3)Apply the same change at lines 226 and 246.
Also applies to: 226-226, 246-246
tests/test_image_onnx_embeddings.py (1)
53-53: Rename unused loop variable.The loop variable
nameis not used within the cleanup loop.Apply this diff:
if is_ci: - for name, model in cache.items(): + for _name, model in cache.items(): delete_model_cache(model.model._model_dir)tests/test_text_onnx_embeddings.py (1)
98-98: Rename unused loop variable.The loop variable
nameis not used within the cleanup loop.Apply this diff:
if is_ci: - for name, model in cache.items(): + for _name, model in cache.items(): delete_model_cache(model.model._model_dir)tests/test_text_cross_encoder.py (1)
43-43: Rename unused loop variable.The loop variable
nameis not used within the cleanup loop.Apply this diff:
if is_ci: - for name, model in cache.items(): + for _name, model in cache.items(): delete_model_cache(model.model._model_dir)tests/test_attention_embeddings.py (2)
35-35: Rename unused loop variable.The loop variable
nameis not used within the cleanup loop.Apply this diff:
if is_ci: - for name, model in cache.items(): + for _name, model in cache.items(): delete_model_cache(model.model._model_dir)
114-114: Consider adding strict parameter to zip().Without
strict=True, mismatched iterables will silently truncate to the shortest length, potentially hiding bugs.Apply this diff:
- for emb_1, emb_2, emb_3 in zip(embeddings, embeddings_2, embeddings_3): + for emb_1, emb_2, emb_3 in zip(embeddings, embeddings_2, embeddings_3, strict=True): assert np.allclose(emb_1.indices, emb_2.indices)tests/test_sparse_embeddings.py (2)
103-103: Rename unused loop variable.The loop variable
nameis not used within the cleanup loop.Apply this diff:
if is_ci: - for name, model in cache.items(): + for _name, model in cache.items(): delete_model_cache(model.model._model_dir)
172-174: Consider adding strict parameter to zip().Without
strict=True, mismatched iterables will silently truncate to the shortest length, potentially hiding bugs.Apply this diff:
- for sparse_embedding, sparse_embedding_duo, sparse_embedding_all in zip( - sparse_embeddings, sparse_embeddings_duo, sparse_embeddings_all - ): + for sparse_embedding, sparse_embedding_duo, sparse_embedding_all in zip( + sparse_embeddings, sparse_embeddings_duo, sparse_embeddings_all, strict=True + ): assert (
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
tests/test_attention_embeddings.py(3 hunks)tests/test_image_onnx_embeddings.py(3 hunks)tests/test_late_interaction_embeddings.py(4 hunks)tests/test_sparse_embeddings.py(3 hunks)tests/test_text_cross_encoder.py(3 hunks)tests/test_text_onnx_embeddings.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/test_image_onnx_embeddings.py (4)
tests/test_attention_embeddings.py (2)
model_cache(15-37)get_model(20-30)fastembed/image/image_embedding.py (2)
ImageEmbedding(11-135)embed(114-135)tests/utils.py (1)
delete_model_cache(11-39)fastembed/image/onnx_embedding.py (1)
embed(149-184)
tests/test_attention_embeddings.py (2)
tests/test_image_onnx_embeddings.py (2)
model_cache(34-55)get_model(39-48)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_late_interaction_embeddings.py (3)
tests/test_attention_embeddings.py (2)
model_cache(15-37)get_model(20-30)fastembed/late_interaction/late_interaction_text_embedding.py (3)
LateInteractionTextEmbedding(14-153)embed(117-139)query_embed(141-153)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_text_onnx_embeddings.py (4)
tests/test_attention_embeddings.py (2)
model_cache(15-37)get_model(20-30)tests/test_image_onnx_embeddings.py (2)
model_cache(34-55)get_model(39-48)fastembed/text/text_embedding.py (2)
TextEmbedding(16-214)embed(165-187)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_sparse_embeddings.py (4)
tests/test_attention_embeddings.py (2)
model_cache(15-37)get_model(20-30)fastembed/sparse/sparse_text_embedding.py (3)
SparseTextEmbedding(17-130)embed(94-116)query_embed(118-130)tests/utils.py (2)
delete_model_cache(11-39)should_test_model(42-67)fastembed/sparse/bm25.py (3)
embed(206-236)query_embed(305-323)_stem(238-256)
tests/test_text_cross_encoder.py (2)
tests/test_attention_embeddings.py (2)
model_cache(15-37)get_model(20-30)tests/utils.py (2)
delete_model_cache(11-39)should_test_model(42-67)
🪛 Ruff (0.14.4)
tests/test_image_onnx_embeddings.py
53-53: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
74-74: Probable use of requests call without timeout
(S113)
tests/test_attention_embeddings.py
35-35: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
114-114: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
149-149: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
149-149: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
150-150: String contains ambiguous Ο (GREEK CAPITAL LETTER OMICRON). Did you mean O (LATIN CAPITAL LETTER O)?
(RUF001)
150-150: String contains ambiguous γ (GREEK SMALL LETTER GAMMA). Did you mean y (LATIN SMALL LETTER Y)?
(RUF001)
150-150: String contains ambiguous ι (GREEK SMALL LETTER IOTA). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
150-150: String contains ambiguous α (GREEK SMALL LETTER ALPHA). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
150-150: String contains ambiguous γ (GREEK SMALL LETTER GAMMA). Did you mean y (LATIN SMALL LETTER Y)?
(RUF001)
150-150: String contains ambiguous ι (GREEK SMALL LETTER IOTA). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
150-150: String contains ambiguous α (GREEK SMALL LETTER ALPHA). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
tests/test_late_interaction_embeddings.py
177-177: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
194-194: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
226-226: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
246-246: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
tests/test_text_onnx_embeddings.py
98-98: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
tests/test_sparse_embeddings.py
103-103: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
172-174: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
tests/test_text_cross_encoder.py
43-43: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Python 3.13.x on macos-latest test
- GitHub Check: Python 3.11.x on ubuntu-latest test
- GitHub Check: Python 3.12.x on ubuntu-latest test
- GitHub Check: Python 3.9.x on windows-latest test
- GitHub Check: Python 3.13.x on windows-latest test
- GitHub Check: Python 3.13.x on ubuntu-latest test
- GitHub Check: Python 3.10.x on ubuntu-latest test
- GitHub Check: Python 3.9.x on macos-latest test
- GitHub Check: Python 3.9.x on ubuntu-latest test
🔇 Additional comments (6)
tests/test_late_interaction_embeddings.py (1)
157-179: LGTM! Model cache fixture is well-structured.The fixture correctly implements module-scoped caching with contextmanager pattern, enabling model reuse across tests and centralizing CI cleanup.
tests/test_image_onnx_embeddings.py (1)
33-55: LGTM! Model cache fixture follows the established pattern.The fixture implementation is consistent with other test modules and correctly handles ImageEmbedding lifecycle.
tests/test_text_onnx_embeddings.py (1)
78-100: LGTM! Model cache fixture is correctly implemented.The fixture properly manages TextEmbedding instances with appropriate cleanup logic.
tests/test_text_cross_encoder.py (1)
23-45: LGTM! Model cache fixture follows the established pattern.The fixture correctly handles TextCrossEncoder caching and cleanup.
tests/test_attention_embeddings.py (1)
14-37: LGTM! Model cache fixture is well-implemented.The fixture correctly manages SparseTextEmbedding instances with appropriate lifecycle handling.
tests/test_sparse_embeddings.py (1)
83-105: LGTM! Model cache fixture is correctly structured.The fixture properly implements the caching pattern for SparseTextEmbedding instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/test_sparse_embeddings.py (2)
103-103: Consider renaming unused loop variable.The loop variable
nameis not used in the loop body. Consider using_or_namefor clarity.Apply this diff:
if is_ci: - for name, model in cache.items(): + for _, model in cache.items(): delete_model_cache(model.model._model_dir)
175-185: Consider adding strict parameter to zip.For Python 3.10+, adding
strict=Trueto thezip()call would help catch length mismatches between the iterables.Apply this diff:
- for sparse_embedding, sparse_embedding_duo, sparse_embedding_all in zip( + for sparse_embedding, sparse_embedding_duo in zip( sparse_embeddings, sparse_embeddings_duo, - # sparse_embeddings_all + strict=True, ): assert ( sparse_embedding.indices.tolist() == sparse_embedding_duo.indices.tolist() - # == sparse_embedding_all.indices.tolist() ) assert np.allclose(sparse_embedding.values, sparse_embedding_duo.values, atol=1e-3) - # assert np.allclose(sparse_embedding.values, sparse_embedding_all.values, atol=1e-3)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test_sparse_embeddings.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_sparse_embeddings.py (4)
tests/test_text_onnx_embeddings.py (2)
model_cache(79-100)get_model(84-93)fastembed/sparse/sparse_text_embedding.py (3)
SparseTextEmbedding(17-130)embed(94-116)query_embed(118-130)tests/utils.py (2)
delete_model_cache(11-39)should_test_model(42-67)fastembed/sparse/bm25.py (3)
embed(206-236)query_embed(305-323)_stem(238-256)
🪛 Ruff (0.14.4)
tests/test_sparse_embeddings.py
103-103: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
175-175: Loop control variable sparse_embedding_all not used within loop body
Rename unused sparse_embedding_all to _sparse_embedding_all
(B007)
175-179: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🔇 Additional comments (6)
tests/test_sparse_embeddings.py (6)
2-2: LGTM: Contextmanager import and cache configuration added.The import and
MODELS_TO_CACHEconstant are properly defined to support the new caching fixture.Also applies to: 80-81
115-124: LGTM: Test correctly refactored to use model cache.The test properly uses the
model_cachefixture via the context manager pattern, and the test logic remains intact.
127-151: LGTM: Test correctly refactored to use model cache.The test properly integrates the
model_cachefixture while preserving the original test logic and assertions.
158-174: LGTM: Test correctly refactored with intentional scope reduction.The test properly uses the
model_cachefixture. The commented-outparallel=0test is intentionally removed to reduce HuggingFace requests, which aligns with the PR objectives.
188-210: LGTM: Test correctly refactored to use model cache.The test properly uses the
model_cachefixture and correctly accesses the underlying BM25 model instance viamodel.model.
212-232: LGTM: Test correctly refactored to use model cache.The test properly integrates the
model_cachefixture while maintaining the original test logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
tests/test_image_onnx_embeddings.py (1)
75-76: Add a timeout to the external logo downloadThe
requests.get("https://qdrant.tech/img/logo.png")call still runs without a timeout, so a slow or stalled CDN can hang the entire test module indefinitely. Please supply an explicit timeout suited for CI (e.g., 30 seconds) so the suite fails fast instead of blocking.- Image.open(BytesIO(requests.get("https://qdrant.tech/img/logo.png").content)), + Image.open( + BytesIO( + requests.get("https://qdrant.tech/img/logo.png", timeout=30).content + ) + ),tests/test_attention_embeddings.py (1)
63-64: Reinstate the comma so we keep both quotes as separate inputsWithout the comma Python concatenates these two literals, silently dropping a test case. This was flagged earlier and the regression is back—restore the comma so we cover both sentences independently.
Apply this diff:
- "In the beginning, the universe was created." - " This has made a lot of people very angry and been widely regarded as a bad move.", + "In the beginning, the universe was created.", + " This has made a lot of people very angry and been widely regarded as a bad move.",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
tests/test_attention_embeddings.py(3 hunks)tests/test_image_onnx_embeddings.py(3 hunks)tests/test_late_interaction_embeddings.py(4 hunks)tests/test_sparse_embeddings.py(3 hunks)tests/test_text_cross_encoder.py(3 hunks)tests/test_text_onnx_embeddings.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/test_text_onnx_embeddings.py (2)
tests/test_attention_embeddings.py (2)
model_cache(16-38)get_model(21-31)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_attention_embeddings.py (5)
tests/test_image_onnx_embeddings.py (2)
model_cache(35-56)get_model(40-49)tests/test_late_interaction_embeddings.py (2)
model_cache(159-180)get_model(164-173)tests/test_sparse_embeddings.py (2)
model_cache(85-106)get_model(90-99)tests/test_text_onnx_embeddings.py (2)
model_cache(80-101)get_model(85-94)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_image_onnx_embeddings.py (3)
tests/test_text_onnx_embeddings.py (2)
model_cache(80-101)get_model(85-94)fastembed/image/image_embedding.py (2)
ImageEmbedding(11-135)embed(114-135)tests/utils.py (1)
delete_model_cache(11-39)
tests/test_text_cross_encoder.py (3)
tests/test_attention_embeddings.py (2)
model_cache(16-38)get_model(21-31)fastembed/rerank/cross_encoder/text_cross_encoder.py (4)
TextCrossEncoder(15-163)_list_supported_models(47-51)rerank(86-99)rerank_pairs(101-132)tests/utils.py (2)
delete_model_cache(11-39)should_test_model(42-67)
tests/test_sparse_embeddings.py (4)
tests/test_attention_embeddings.py (2)
model_cache(16-38)get_model(21-31)fastembed/sparse/sparse_text_embedding.py (2)
SparseTextEmbedding(17-130)embed(94-116)tests/utils.py (2)
delete_model_cache(11-39)should_test_model(42-67)fastembed/sparse/bm25.py (2)
embed(206-236)_stem(238-256)
tests/test_late_interaction_embeddings.py (3)
tests/test_attention_embeddings.py (2)
model_cache(16-38)get_model(21-31)fastembed/late_interaction/late_interaction_text_embedding.py (3)
LateInteractionTextEmbedding(14-153)embed(117-139)query_embed(141-153)tests/utils.py (1)
delete_model_cache(11-39)
🪛 GitHub Actions: Tests
tests/test_attention_embeddings.py
[error] Pytest run detected 2 test failures in this file.
[error] Pytest run detected additional test failures (overall fail state reported in test suite).
tests/test_sparse_embeddings.py
[error] 176-176: ValueError: not enough values to unpack (expected 3, got 2) in test_parallel_processing; zip yielded 2 items for sparse_embeddings and sparse_embeddings_duo.
🪛 Ruff (0.14.4)
tests/test_text_onnx_embeddings.py
99-99: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
tests/test_attention_embeddings.py
36-36: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
115-115: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
150-150: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
150-150: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
150-150: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
150-150: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
150-150: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
150-150: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
150-150: String contains ambiguous ı (LATIN SMALL LETTER DOTLESS I). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
151-151: String contains ambiguous Ο (GREEK CAPITAL LETTER OMICRON). Did you mean O (LATIN CAPITAL LETTER O)?
(RUF001)
151-151: String contains ambiguous γ (GREEK SMALL LETTER GAMMA). Did you mean y (LATIN SMALL LETTER Y)?
(RUF001)
151-151: String contains ambiguous ι (GREEK SMALL LETTER IOTA). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
151-151: String contains ambiguous α (GREEK SMALL LETTER ALPHA). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
151-151: String contains ambiguous γ (GREEK SMALL LETTER GAMMA). Did you mean y (LATIN SMALL LETTER Y)?
(RUF001)
151-151: String contains ambiguous ι (GREEK SMALL LETTER IOTA). Did you mean i (LATIN SMALL LETTER I)?
(RUF001)
151-151: String contains ambiguous α (GREEK SMALL LETTER ALPHA). Did you mean a (LATIN SMALL LETTER A)?
(RUF001)
tests/test_image_onnx_embeddings.py
54-54: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
75-75: Probable use of requests call without timeout
(S113)
tests/test_text_cross_encoder.py
44-44: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
tests/test_sparse_embeddings.py
104-104: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
176-176: Loop control variable sparse_embedding_all not used within loop body
Rename unused sparse_embedding_all to _sparse_embedding_all
(B007)
176-180: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
tests/test_late_interaction_embeddings.py
178-178: Loop control variable name not used within loop body
Rename unused name to _name
(B007)
195-195: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
227-227: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
247-247: Unpacked variable token_num is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
| lowercase_model_name = model_name.lower() | ||
| if lowercase_model_name not in cache: | ||
| cache[lowercase_model_name] = SparseTextEmbedding(lowercase_model_name) | ||
| yield cache[lowercase_model_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not lowercase the model ID before constructing SparseTextEmbedding
Lowercasing the Hugging Face ID makes the registry lookup fail—SparseTextEmbedding("qdrant/bm25") raises because the canonical key is "Qdrant/bm25". That’s why CI now reports failures in this module. Keep the cache keyed by lowercase if you like, but instantiate with the original model_name so the loader still finds the model.
Apply this diff:
- if lowercase_model_name not in cache:
- cache[lowercase_model_name] = SparseTextEmbedding(lowercase_model_name)
+ if lowercase_model_name not in cache:
+ cache[lowercase_model_name] = SparseTextEmbedding(model_name)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| lowercase_model_name = model_name.lower() | |
| if lowercase_model_name not in cache: | |
| cache[lowercase_model_name] = SparseTextEmbedding(lowercase_model_name) | |
| yield cache[lowercase_model_name] | |
| lowercase_model_name = model_name.lower() | |
| if lowercase_model_name not in cache: | |
| cache[lowercase_model_name] = SparseTextEmbedding(model_name) | |
| yield cache[lowercase_model_name] |
🧰 Tools
🪛 GitHub Actions: Tests
[error] Pytest run detected 2 test failures in this file.
[error] Pytest run detected additional test failures (overall fail state reported in test suite).
🤖 Prompt for AI Agents
In tests/test_attention_embeddings.py around lines 22 to 25, the code lowercases
model_name before constructing SparseTextEmbedding which breaks model registry
lookups (e.g., "Qdrant/bm25"); change it so the cache key remains lowercase but
pass the original model_name (not lowercased) when instantiating
SparseTextEmbedding, i.e., use cache[lowercase_model_name] =
SparseTextEmbedding(model_name) while keeping cache keyed by
lowercase_model_name.
* tests: introduce model cache to tests * fix: fix not cached model deletion * new: do not run CI tests on mac os and windows on python 3.10-3.12 * fix: lowercase cache keys, bm25 caching * tests: do not run parallel processing on all cpus in sparse text embed * fix: fix models to cache names, do not run parallel=0 * fix: fix sparse embedding tests * fix: bm42 language by lower case model name
trying to reduce the number of requests to hf