
Conversation

@joein joein (Member) commented Jun 24, 2025

No description provided.


coderabbitai bot commented Jun 24, 2025

📝 Walkthrough

Adds a Device enum (CPU, CUDA, AUTO) and updates many public constructors and ONNX-related method signatures to accept cuda: bool | Device = Device.AUTO instead of a boolean with default False. Type imports and docstrings were updated across text, image, sparse, rerank, late-interaction, multimodal, parallel, and ONNX modules. In fastembed/common/onnx_model.py, available providers and CUDA availability are computed up-front, an explicit_cuda distinction introduced, provider-selection and warning messages refined, device_id is propagated to CUDA providers when provided, and a redundant providers query was removed.
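The new parameter shape and its resolution rules can be sketched as follows. The `Device` values and the `Device.AUTO` default come from the walkthrough above; `resolve_cuda` is an illustrative helper for the described behavior, not fastembed's actual function:

```python
from enum import Enum
from typing import Union


class Device(str, Enum):
    CPU = "cpu"
    CUDA = "cuda"
    AUTO = "auto"


def resolve_cuda(cuda: Union[bool, Device], cuda_available: bool) -> bool:
    """Decide whether CUDA should be used (illustrative sketch).

    True / Device.CUDA -> explicit CUDA request.
    Device.AUTO        -> use CUDA only if the runtime reports it.
    False / Device.CPU -> stay on CPU.
    """
    if cuda is True or cuda == Device.CUDA:
        return True
    if cuda == Device.AUTO:
        return cuda_available
    return False
```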

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Widespread but largely homogeneous signature/type updates.
  • Focused logic changes in ONNX loader require careful review.

Areas to pay extra attention:

  • fastembed/common/onnx_model.py — provider selection, explicit_cuda handling, device_id propagation, and warning messages.
  • ONNX-backed model classes that forward cuda — ensure Device.AUTO / Device.CUDA / True semantics are consistent when propagated.
  • Public API typing/docstring changes — check for typing regressions and default-behavior implications.

Possibly related PRs

Suggested reviewers

  • tbung
  • generall

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 29.63%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Description check — ❓ Inconclusive: no description was provided by the author, so it cannot be assessed against the changeset. Please add a pull request description explaining the motivation, implementation details, and impact of the Device enum and automatic CUDA detection feature.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed: the title 'new: use cuda if available' accurately describes the main change: introducing Device.AUTO to enable automatic CUDA detection when available, replacing the previous default of cuda=False.
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch detect-gpu

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
fastembed/sparse/minicoil.py (1)

95-99: Excellent docstring updates!

The docstring has been properly updated to reflect the new parameter type and clearly explains when device_ids should be used with the different cuda values. This level of documentation consistency should be applied to other files as well.

Consider updating docstrings in the other files (custom_text_cross_encoder.py, sparse_text_embedding.py, custom_text_embedding.py) to match this level of detail for the cuda parameter documentation.
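A sketch of the docstring pattern being praised, with wording adapted from the review snippets elsewhere on this page (the exact fastembed text may differ, and `embed` here is a hypothetical method):

```python
from typing import Optional


def embed(documents, cuda=None, device_ids: Optional[list] = None):
    """Sketch of the documented parameter wording (assumed, not fastembed's exact text).

    Args:
        cuda (Union[bool, Device], optional): Whether to use cuda for inference.
            Mutually exclusive with `providers`. Defaults to Device.AUTO.
        device_ids (Optional[list[int]], optional): Device ids for data-parallel
            encoding; only meaningful when `cuda` is True, Device.CUDA, or
            Device.AUTO. Defaults to None.
    """
    return documents
```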

fastembed/sparse/splade_pp.py (1)

88-89: Fix incomplete docstring.

The docstring for the cuda parameter is incomplete - it says "Defaults to Device." but should specify "Defaults to Device.AUTO."

-            cuda (Union[bool, Device], optional): Whether to use cuda for inference. Mutually exclusive with `providers`
-                Defaults to Device.
+            cuda (Union[bool, Device], optional): Whether to use cuda for inference. Mutually exclusive with `providers`
+                Defaults to Device.AUTO.
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (1)

65-65:

```shell
#!/bin/bash

# Locate the Device enum/class definition across the codebase
rg -n "class Device" -C5

# Show any Enum base classes to confirm how Device is defined
rg -n "class .*Enum" -C3 --type py

# Show all occurrences of Device.AUTO to see its context
rg -n "Device.AUTO" -C3

# Inspect cuda parameter usage in onnx_multimodal_model.py
rg -n "def __init__" -C5 fastembed/late_interaction_multimodal/onnx_multimodal_model.py
rg -n "cuda" -C3 fastembed/late_interaction_multimodal/onnx_multimodal_model.py
```

<details>
<summary>📜 Review details</summary>

**Configuration used: CodeRabbit UI**
**Review profile: CHILL**
**Plan: Pro**


<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between acec31277b78b24200ca900ba58c69d729abb663 and 1c438fccf8dd870a2a5bebef16410f48b9278af8.

</details>

<details>
<summary>📒 Files selected for processing (23)</summary>

* `fastembed/common/onnx_model.py` (2 hunks)
* `fastembed/common/types.py` (2 hunks)
* `fastembed/image/image_embedding.py` (2 hunks)
* `fastembed/image/onnx_embedding.py` (3 hunks)
* `fastembed/image/onnx_image_model.py` (3 hunks)
* `fastembed/late_interaction/colbert.py` (3 hunks)
* `fastembed/late_interaction/late_interaction_text_embedding.py` (2 hunks)
* `fastembed/late_interaction_multimodal/colpali.py` (3 hunks)
* `fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py` (2 hunks)
* `fastembed/late_interaction_multimodal/onnx_multimodal_model.py` (4 hunks)
* `fastembed/parallel_processor.py` (2 hunks)
* `fastembed/rerank/cross_encoder/custom_text_cross_encoder.py` (2 hunks)
* `fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py` (3 hunks)
* `fastembed/rerank/cross_encoder/onnx_text_model.py` (4 hunks)
* `fastembed/rerank/cross_encoder/text_cross_encoder.py` (2 hunks)
* `fastembed/sparse/bm42.py` (3 hunks)
* `fastembed/sparse/minicoil.py` (3 hunks)
* `fastembed/sparse/sparse_text_embedding.py` (2 hunks)
* `fastembed/sparse/splade_pp.py` (3 hunks)
* `fastembed/text/custom_text_embedding.py` (3 hunks)
* `fastembed/text/onnx_embedding.py` (3 hunks)
* `fastembed/text/onnx_text_model.py` (3 hunks)
* `fastembed/text/text_embedding.py` (2 hunks)

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms (15)</summary>

* GitHub Check: Python 3.9.x on windows-latest test
* GitHub Check: Python 3.11.x on macos-latest test
* GitHub Check: Python 3.11.x on windows-latest test
* GitHub Check: Python 3.10.x on windows-latest test
* GitHub Check: Python 3.13.x on windows-latest test
* GitHub Check: Python 3.11.x on ubuntu-latest test
* GitHub Check: Python 3.13.x on macos-latest test
* GitHub Check: Python 3.13.x on ubuntu-latest test
* GitHub Check: Python 3.12.x on windows-latest test
* GitHub Check: Python 3.10.x on macos-latest test
* GitHub Check: Python 3.12.x on macos-latest test
* GitHub Check: Python 3.9.x on ubuntu-latest test
* GitHub Check: Python 3.10.x on ubuntu-latest test
* GitHub Check: Python 3.12.x on ubuntu-latest test
* GitHub Check: Python 3.9.x on macos-latest test

</details>

<details>
<summary>🔇 Additional comments (56)</summary><blockquote>

<details>
<summary>fastembed/common/types.py (1)</summary>

`29-32`: **LGTM! Clean Device enum implementation.**

The enum correctly inherits from both `str` and `Enum`, which allows values to be used as strings while maintaining type safety. The three device options (CPU, CUDA, AUTO) provide clear and explicit device specification.
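The benefit of the `str` mix-in described above can be illustrated with a short sketch (the member values are assumed to be lowercase strings; `pick_provider` is a hypothetical consumer, not fastembed code):

```python
from enum import Enum


class Device(str, Enum):  # sketch of the enum described in the review
    CPU = "cpu"
    CUDA = "cuda"
    AUTO = "auto"


def pick_provider(device: str) -> str:
    # A str-mixin member can be handed to code expecting a plain string,
    # because it compares equal to its string value.
    return "CUDAExecutionProvider" if device == "cuda" else "CPUExecutionProvider"
```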

</details>
<details>
<summary>fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (2)</summary>

`5-5`: **LGTM! Correct import of Device enum.**

---

`18-18`:

```shell
#!/bin/bash
set -e

# Locate the Device class definition
rg -n "class Device" .

# Find all references to Device.AUTO
rg -n "Device.AUTO" .

# Inspect usage of the `cuda` parameter in the rerank module
rg -n "cuda" fastembed/rerank
```
fastembed/sparse/sparse_text_embedding.py (2)

5-5: LGTM! Correct import of Device enum.


60-60: Consistent parameter update with other files.

The cuda parameter type change maintains the same pattern as other files in the codebase. The Union type preserves backward compatibility while enabling the new Device enum functionality.

fastembed/text/custom_text_embedding.py (2)

1-1: LGTM! Correct imports for the new parameter type.

Both Union and Device are properly imported to support the new cuda parameter type.

Also applies to: 14-14


35-35: Consistent parameter type update.

The cuda parameter follows the same Union type pattern established across the codebase, maintaining backward compatibility.

fastembed/sparse/minicoil.py (2)

13-13: LGTM! Correct import of Device enum.


75-75: Consistent parameter type update.

The cuda parameter follows the established Union type pattern across the codebase.

fastembed/sparse/bm42.py (2)

65-65: LGTM! Improved device specification flexibility.

The change from a boolean cuda parameter to Union[bool, Device] with Device.AUTO default provides more explicit device control while maintaining backward compatibility. The updated docstrings correctly reflect the new parameter types and usage.

Also applies to: 83-87


12-12: Verify the Device enum definition and usage.

The import looks correct, but let's verify the Device enum exists with the expected values and is properly used throughout the codebase.

```shell
#!/bin/bash
# Verify Device enum exists and has expected values
echo "Checking Device enum definition..."
ast-grep --pattern 'class Device($$$)'

echo -e "\nChecking Device enum values..."
rg -A 10 "class Device" --type py

echo -e "\nVerifying Device.AUTO, Device.CPU, Device.CUDA usage..."
rg "Device\.(AUTO|CPU|CUDA)" --type py -A 2 -B 2
```
fastembed/text/onnx_embedding.py (2)

3-3: LGTM! Clean import of Device enum.

The import follows the established pattern and properly includes the Device enum alongside other type definitions.


205-205: LGTM! Consistent implementation of device enum.

The parameter type change and documentation updates are well-implemented. The docstring clearly explains when device_ids should be used with the new Device enum values.

Also applies to: 221-225

fastembed/image/onnx_image_model.py (2)

11-11: LGTM! Proper Device enum import.

The import is correctly placed and follows the established pattern for including the Device enum.


56-56: LGTM! Consistent method signature updates.

Both _load_onnx_model and _embed_images methods have been consistently updated to use the new Device enum. The type changes maintain backward compatibility while providing enhanced device specification capabilities.

Also applies to: 98-98

fastembed/sparse/splade_pp.py (2)

6-6: LGTM! Correct Device enum import.

The import follows the established pattern and properly includes the Device enum.


72-72: LGTM! Proper parameter type update.

The cuda parameter type change and device_ids documentation are correctly implemented and consistent with the pattern across other files.

Also applies to: 91-92

fastembed/late_interaction/late_interaction_text_embedding.py (2)

5-5: LGTM! Consistent Device enum import.

The import properly includes the Device enum alongside other type definitions following the established pattern.


57-57:

```shell
#!/bin/bash

# Show the full signature of Colbert.__init__ to verify cuda and device_ids support
echo ">> Full Colbert.__init__ signature and parameters:"
rg -A15 "def __init__" -n fastembed/late_interaction/colbert.py
```

</details>
<details>
<summary>fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (2)</summary>

`5-5`: **LGTM: Import statement updated correctly.**

The `Device` enum is properly imported alongside the existing `NumpyArray` type.

---

`60-60`: **Verify Device enum definition and backward compatibility.**

The change from `bool` to `Union[bool, Device]` with default `Device.AUTO` looks correct and maintains backward compatibility. However, please verify that the `Device` enum is properly defined and that `Device.AUTO` is an appropriate default value.



```shell
#!/bin/bash
# Verify Device enum definition and usage
echo "Searching for Device enum definition..."
ast-grep --pattern 'class Device'

echo "Checking Device enum values..."
rg -A 10 "class Device" --type py

echo "Verifying Device.AUTO usage..."
rg "Device\.AUTO" --type py
```
fastembed/parallel_processor.py (3)

11-11: LGTM: Union import added correctly.

The Union type is properly imported to support the new parameter type annotation.


13-13: LGTM: Device import added correctly.

The Device enum is properly imported from the common types module.


99-99: LGTM: Parameter type updated consistently.

The cuda parameter type change from bool to Union[bool, Device] with default Device.AUTO is consistent with the broader refactoring pattern. The parameter is correctly passed to worker processes.

fastembed/image/image_embedding.py (2)

4-4: LGTM: Device import added alongside existing types.

The Device enum is properly imported alongside NumpyArray from the common types module.


54-54: LGTM: Parameter type updated consistently.

The cuda parameter type change maintains backward compatibility while enabling the new Device enum usage. The parameter is correctly passed to the underlying embedding model.

fastembed/rerank/cross_encoder/text_cross_encoder.py (3)

1-1: LGTM: Union import added for type annotation.

The Union type is properly imported to support the new parameter type annotation.


5-5: LGTM: Device import added correctly.

The Device enum is properly imported from the common types module.


60-60: LGTM: Parameter type updated consistently.

The cuda parameter type change from bool to Union[bool, Device] with default Device.AUTO follows the established pattern and is correctly passed to the underlying cross encoder model.

fastembed/text/text_embedding.py (2)

5-5: LGTM: Device import added alongside existing types.

The Device enum is properly imported alongside other common types like NumpyArray and OnnxProvider.


85-85: LGTM: Parameter type updated consistently.

The cuda parameter type change maintains backward compatibility while enabling the new Device enum usage. The parameter is correctly passed to the underlying embedding model.

fastembed/image/onnx_embedding.py (3)

4-4: LGTM: Device enum import added correctly.

The import statement properly adds the Device enum from the common types module.


69-69: LGTM: Parameter type and default updated appropriately.

The change from bool = False to Union[bool, Device] = Device.AUTO maintains backward compatibility while providing more explicit device control. The Device.AUTO default is more user-friendly than requiring explicit False.


85-89: LGTM: Documentation updated accurately.

The docstring updates correctly reflect the new parameter types and usage conditions, clearly stating when device_ids should be used with the various cuda values.

fastembed/late_interaction/colbert.py (3)

7-7: LGTM: Consistent Device enum import.

Import follows the same pattern as other updated files.


125-125: LGTM: Parameter signature updated consistently.

The cuda parameter change maintains the same pattern as other files in this refactor.


141-145: LGTM: Documentation consistently updated.

Docstrings match the pattern established in other files, correctly documenting the new parameter usage.

fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py (3)

7-7: LGTM: Device import added consistently.

Follows the established pattern for this refactor.


84-84: LGTM: Parameter type updated consistently.

Maintains the same signature pattern as other refactored classes.


100-104: LGTM: Documentation updated consistently.

Docstring changes match the pattern and accurately describe the new parameter usage.

fastembed/common/onnx_model.py (3)

12-12: LGTM: Device enum import added to core module.

Properly imports the Device enum for use in the core ONNX model logic.


61-61: LGTM: Core parameter signature updated.

The updated signature in the core _load_onnx_model method establishes the foundation for the Device enum usage across the codebase.


66-81: LGTM: Enhanced device selection logic.

The updated logic provides several improvements:

  • Queries available providers once at the start for efficiency
  • Clearly distinguishes explicit CUDA requests (True or Device.CUDA) from auto-detection
  • Provides helpful warnings for mutually exclusive parameters
  • Handles Device.AUTO mode intelligently by checking CUDA availability
  • Maintains backward compatibility with boolean values

The logic correctly prioritizes explicit user choices while providing smart defaults.
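The bullet points above can be sketched as a single helper. This is an illustrative reconstruction of the described flow, not the actual `_load_onnx_model` code; names such as `select_providers` are assumptions, while `explicit_cuda` and the `device_id` propagation follow the review's description:

```python
import warnings
from enum import Enum
from typing import Any, Optional, Sequence, Union


class Device(str, Enum):
    CPU = "cpu"
    CUDA = "cuda"
    AUTO = "auto"


def select_providers(
    cuda: Union[bool, Device],
    available_providers: Sequence[str],
    providers: Optional[Sequence[Any]] = None,
    device_id: Optional[int] = None,
) -> list:
    # An explicit provider list always wins; warn if cuda was also requested.
    if providers is not None:
        if cuda is True or cuda == Device.CUDA:
            warnings.warn("`providers` and `cuda` are mutually exclusive; using `providers`.")
        return list(providers)

    cuda_available = "CUDAExecutionProvider" in available_providers
    # explicit_cuda: the user asked for CUDA outright (True or Device.CUDA),
    # as opposed to Device.AUTO, which only opts in when CUDA is present.
    explicit_cuda = cuda is True or cuda == Device.CUDA
    if explicit_cuda and not cuda_available:
        warnings.warn("CUDA was requested but CUDAExecutionProvider is not available.")

    if explicit_cuda or (cuda == Device.AUTO and cuda_available):
        # Propagate device_id to the CUDA provider when one was given.
        cuda_provider = (
            ("CUDAExecutionProvider", {"device_id": device_id})
            if device_id is not None
            else "CUDAExecutionProvider"
        )
        return [cuda_provider, "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]
```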

fastembed/late_interaction_multimodal/colpali.py (3)

8-8: LGTM: Device import added consistently.

Follows the established import pattern for this refactor.


52-52: LGTM: Parameter signature updated consistently.

Maintains the same pattern as all other refactored classes in the codebase.


68-72: LGTM: Documentation updated consistently.

Docstring changes follow the established pattern and accurately describe the parameter usage conditions.

fastembed/text/onnx_text_model.py (3)

10-10: LGTM: Import addition is consistent.

The Device import is correctly added to support the new union type parameter.


109-109:

```shell
#!/bin/bash

# Show the full __init__ signature of ParallelWorkerPool in parallel_processor.py
sed -n '90,120p' fastembed/parallel_processor.py
```

---

`55-55`:

```shell
#!/bin/bash
# Display the full _load_onnx_model signature in the parent class
rg -n "def _load_onnx_model" -C20 fastembed/common/onnx_model.py
```
fastembed/rerank/cross_encoder/onnx_text_model.py (3)

4-4: LGTM: Import additions are correct.

Both Union and Device imports are properly added to support the new parameter type.

Also applies to: 15-15


34-34: LGTM: Consistent parameter type update.

The parameter type change is consistent with the broader refactoring and provides better device specification flexibility.


95-95: LGTM: Consistent parameter type update.

The parameter type change follows the established pattern and maintains consistency across the codebase.

fastembed/late_interaction_multimodal/onnx_multimodal_model.py (4)

14-14: LGTM: Import addition is consistent.

The Device import correctly supports the new union type parameter.


65-65: LGTM: Consistent parameter type update.

The parameter type change maintains consistency with the broader refactoring effort.


121-121: LGTM: Consistent parameter type update.

The parameter type change follows the established pattern across the codebase.


188-188: LGTM: Consistent parameter type update.

The parameter type change maintains consistency with other methods in the refactoring.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c438fc and 256443e.

📒 Files selected for processing (23)
  • fastembed/common/onnx_model.py (2 hunks)
  • fastembed/common/types.py (2 hunks)
  • fastembed/image/image_embedding.py (2 hunks)
  • fastembed/image/onnx_embedding.py (3 hunks)
  • fastembed/image/onnx_image_model.py (3 hunks)
  • fastembed/late_interaction/colbert.py (3 hunks)
  • fastembed/late_interaction/late_interaction_text_embedding.py (2 hunks)
  • fastembed/late_interaction_multimodal/colpali.py (3 hunks)
  • fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (2 hunks)
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py (4 hunks)
  • fastembed/parallel_processor.py (2 hunks)
  • fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (2 hunks)
  • fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py (3 hunks)
  • fastembed/rerank/cross_encoder/onnx_text_model.py (4 hunks)
  • fastembed/rerank/cross_encoder/text_cross_encoder.py (2 hunks)
  • fastembed/sparse/bm42.py (3 hunks)
  • fastembed/sparse/minicoil.py (3 hunks)
  • fastembed/sparse/sparse_text_embedding.py (2 hunks)
  • fastembed/sparse/splade_pp.py (3 hunks)
  • fastembed/text/custom_text_embedding.py (3 hunks)
  • fastembed/text/onnx_embedding.py (3 hunks)
  • fastembed/text/onnx_text_model.py (3 hunks)
  • fastembed/text/text_embedding.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • fastembed/text/onnx_text_model.py
  • fastembed/image/onnx_image_model.py
  • fastembed/late_interaction_multimodal/colpali.py
  • fastembed/rerank/cross_encoder/text_cross_encoder.py
  • fastembed/parallel_processor.py
  • fastembed/text/onnx_embedding.py
  • fastembed/common/types.py
  • fastembed/sparse/sparse_text_embedding.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-12T10:48:30.978Z
Learnt from: joein
Repo: qdrant/fastembed PR: 574
File: fastembed/sparse/sparse_embedding_base.py:2-2
Timestamp: 2025-11-12T10:48:30.978Z
Learning: In fastembed codebase, when using numpy NDArray types in dataclass fields, keep Union syntax instead of PEP 604 pipe operator (|) because dataclasses evaluate annotations at runtime and NDArray types don't support the __or__ operator. Add a comment explaining the constraint.

Applied to files:

  • fastembed/common/onnx_model.py
  • fastembed/image/image_embedding.py
  • fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py
  • fastembed/rerank/cross_encoder/custom_text_cross_encoder.py
  • fastembed/text/text_embedding.py
  • fastembed/text/custom_text_embedding.py
🧬 Code graph analysis (15)
fastembed/sparse/minicoil.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/image/onnx_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/common/onnx_model.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/image/image_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py (2)
fastembed/common/onnx_model.py (1)
  • OnnxOutputContext (20-23)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/sparse/bm42.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/sparse/splade_pp.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (2)
fastembed/common/model_description.py (1)
  • BaseModelDescription (24-31)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/text/text_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/text/custom_text_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/rerank/cross_encoder/onnx_text_model.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/late_interaction/colbert.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/late_interaction/late_interaction_text_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Python 3.13.x on windows-latest test
  • GitHub Check: Python 3.13.x on macos-latest test
  • GitHub Check: Python 3.9.x on windows-latest test
  • GitHub Check: Python 3.9.x on macos-latest test
🔇 Additional comments (16)
fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py (1)

1-1: Device enum integration for cuda looks consistent

Importing Union/Device, updating cuda to Union[bool, Device] = Device.AUTO, and aligning the docstrings with the new cuda/device_ids semantics are all coherent and match how cuda is passed into _load_onnx_model and _rerank_pairs. No issues from this change on this path.

Also applies to: 7-7, 84-85, 100-104

fastembed/text/custom_text_embedding.py (1)

1-1: CUDA Device support wired correctly into CustomTextEmbedding

The updated typing/imports and the new cuda: Union[bool, Device] = Device.AUTO parameter are correctly forwarded to OnnxTextEmbedding, so this class stays aligned with the new device-selection API.

Also applies to: 14-14, 35-35

fastembed/late_interaction/late_interaction_text_embedding.py (1)

5-5: Late-interaction wrapper correctly adopts Device-typed cuda

Importing Device and changing cuda to Union[bool, Device] = Device.AUTO in the constructor, then passing it through to the selected embedding model, keeps this facade consistent with the rest of the API without altering other behavior.

Also applies to: 57-57

fastembed/sparse/minicoil.py (1)

13-13: MiniCOIL CUDA configuration matches new Device semantics

The switch to cuda: Union[bool, Device] = Device.AUTO plus the updated docstrings (including valid combinations for device_ids) are consistent with how cuda/device_ids are later passed into _load_onnx_model and _embed_documents. No further changes needed here.

Also applies to: 82-82, 102-106

fastembed/sparse/bm42.py (1)

12-12: Bm42 now correctly exposes Device-aware cuda configuration

The new cuda: Union[bool, Device] = Device.AUTO signature and corresponding docstring updates (including allowed values for use with device_ids) are coherent with how cuda is later consumed. Looks good.

Also applies to: 73-73, 91-95

fastembed/late_interaction/colbert.py (1)

8-8: Colbert constructor correctly aligned with Device-based cuda API

The import of Device, the cuda: Union[bool, Device] = Device.AUTO signature, and the refreshed docstrings for cuda/device_ids are all in sync with the usage in load_onnx_model and embed/query_embed. No issues spotted.

Also applies to: 146-146, 162-166

fastembed/text/text_embedding.py (1)

5-5: TextEmbedding now defaults to automatic device selection

Importing Device and updating cuda to Union[bool, Device] = Device.AUTO, then passing it through to all registered embedding implementations, cleanly centralizes the “use CUDA if available” behavior at this facade. Callers who need CPU-only can still pass cuda=False or Device.CPU.

Also applies to: 85-85

fastembed/common/onnx_model.py (1)

63-90: LGTM on the device selection logic.

The provider selection logic correctly handles all combinations:

  • Explicit CUDA request (cuda=True or cuda=Device.CUDA) → CUDAExecutionProvider
  • Auto-detection (cuda=Device.AUTO) with available CUDA → CUDAExecutionProvider
  • Explicit CPU (cuda=Device.CPU or cuda=False) or AUTO without CUDA → CPUExecutionProvider

The backward compatibility with boolean values is preserved.

fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (1)

54-79: LGTM!

The cuda parameter type update is consistent with the broader API changes, and the value is correctly forwarded to the underlying model implementation.

fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (1)

12-36: LGTM!

The parameter type change aligns with the codebase-wide update, and all arguments are correctly forwarded to the parent class.

fastembed/image/onnx_embedding.py (1)

63-106: LGTM!

The parameter type change, docstring updates, and forwarding logic are all correctly implemented and consistent with the PR-wide changes.

fastembed/image/image_embedding.py (1)

48-73: LGTM!

The cuda parameter type update follows the consistent pattern across the codebase, with correct forwarding to the underlying embedding model.

fastembed/rerank/cross_encoder/onnx_text_model.py (2)

4-4: LGTM! Import additions support the new device typing.

The addition of Union and Device imports is correct and necessary for the updated type annotations.

Also applies to: 15-15


28-46: No verification issues found — parent class and ParallelWorkerPool already support Union[bool, Device].

The cuda: Union[bool, Device] = Device.AUTO parameter signature is consistent across the codebase. The parent class OnnxModel._load_onnx_model() and ParallelWorkerPool.__init__() already use the same type signature, and all type handling is correct (proper enum comparisons at lines 71 and 84 in OnnxModel). No breaking boolean checks exist.

fastembed/late_interaction_multimodal/onnx_multimodal_model.py (2)

14-14: LGTM! Device import added correctly.

The Device import is properly added to support the updated type annotations. Union was already imported on line 5.


59-77: LGTM! Method signatures updated consistently.

All three methods (_load_onnx_model, _embed_documents, _embed_images) correctly update the cuda parameter to Union[bool, Device] = Device.AUTO. The parameters are properly passed through to parent class and ParallelWorkerPool.

Note: Type compatibility verification requested in the previous file review applies here as well.

Also applies to: 115-170, 186-241

@joein joein requested review from generall and tbung December 8, 2025 11:40
@tbung tbung left a comment

Needs rebase and all Union removed, I didn't mark them all.

 from dataclasses import dataclass
 from pathlib import Path
-from typing import Any, Generic, Iterable, Optional, Sequence, Type, TypeVar
+from typing import Any, Generic, Iterable, Optional, Sequence, Type, TypeVar, Union
needs rebase and removal of Union and Optional

 threads: Optional[int],
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int],
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 parallel: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
fastembed/sparse/splade_pp.py (1)

93-94: Fix docstring: "Defaults to Device." should be "Defaults to Device.AUTO."

The docstring states "Defaults to Device." but the code at line 77 shows the default is Device.AUTO. This inconsistency was already flagged in a previous review.

Apply this diff to fix the docstring:

-            cuda (Union[bool, Device], optional): Whether to use cuda for inference. Mutually exclusive with `providers`
-                Defaults to Device.
+            cuda (Union[bool, Device], optional): Whether to use cuda for inference. Mutually exclusive with `providers`.
+                Defaults to Device.AUTO.
fastembed/text/onnx_embedding.py (1)

3-241: OnnxTextEmbedding correctly centralizes Device-aware cuda (minor docstyle nit)

The switch to cuda: bool | Device = Device.AUTO, storing self.cuda, and passing it through in both embed and load_onnx_model is consistent and matches the intended “auto CUDA if available” behavior across the stack. The OnnxProvider/Device import consolidation also looks fine.

Minor nit: the docstring still refers to cuda (Union[bool, Device], ...) while the code uses bool | Device. Consider aligning the docstring type notation with the code style for clarity.

🧹 Nitpick comments (4)
fastembed/text/onnx_text_model.py (2)

49-68: Consider updating method docstrings.

If _load_onnx_model has a docstring documenting the cuda parameter, it should be updated to reflect the new type (Union[bool, Device]) and default value (Device.AUTO). Similar to the comprehensive docstring updates in fastembed/image/onnx_embedding.py (lines 86-90).


103-117: Consider updating method docstrings.

If _embed_documents has a docstring documenting the cuda parameter, it should be updated to reflect the new type (Union[bool, Device]) and default value (Device.AUTO).

fastembed/image/onnx_image_model.py (2)

50-69: Consider updating method docstrings.

If _load_onnx_model has a docstring documenting the cuda parameter, it should be updated to reflect the new type and default value, similar to the updates in fastembed/image/onnx_embedding.py.


92-106: Consider updating method docstrings.

If _embed_images has a docstring documenting the cuda parameter, it should be updated to reflect the new type (Union[bool, Device]) and default value (Device.AUTO).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44c4d2b and e69a58e.

📒 Files selected for processing (23)
  • fastembed/common/onnx_model.py (2 hunks)
  • fastembed/common/types.py (1 hunks)
  • fastembed/image/image_embedding.py (2 hunks)
  • fastembed/image/onnx_embedding.py (3 hunks)
  • fastembed/image/onnx_image_model.py (3 hunks)
  • fastembed/late_interaction/colbert.py (3 hunks)
  • fastembed/late_interaction/late_interaction_text_embedding.py (2 hunks)
  • fastembed/late_interaction_multimodal/colpali.py (3 hunks)
  • fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (2 hunks)
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py (4 hunks)
  • fastembed/parallel_processor.py (2 hunks)
  • fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (2 hunks)
  • fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py (3 hunks)
  • fastembed/rerank/cross_encoder/onnx_text_model.py (3 hunks)
  • fastembed/rerank/cross_encoder/text_cross_encoder.py (2 hunks)
  • fastembed/sparse/bm42.py (3 hunks)
  • fastembed/sparse/minicoil.py (3 hunks)
  • fastembed/sparse/sparse_text_embedding.py (2 hunks)
  • fastembed/sparse/splade_pp.py (3 hunks)
  • fastembed/text/custom_text_embedding.py (2 hunks)
  • fastembed/text/onnx_embedding.py (3 hunks)
  • fastembed/text/onnx_text_model.py (3 hunks)
  • fastembed/text/text_embedding.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py
  • fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py
  • fastembed/image/image_embedding.py
  • fastembed/common/onnx_model.py
  • fastembed/rerank/cross_encoder/onnx_text_model.py
  • fastembed/common/types.py
  • fastembed/rerank/cross_encoder/text_cross_encoder.py
  • fastembed/text/text_embedding.py
  • fastembed/late_interaction/late_interaction_text_embedding.py
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py
  • fastembed/rerank/cross_encoder/custom_text_cross_encoder.py
🧰 Additional context used
🧬 Code graph analysis (12)
fastembed/sparse/bm42.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/parallel_processor.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/image/onnx_image_model.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/late_interaction/colbert.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/sparse/minicoil.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/text/onnx_embedding.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/text/onnx_text_model.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/sparse/sparse_text_embedding.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/sparse/splade_pp.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/image/onnx_embedding.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/late_interaction_multimodal/colpali.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/text/custom_text_embedding.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Python 3.13.x on ubuntu-latest test
  • GitHub Check: Python 3.12.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on windows-latest test
  • GitHub Check: Python 3.11.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on macos-latest test
🔇 Additional comments (16)
fastembed/sparse/splade_pp.py (2)

6-6: LGTM!

The Device import is correctly added to support the new type annotation.


77-77: LGTM!

The parameter signature correctly updates the cuda parameter to accept both bool and Device types with Device.AUTO as the default, enabling automatic GPU detection.

fastembed/text/custom_text_embedding.py (1)

13-35: CustomTextEmbedding cuda signature aligns with new Device API

Importing Device and widening cuda to bool | Device = Device.AUTO, then forwarding it unchanged to OnnxTextEmbedding, is consistent and preserves existing behavior for bool callers while enabling the enum. No issues spotted.

fastembed/sparse/sparse_text_embedding.py (1)

5-61: SparseTextEmbedding correctly forwards Device-aware cuda to underlying models

The Device import and cuda: bool | Device = Device.AUTO parameter, forwarded as cuda=cuda when instantiating registry models, look correct and keep the wrapper thin and consistent with the new device API.

fastembed/sparse/minicoil.py (1)

13-121: MiniCOIL Device-aware cuda integration looks consistent

The cuda: bool | Device = Device.AUTO parameter, updated docstring, and propagation of self.cuda into load_onnx_model and _embed_documents all look coherent and in line with the rest of the refactor. No issues found.

fastembed/sparse/bm42.py (1)

12-113: Bm42 cuda parameter refactor is correctly wired

The change to cuda: bool | Device = Device.AUTO, along with storing self.cuda and passing it through to load_onnx_model and _embed_documents, is consistent with the new Device enum usage. The updated docstring matches the signature. Looks good.

fastembed/late_interaction/colbert.py (1)

8-184: Colbert’s Device-aware cuda parameter is consistent with other ONNX text models

The import of Device, the widened cuda: bool | Device = Device.AUTO signature, and its propagation into load_onnx_model and _embed_documents all look correct and match the pattern used in other text embeddings. No additional issues detected.

fastembed/late_interaction_multimodal/colpali.py (1)

8-89: ColPali Device-aware cuda wiring for text and image embeddings looks good

Using cuda: bool | Device = Device.AUTO, storing it on self.cuda, and forwarding it consistently in load_onnx_model, embed_text, and embed_image keeps the multimodal behavior intact while enabling the new Device enum. No issues noticed.

fastembed/parallel_processor.py (1)

13-111: ParallelWorkerPool's cuda default is part of a consistent internal API design – no public API impact

ParallelWorkerPool is an internal class not exported in the public API, and Worker is an abstract base for internal use only. The single internal Worker subclass (Bm25Worker) receives the cuda parameter via **kwargs and forwards it correctly, making it compatible with the Device.AUTO default regardless of parameter type. This change aligns with the library's systematic adoption of Device.AUTO across all public embedding APIs, and poses no breaking changes to external code.

Likely an incorrect or invalid review comment.
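The `**kwargs` forwarding pattern the comment describes can be sketched like this (class and attribute names are illustrative, not the actual fastembed internals):

```python
from enum import Enum


class Device(Enum):
    CPU = "cpu"
    CUDA = "cuda"
    AUTO = "auto"


class Bm25LikeWorker:
    """Toy worker: `cuda` arrives through **kwargs and is stored
    untouched, so bool and Device values are equally acceptable."""

    def __init__(self, **kwargs):
        self.cuda = kwargs.get("cuda", Device.AUTO)


# a bool caller and a no-arg caller both work under the new default
assert Bm25LikeWorker(cuda=True).cuda is True
assert Bm25LikeWorker().cuda is Device.AUTO
```

Since the worker never inspects the value, widening the parameter type upstream cannot break this call chain.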

fastembed/text/onnx_text_model.py (2)

10-10: LGTM! Device import added correctly.

The import statement properly includes the new Device enum type needed for the cuda parameter updates.


55-55: No issue found. The project's minimum Python version is 3.10.0 (python = ">=3.10.0" in pyproject.toml), which fully supports PEP 604 union syntax. The type annotation bool | Device is compatible with the project's Python version requirement.

Likely an incorrect or invalid review comment.

fastembed/image/onnx_image_model.py (2)

11-11: LGTM! Device import added correctly.

The import statement properly includes the new Device enum type. The past review comments suggesting Union are addressed by using the modern PEP 604 syntax (bool | Device).


56-56: LGTM! Type signatures updated consistently.

The cuda parameter type has been updated from bool to bool | Device with a default of Device.AUTO in both _load_onnx_model and _embed_images. The parameters are properly propagated through the call chains.

Also applies to: 100-100

fastembed/image/onnx_embedding.py (3)

4-4: LGTM! Device import added correctly.

The import statement properly includes the new Device enum type needed for the cuda parameter updates.


70-70: LGTM! Type signature updated consistently.

The cuda parameter type has been properly updated to bool | Device with a default of Device.AUTO, consistent with the changes across other modules.


86-90: Excellent docstring updates!

The docstring updates are comprehensive and clearly document:

  • The new Union[bool, Device] type
  • Valid values including Device.AUTO and Device.CUDA
  • Mutual exclusivity with the providers parameter
  • Integration with device_ids parameter

This is a good model for docstring updates in the other files.

@joein joein merged commit 685fd9b into main Dec 10, 2025
13 checks passed
@joein joein deleted the detect-gpu branch December 10, 2025 13:23
kacperlukawski pushed a commit to kacperlukawski/fastembed that referenced this pull request Dec 15, 2025
* new: use cuda if available

* fix: fix warning msg

* fix: add missing import