
Conversation

@joein joein (Member) commented Jun 24, 2025

No description provided.


coderabbitai bot commented Jun 24, 2025

📝 Walkthrough

Adds a Device enum (CPU, CUDA, AUTO) and updates many public constructors and ONNX-related method signatures to accept cuda: bool | Device = Device.AUTO instead of a boolean with default False. Type imports and docstrings were updated across text, image, sparse, rerank, late-interaction, multimodal, parallel, and ONNX modules. In fastembed/common/onnx_model.py, available providers and CUDA availability are computed up-front, an explicit_cuda distinction introduced, provider-selection and warning messages refined, device_id is propagated to CUDA providers when provided, and a redundant providers query was removed.
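The new parameter shape and its resolution rules can be sketched as follows. The `Device` values and the `Device.AUTO` default come from the walkthrough above; `resolve_cuda` is an illustrative helper for the described behavior, not fastembed's actual function:

```python
from enum import Enum
from typing import Union


class Device(str, Enum):
    CPU = "cpu"
    CUDA = "cuda"
    AUTO = "auto"


def resolve_cuda(cuda: Union[bool, Device], cuda_available: bool) -> bool:
    """Decide whether CUDA should be used (illustrative sketch).

    True / Device.CUDA -> explicit CUDA request.
    Device.AUTO        -> use CUDA only if the runtime reports it.
    False / Device.CPU -> stay on CPU.
    """
    if cuda is True or cuda == Device.CUDA:
        return True
    if cuda == Device.AUTO:
        return cuda_available
    return False
```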

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Widespread but largely homogeneous signature/type updates.
  • Focused logic changes in ONNX loader require careful review.

Areas to pay extra attention:

  • fastembed/common/onnx_model.py — provider selection, explicit_cuda handling, device_id propagation, and warning messages.
  • ONNX-backed model classes that forward cuda — ensure Device.AUTO / Device.CUDA / True semantics are consistent when propagated.
  • Public API typing/docstring changes — check for typing regressions and default-behavior implications.

Possibly related PRs

Suggested reviewers

  • tbung
  • generall

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 29.63%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Description check — ❓ Inconclusive: no description was provided by the author, so it cannot be assessed against the changeset. Please add a pull request description explaining the motivation, implementation details, and impact of the Device enum and automatic CUDA detection feature.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed: the title 'new: use cuda if available' accurately describes the main change: introducing Device.AUTO to enable automatic CUDA detection when available, replacing the previous default of cuda=False.
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch detect-gpu

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
fastembed/sparse/minicoil.py (1)

95-99: Excellent docstring updates!

The docstring has been properly updated to reflect the new parameter type and clearly explains when device_ids should be used with the different cuda values. This level of documentation consistency should be applied to other files as well.

Consider updating docstrings in the other files (custom_text_cross_encoder.py, sparse_text_embedding.py, custom_text_embedding.py) to match this level of detail for the cuda parameter documentation.
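A sketch of the docstring pattern being praised, with wording adapted from the review snippets elsewhere on this page (the exact fastembed text may differ, and `embed` here is a hypothetical method):

```python
from typing import Optional


def embed(documents, cuda=None, device_ids: Optional[list] = None):
    """Sketch of the documented parameter wording (assumed, not fastembed's exact text).

    Args:
        cuda (Union[bool, Device], optional): Whether to use cuda for inference.
            Mutually exclusive with `providers`. Defaults to Device.AUTO.
        device_ids (Optional[list[int]], optional): Device ids for data-parallel
            encoding; only meaningful when `cuda` is True, Device.CUDA, or
            Device.AUTO. Defaults to None.
    """
    return documents
```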

fastembed/sparse/splade_pp.py (1)

88-89: Fix incomplete docstring.

The docstring for the cuda parameter is incomplete - it says "Defaults to Device." but should specify "Defaults to Device.AUTO."

-            cuda (Union[bool, Device], optional): Whether to use cuda for inference. Mutually exclusive with `providers`
-                Defaults to Device.
+            cuda (Union[bool, Device], optional): Whether to use cuda for inference. Mutually exclusive with `providers`
+                Defaults to Device.AUTO.
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (1)

65-65:

```shell
#!/bin/bash

# Locate the Device enum/class definition across the codebase
rg -n "class Device" -C5

# Show any Enum base classes to confirm how Device is defined
rg -n "class .*Enum" -C3 --type py

# Show all occurrences of Device.AUTO to see its context
rg -n "Device.AUTO" -C3

# Inspect cuda parameter usage in onnx_multimodal_model.py
rg -n "def __init__" -C5 fastembed/late_interaction_multimodal/onnx_multimodal_model.py
rg -n "cuda" -C3 fastembed/late_interaction_multimodal/onnx_multimodal_model.py
```

<details>
<summary>📜 Review details</summary>

**Configuration used: CodeRabbit UI**
**Review profile: CHILL**
**Plan: Pro**


<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between acec31277b78b24200ca900ba58c69d729abb663 and 1c438fccf8dd870a2a5bebef16410f48b9278af8.

</details>

<details>
<summary>📒 Files selected for processing (23)</summary>

* `fastembed/common/onnx_model.py` (2 hunks)
* `fastembed/common/types.py` (2 hunks)
* `fastembed/image/image_embedding.py` (2 hunks)
* `fastembed/image/onnx_embedding.py` (3 hunks)
* `fastembed/image/onnx_image_model.py` (3 hunks)
* `fastembed/late_interaction/colbert.py` (3 hunks)
* `fastembed/late_interaction/late_interaction_text_embedding.py` (2 hunks)
* `fastembed/late_interaction_multimodal/colpali.py` (3 hunks)
* `fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py` (2 hunks)
* `fastembed/late_interaction_multimodal/onnx_multimodal_model.py` (4 hunks)
* `fastembed/parallel_processor.py` (2 hunks)
* `fastembed/rerank/cross_encoder/custom_text_cross_encoder.py` (2 hunks)
* `fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py` (3 hunks)
* `fastembed/rerank/cross_encoder/onnx_text_model.py` (4 hunks)
* `fastembed/rerank/cross_encoder/text_cross_encoder.py` (2 hunks)
* `fastembed/sparse/bm42.py` (3 hunks)
* `fastembed/sparse/minicoil.py` (3 hunks)
* `fastembed/sparse/sparse_text_embedding.py` (2 hunks)
* `fastembed/sparse/splade_pp.py` (3 hunks)
* `fastembed/text/custom_text_embedding.py` (3 hunks)
* `fastembed/text/onnx_embedding.py` (3 hunks)
* `fastembed/text/onnx_text_model.py` (3 hunks)
* `fastembed/text/text_embedding.py` (2 hunks)

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms (15)</summary>

* GitHub Check: Python 3.9.x on windows-latest test
* GitHub Check: Python 3.11.x on macos-latest test
* GitHub Check: Python 3.11.x on windows-latest test
* GitHub Check: Python 3.10.x on windows-latest test
* GitHub Check: Python 3.13.x on windows-latest test
* GitHub Check: Python 3.11.x on ubuntu-latest test
* GitHub Check: Python 3.13.x on macos-latest test
* GitHub Check: Python 3.13.x on ubuntu-latest test
* GitHub Check: Python 3.12.x on windows-latest test
* GitHub Check: Python 3.10.x on macos-latest test
* GitHub Check: Python 3.12.x on macos-latest test
* GitHub Check: Python 3.9.x on ubuntu-latest test
* GitHub Check: Python 3.10.x on ubuntu-latest test
* GitHub Check: Python 3.12.x on ubuntu-latest test
* GitHub Check: Python 3.9.x on macos-latest test

</details>

<details>
<summary>🔇 Additional comments (56)</summary><blockquote>

<details>
<summary>fastembed/common/types.py (1)</summary>

`29-32`: **LGTM! Clean Device enum implementation.**

The enum correctly inherits from both `str` and `Enum`, which allows values to be used as strings while maintaining type safety. The three device options (CPU, CUDA, AUTO) provide clear and explicit device specification.
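The benefit of the `str` mix-in described above can be illustrated with a short sketch (the member values are assumed to be lowercase strings; `pick_provider` is a hypothetical consumer, not fastembed code):

```python
from enum import Enum


class Device(str, Enum):  # sketch of the enum described in the review
    CPU = "cpu"
    CUDA = "cuda"
    AUTO = "auto"


def pick_provider(device: str) -> str:
    # A str-mixin member can be handed to code expecting a plain string,
    # because it compares equal to its string value.
    return "CUDAExecutionProvider" if device == "cuda" else "CPUExecutionProvider"
```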

</details>
<details>
<summary>fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (2)</summary>

`5-5`: **LGTM! Correct import of Device enum.**

---

`18-18`:

```shell
#!/bin/bash
set -e

# Locate the Device class definition
rg -n "class Device" .

# Find all references to Device.AUTO
rg -n "Device.AUTO" .

# Inspect usage of the `cuda` parameter in the rerank module
rg -n "cuda" fastembed/rerank
```
fastembed/sparse/sparse_text_embedding.py (2)

5-5: LGTM! Correct import of Device enum.


60-60: Consistent parameter update with other files.

The cuda parameter type change maintains the same pattern as other files in the codebase. The Union type preserves backward compatibility while enabling the new Device enum functionality.

fastembed/text/custom_text_embedding.py (2)

1-1: LGTM! Correct imports for the new parameter type.

Both Union and Device are properly imported to support the new cuda parameter type.

Also applies to: 14-14


35-35: Consistent parameter type update.

The cuda parameter follows the same Union type pattern established across the codebase, maintaining backward compatibility.

fastembed/sparse/minicoil.py (2)

13-13: LGTM! Correct import of Device enum.


75-75: Consistent parameter type update.

The cuda parameter follows the established Union type pattern across the codebase.

fastembed/sparse/bm42.py (2)

65-65: LGTM! Improved device specification flexibility.

The change from a boolean cuda parameter to Union[bool, Device] with Device.AUTO default provides more explicit device control while maintaining backward compatibility. The updated docstrings correctly reflect the new parameter types and usage.

Also applies to: 83-87


12-12: Verify the Device enum definition and usage.

The import looks correct, but let's verify the Device enum exists with the expected values and is properly used throughout the codebase.

```shell
#!/bin/bash
# Verify Device enum exists and has expected values
echo "Checking Device enum definition..."
ast-grep --pattern 'class Device($$$)'

echo -e "\nChecking Device enum values..."
rg -A 10 "class Device" --type py

echo -e "\nVerifying Device.AUTO, Device.CPU, Device.CUDA usage..."
rg "Device\.(AUTO|CPU|CUDA)" --type py -A 2 -B 2
```
fastembed/text/onnx_embedding.py (2)

3-3: LGTM! Clean import of Device enum.

The import follows the established pattern and properly includes the Device enum alongside other type definitions.


205-205: LGTM! Consistent implementation of device enum.

The parameter type change and documentation updates are well-implemented. The docstring clearly explains when device_ids should be used with the new Device enum values.

Also applies to: 221-225

fastembed/image/onnx_image_model.py (2)

11-11: LGTM! Proper Device enum import.

The import is correctly placed and follows the established pattern for including the Device enum.


56-56: LGTM! Consistent method signature updates.

Both _load_onnx_model and _embed_images methods have been consistently updated to use the new Device enum. The type changes maintain backward compatibility while providing enhanced device specification capabilities.

Also applies to: 98-98

fastembed/sparse/splade_pp.py (2)

6-6: LGTM! Correct Device enum import.

The import follows the established pattern and properly includes the Device enum.


72-72: LGTM! Proper parameter type update.

The cuda parameter type change and device_ids documentation are correctly implemented and consistent with the pattern across other files.

Also applies to: 91-92

fastembed/late_interaction/late_interaction_text_embedding.py (2)

5-5: LGTM! Consistent Device enum import.

The import properly includes the Device enum alongside other type definitions following the established pattern.


57-57:

```shell
#!/bin/bash

# Show the full signature of Colbert.__init__ to verify cuda and device_ids support
echo ">> Full Colbert.__init__ signature and parameters:"
rg -A15 "def __init__" -n fastembed/late_interaction/colbert.py
```

</details>
<details>
<summary>fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (2)</summary>

`5-5`: **LGTM: Import statement updated correctly.**

The `Device` enum is properly imported alongside the existing `NumpyArray` type.

---

`60-60`: **Verify Device enum definition and backward compatibility.**

The change from `bool` to `Union[bool, Device]` with default `Device.AUTO` looks correct and maintains backward compatibility. However, please verify that the `Device` enum is properly defined and that `Device.AUTO` is an appropriate default value.



```shell
#!/bin/bash
# Verify Device enum definition and usage
echo "Searching for Device enum definition..."
ast-grep --pattern 'class Device'

echo "Checking Device enum values..."
rg -A 10 "class Device" --type py

echo "Verifying Device.AUTO usage..."
rg "Device\.AUTO" --type py
```
fastembed/parallel_processor.py (3)

11-11: LGTM: Union import added correctly.

The Union type is properly imported to support the new parameter type annotation.


13-13: LGTM: Device import added correctly.

The Device enum is properly imported from the common types module.


99-99: LGTM: Parameter type updated consistently.

The cuda parameter type change from bool to Union[bool, Device] with default Device.AUTO is consistent with the broader refactoring pattern. The parameter is correctly passed to worker processes.

fastembed/image/image_embedding.py (2)

4-4: LGTM: Device import added alongside existing types.

The Device enum is properly imported alongside NumpyArray from the common types module.


54-54: LGTM: Parameter type updated consistently.

The cuda parameter type change maintains backward compatibility while enabling the new Device enum usage. The parameter is correctly passed to the underlying embedding model.

fastembed/rerank/cross_encoder/text_cross_encoder.py (3)

1-1: LGTM: Union import added for type annotation.

The Union type is properly imported to support the new parameter type annotation.


5-5: LGTM: Device import added correctly.

The Device enum is properly imported from the common types module.


60-60: LGTM: Parameter type updated consistently.

The cuda parameter type change from bool to Union[bool, Device] with default Device.AUTO follows the established pattern and is correctly passed to the underlying cross encoder model.

fastembed/text/text_embedding.py (2)

5-5: LGTM: Device import added alongside existing types.

The Device enum is properly imported alongside other common types like NumpyArray and OnnxProvider.


85-85: LGTM: Parameter type updated consistently.

The cuda parameter type change maintains backward compatibility while enabling the new Device enum usage. The parameter is correctly passed to the underlying embedding model.

fastembed/image/onnx_embedding.py (3)

4-4: LGTM: Device enum import added correctly.

The import statement properly adds the Device enum from the common types module.


69-69: LGTM: Parameter type and default updated appropriately.

The change from bool = False to Union[bool, Device] = Device.AUTO maintains backward compatibility while providing more explicit device control. The Device.AUTO default is more user-friendly than requiring explicit False.


85-89: LGTM: Documentation updated accurately.

The docstring updates correctly reflect the new parameter types and usage conditions, clearly stating when device_ids should be used with the various cuda values.

fastembed/late_interaction/colbert.py (3)

7-7: LGTM: Consistent Device enum import.

Import follows the same pattern as other updated files.


125-125: LGTM: Parameter signature updated consistently.

The cuda parameter change maintains the same pattern as other files in this refactor.


141-145: LGTM: Documentation consistently updated.

Docstrings match the pattern established in other files, correctly documenting the new parameter usage.

fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py (3)

7-7: LGTM: Device import added consistently.

Follows the established pattern for this refactor.


84-84: LGTM: Parameter type updated consistently.

Maintains the same signature pattern as other refactored classes.


100-104: LGTM: Documentation updated consistently.

Docstring changes match the pattern and accurately describe the new parameter usage.

fastembed/common/onnx_model.py (3)

12-12: LGTM: Device enum import added to core module.

Properly imports the Device enum for use in the core ONNX model logic.


61-61: LGTM: Core parameter signature updated.

The updated signature in the core _load_onnx_model method establishes the foundation for the Device enum usage across the codebase.


66-81: LGTM: Enhanced device selection logic.

The updated logic provides several improvements:

  • Queries available providers once at the start for efficiency
  • Clearly distinguishes explicit CUDA requests (True or Device.CUDA) from auto-detection
  • Provides helpful warnings for mutually exclusive parameters
  • Handles Device.AUTO mode intelligently by checking CUDA availability
  • Maintains backward compatibility with boolean values

The logic correctly prioritizes explicit user choices while providing smart defaults.
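The bullet points above can be sketched as a single helper. This is an illustrative reconstruction of the described flow, not the actual `_load_onnx_model` code; names such as `select_providers` are assumptions, while `explicit_cuda` and the `device_id` propagation follow the review's description:

```python
import warnings
from enum import Enum
from typing import Any, Optional, Sequence, Union


class Device(str, Enum):
    CPU = "cpu"
    CUDA = "cuda"
    AUTO = "auto"


def select_providers(
    cuda: Union[bool, Device],
    available_providers: Sequence[str],
    providers: Optional[Sequence[Any]] = None,
    device_id: Optional[int] = None,
) -> list:
    # An explicit provider list always wins; warn if cuda was also requested.
    if providers is not None:
        if cuda is True or cuda == Device.CUDA:
            warnings.warn("`providers` and `cuda` are mutually exclusive; using `providers`.")
        return list(providers)

    cuda_available = "CUDAExecutionProvider" in available_providers
    # explicit_cuda: the user asked for CUDA outright (True or Device.CUDA),
    # as opposed to Device.AUTO, which only opts in when CUDA is present.
    explicit_cuda = cuda is True or cuda == Device.CUDA
    if explicit_cuda and not cuda_available:
        warnings.warn("CUDA was requested but CUDAExecutionProvider is not available.")

    if explicit_cuda or (cuda == Device.AUTO and cuda_available):
        # Propagate device_id to the CUDA provider when one was given.
        cuda_provider = (
            ("CUDAExecutionProvider", {"device_id": device_id})
            if device_id is not None
            else "CUDAExecutionProvider"
        )
        return [cuda_provider, "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]
```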

fastembed/late_interaction_multimodal/colpali.py (3)

8-8: LGTM: Device import added consistently.

Follows the established import pattern for this refactor.


52-52: LGTM: Parameter signature updated consistently.

Maintains the same pattern as all other refactored classes in the codebase.


68-72: LGTM: Documentation updated consistently.

Docstring changes follow the established pattern and accurately describe the parameter usage conditions.

fastembed/text/onnx_text_model.py (3)

10-10: LGTM: Import addition is consistent.

The Device import is correctly added to support the new union type parameter.


109-109:

```shell
#!/bin/bash

# Show the full __init__ signature of ParallelWorkerPool in parallel_processor.py
sed -n '90,120p' fastembed/parallel_processor.py
```

---

`55-55`:

```shell
#!/bin/bash
# Display the full _load_onnx_model signature in the parent class
rg -n "def _load_onnx_model" -C20 fastembed/common/onnx_model.py
```
fastembed/rerank/cross_encoder/onnx_text_model.py (3)

4-4: LGTM: Import additions are correct.

Both Union and Device imports are properly added to support the new parameter type.

Also applies to: 15-15


34-34: LGTM: Consistent parameter type update.

The parameter type change is consistent with the broader refactoring and provides better device specification flexibility.


95-95: LGTM: Consistent parameter type update.

The parameter type change follows the established pattern and maintains consistency across the codebase.

fastembed/late_interaction_multimodal/onnx_multimodal_model.py (4)

14-14: LGTM: Import addition is consistent.

The Device import correctly supports the new union type parameter.


65-65: LGTM: Consistent parameter type update.

The parameter type change maintains consistency with the broader refactoring effort.


121-121: LGTM: Consistent parameter type update.

The parameter type change follows the established pattern across the codebase.


188-188: LGTM: Consistent parameter type update.

The parameter type change maintains consistency with other methods in the refactoring.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c438fc and 256443e.

📒 Files selected for processing (23)
  • fastembed/common/onnx_model.py (2 hunks)
  • fastembed/common/types.py (2 hunks)
  • fastembed/image/image_embedding.py (2 hunks)
  • fastembed/image/onnx_embedding.py (3 hunks)
  • fastembed/image/onnx_image_model.py (3 hunks)
  • fastembed/late_interaction/colbert.py (3 hunks)
  • fastembed/late_interaction/late_interaction_text_embedding.py (2 hunks)
  • fastembed/late_interaction_multimodal/colpali.py (3 hunks)
  • fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (2 hunks)
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py (4 hunks)
  • fastembed/parallel_processor.py (2 hunks)
  • fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (2 hunks)
  • fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py (3 hunks)
  • fastembed/rerank/cross_encoder/onnx_text_model.py (4 hunks)
  • fastembed/rerank/cross_encoder/text_cross_encoder.py (2 hunks)
  • fastembed/sparse/bm42.py (3 hunks)
  • fastembed/sparse/minicoil.py (3 hunks)
  • fastembed/sparse/sparse_text_embedding.py (2 hunks)
  • fastembed/sparse/splade_pp.py (3 hunks)
  • fastembed/text/custom_text_embedding.py (3 hunks)
  • fastembed/text/onnx_embedding.py (3 hunks)
  • fastembed/text/onnx_text_model.py (3 hunks)
  • fastembed/text/text_embedding.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • fastembed/text/onnx_text_model.py
  • fastembed/image/onnx_image_model.py
  • fastembed/late_interaction_multimodal/colpali.py
  • fastembed/rerank/cross_encoder/text_cross_encoder.py
  • fastembed/parallel_processor.py
  • fastembed/text/onnx_embedding.py
  • fastembed/common/types.py
  • fastembed/sparse/sparse_text_embedding.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-12T10:48:30.978Z
Learnt from: joein
Repo: qdrant/fastembed PR: 574
File: fastembed/sparse/sparse_embedding_base.py:2-2
Timestamp: 2025-11-12T10:48:30.978Z
Learning: In fastembed codebase, when using numpy NDArray types in dataclass fields, keep Union syntax instead of PEP 604 pipe operator (|) because dataclasses evaluate annotations at runtime and NDArray types don't support the __or__ operator. Add a comment explaining the constraint.

Applied to files:

  • fastembed/common/onnx_model.py
  • fastembed/image/image_embedding.py
  • fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py
  • fastembed/rerank/cross_encoder/custom_text_cross_encoder.py
  • fastembed/text/text_embedding.py
  • fastembed/text/custom_text_embedding.py
🧬 Code graph analysis (15)
fastembed/sparse/minicoil.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/image/onnx_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/common/onnx_model.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/image/image_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py (2)
fastembed/common/onnx_model.py (1)
  • OnnxOutputContext (20-23)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/sparse/bm42.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/sparse/splade_pp.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (2)
fastembed/common/model_description.py (1)
  • BaseModelDescription (24-31)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/text/text_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/text/custom_text_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/rerank/cross_encoder/onnx_text_model.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/late_interaction/colbert.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
fastembed/late_interaction/late_interaction_text_embedding.py (1)
fastembed/common/types.py (1)
  • Device (29-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Python 3.13.x on windows-latest test
  • GitHub Check: Python 3.13.x on macos-latest test
  • GitHub Check: Python 3.9.x on windows-latest test
  • GitHub Check: Python 3.9.x on macos-latest test
🔇 Additional comments (16)
fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py (1)

1-1: Device enum integration for cuda looks consistent

Importing Union/Device, updating cuda to Union[bool, Device] = Device.AUTO, and aligning the docstrings with the new cuda/device_ids semantics are all coherent and match how cuda is passed into _load_onnx_model and _rerank_pairs. No issues from this change on this path.

Also applies to: 7-7, 84-85, 100-104

fastembed/text/custom_text_embedding.py (1)

1-1: CUDA Device support wired correctly into CustomTextEmbedding

The updated typing/imports and the new cuda: Union[bool, Device] = Device.AUTO parameter are correctly forwarded to OnnxTextEmbedding, so this class stays aligned with the new device-selection API.

Also applies to: 14-14, 35-35

fastembed/late_interaction/late_interaction_text_embedding.py (1)

5-5: Late-interaction wrapper correctly adopts Device-typed cuda

Importing Device and changing cuda to Union[bool, Device] = Device.AUTO in the constructor, then passing it through to the selected embedding model, keeps this facade consistent with the rest of the API without altering other behavior.

Also applies to: 57-57

fastembed/sparse/minicoil.py (1)

13-13: MiniCOIL CUDA configuration matches new Device semantics

The switch to cuda: Union[bool, Device] = Device.AUTO plus the updated docstrings (including valid combinations for device_ids) are consistent with how cuda/device_ids are later passed into _load_onnx_model and _embed_documents. No further changes needed here.

Also applies to: 82-82, 102-106

fastembed/sparse/bm42.py (1)

12-12: Bm42 now correctly exposes Device-aware cuda configuration

The new cuda: Union[bool, Device] = Device.AUTO signature and corresponding docstring updates (including allowed values for use with device_ids) are coherent with how cuda is later consumed. Looks good.

Also applies to: 73-73, 91-95

fastembed/late_interaction/colbert.py (1)

8-8: Colbert constructor correctly aligned with Device-based cuda API

The import of Device, the cuda: Union[bool, Device] = Device.AUTO signature, and the refreshed docstrings for cuda/device_ids are all in sync with the usage in load_onnx_model and embed/query_embed. No issues spotted.

Also applies to: 146-146, 162-166

fastembed/text/text_embedding.py (1)

5-5: TextEmbedding now defaults to automatic device selection

Importing Device and updating cuda to Union[bool, Device] = Device.AUTO, then passing it through to all registered embedding implementations, cleanly centralizes the “use CUDA if available” behavior at this facade. Callers who need CPU-only can still pass cuda=False or Device.CPU.

Also applies to: 85-85

fastembed/common/onnx_model.py (1)

63-90: LGTM on the device selection logic.

The provider selection logic correctly handles all combinations:

  • Explicit CUDA request (cuda=True or cuda=Device.CUDA) → CUDAExecutionProvider
  • Auto-detection (cuda=Device.AUTO) with available CUDA → CUDAExecutionProvider
  • Explicit CPU (cuda=Device.CPU or cuda=False) or AUTO without CUDA → CPUExecutionProvider

The backward compatibility with boolean values is preserved.

fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (1)

54-79: LGTM!

The cuda parameter type update is consistent with the broader API changes, and the value is correctly forwarded to the underlying model implementation.

fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (1)

12-36: LGTM!

The parameter type change aligns with the codebase-wide update, and all arguments are correctly forwarded to the parent class.

fastembed/image/onnx_embedding.py (1)

63-106: LGTM!

The parameter type change, docstring updates, and forwarding logic are all correctly implemented and consistent with the PR-wide changes.

fastembed/image/image_embedding.py (1)

48-73: LGTM!

The cuda parameter type update follows the consistent pattern across the codebase, with correct forwarding to the underlying embedding model.

fastembed/rerank/cross_encoder/onnx_text_model.py (2)

4-4: LGTM! Import additions support the new device typing.

The addition of Union and Device imports is correct and necessary for the updated type annotations.

Also applies to: 15-15


28-46: No verification issues found — parent class and ParallelWorkerPool already support Union[bool, Device].

The cuda: Union[bool, Device] = Device.AUTO parameter signature is consistent across the codebase. The parent class OnnxModel._load_onnx_model() and ParallelWorkerPool.__init__() already use the same type signature, and all type handling is correct (proper enum comparisons at lines 71 and 84 in OnnxModel). No breaking boolean checks exist.

fastembed/late_interaction_multimodal/onnx_multimodal_model.py (2)

14-14: LGTM! Device import added correctly.

The Device import is properly added to support the updated type annotations. Union was already imported on line 5.


59-77: LGTM! Method signatures updated consistently.

All three methods (_load_onnx_model, _embed_documents, _embed_images) correctly update the cuda parameter to Union[bool, Device] = Device.AUTO. The parameters are properly passed through to parent class and ParallelWorkerPool.

Note: Type compatibility verification requested in the previous file review applies here as well.

Also applies to: 115-170, 186-241

@joein joein requested review from generall and tbung December 8, 2025 11:40
@tbung tbung left a comment

Needs rebase and all Union removed, I didn't mark them all.

 from dataclasses import dataclass
 from pathlib import Path
-from typing import Any, Generic, Iterable, Optional, Sequence, Type, TypeVar
+from typing import Any, Generic, Iterable, Optional, Sequence, Type, TypeVar, Union
needs rebase and removal of Union and Optional

 threads: Optional[int],
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int],
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 parallel: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Union

 threads: Optional[int] = None,
 providers: Optional[Sequence[OnnxProvider]] = None,
-cuda: bool = False,
+cuda: Union[bool, Device] = Device.AUTO,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
fastembed/sparse/splade_pp.py (1)

93-94: Fix docstring: "Defaults to Device." should be "Defaults to Device.AUTO."

The docstring states "Defaults to Device." but the code at line 77 shows the default is Device.AUTO. This inconsistency was already flagged in a previous review.

Apply this diff to fix the docstring:

-            cuda (Union[bool, Device], optional): Whether to use cuda for inference. Mutually exclusive with `providers`
-                Defaults to Device.
+            cuda (Union[bool, Device], optional): Whether to use cuda for inference. Mutually exclusive with `providers`.
+                Defaults to Device.AUTO.
fastembed/text/onnx_embedding.py (1)

3-241: OnnxTextEmbedding correctly centralizes Device-aware cuda (minor docstyle nit)

The switch to cuda: bool | Device = Device.AUTO, storing self.cuda, and passing it through in both embed and load_onnx_model is consistent and matches the intended “auto CUDA if available” behavior across the stack. The OnnxProvider/Device import consolidation also looks fine.

Minor nit: the docstring still refers to cuda (Union[bool, Device], ...) while the code uses bool | Device. Consider aligning the docstring type notation with the code style for clarity.

🧹 Nitpick comments (4)
fastembed/text/onnx_text_model.py (2)

49-68: Consider updating method docstrings.

If _load_onnx_model has a docstring documenting the cuda parameter, it should be updated to reflect the new type (Union[bool, Device]) and default value (Device.AUTO). Similar to the comprehensive docstring updates in fastembed/image/onnx_embedding.py (lines 86-90).


103-117: Consider updating method docstrings.

If _embed_documents has a docstring documenting the cuda parameter, it should be updated to reflect the new type (Union[bool, Device]) and default value (Device.AUTO).

fastembed/image/onnx_image_model.py (2)

50-69: Consider updating method docstrings.

If _load_onnx_model has a docstring documenting the cuda parameter, it should be updated to reflect the new type and default value, similar to the updates in fastembed/image/onnx_embedding.py.


92-106: Consider updating method docstrings.

If _embed_images has a docstring documenting the cuda parameter, it should be updated to reflect the new type (Union[bool, Device]) and default value (Device.AUTO).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44c4d2b and e69a58e.

📒 Files selected for processing (23)
  • fastembed/common/onnx_model.py (2 hunks)
  • fastembed/common/types.py (1 hunks)
  • fastembed/image/image_embedding.py (2 hunks)
  • fastembed/image/onnx_embedding.py (3 hunks)
  • fastembed/image/onnx_image_model.py (3 hunks)
  • fastembed/late_interaction/colbert.py (3 hunks)
  • fastembed/late_interaction/late_interaction_text_embedding.py (2 hunks)
  • fastembed/late_interaction_multimodal/colpali.py (3 hunks)
  • fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (2 hunks)
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py (4 hunks)
  • fastembed/parallel_processor.py (2 hunks)
  • fastembed/rerank/cross_encoder/custom_text_cross_encoder.py (2 hunks)
  • fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py (3 hunks)
  • fastembed/rerank/cross_encoder/onnx_text_model.py (3 hunks)
  • fastembed/rerank/cross_encoder/text_cross_encoder.py (2 hunks)
  • fastembed/sparse/bm42.py (3 hunks)
  • fastembed/sparse/minicoil.py (3 hunks)
  • fastembed/sparse/sparse_text_embedding.py (2 hunks)
  • fastembed/sparse/splade_pp.py (3 hunks)
  • fastembed/text/custom_text_embedding.py (2 hunks)
  • fastembed/text/onnx_embedding.py (3 hunks)
  • fastembed/text/onnx_text_model.py (3 hunks)
  • fastembed/text/text_embedding.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • fastembed/rerank/cross_encoder/onnx_text_cross_encoder.py
  • fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py
  • fastembed/image/image_embedding.py
  • fastembed/common/onnx_model.py
  • fastembed/rerank/cross_encoder/onnx_text_model.py
  • fastembed/common/types.py
  • fastembed/rerank/cross_encoder/text_cross_encoder.py
  • fastembed/text/text_embedding.py
  • fastembed/late_interaction/late_interaction_text_embedding.py
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py
  • fastembed/rerank/cross_encoder/custom_text_cross_encoder.py
🧰 Additional context used
🧬 Code graph analysis (12)
fastembed/sparse/bm42.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/parallel_processor.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/image/onnx_image_model.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/late_interaction/colbert.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/sparse/minicoil.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/text/onnx_embedding.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/text/onnx_text_model.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/sparse/sparse_text_embedding.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/sparse/splade_pp.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/image/onnx_embedding.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/late_interaction_multimodal/colpali.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
fastembed/text/custom_text_embedding.py (1)
fastembed/common/types.py (1)
  • Device (10-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Python 3.13.x on ubuntu-latest test
  • GitHub Check: Python 3.12.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on windows-latest test
  • GitHub Check: Python 3.11.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on macos-latest test
🔇 Additional comments (16)
fastembed/sparse/splade_pp.py (2)

6-6: LGTM!

The Device import is correctly added to support the new type annotation.


77-77: LGTM!

The parameter signature correctly updates the cuda parameter to accept both bool and Device types with Device.AUTO as the default, enabling automatic GPU detection.

fastembed/text/custom_text_embedding.py (1)

13-35: CustomTextEmbedding cuda signature aligns with new Device API

Importing Device and widening cuda to bool | Device = Device.AUTO, then forwarding it unchanged to OnnxTextEmbedding, is consistent and preserves existing behavior for bool callers while enabling the enum. No issues spotted.

fastembed/sparse/sparse_text_embedding.py (1)

5-61: SparseTextEmbedding correctly forwards Device-aware cuda to underlying models

The Device import and cuda: bool | Device = Device.AUTO parameter, forwarded as cuda=cuda when instantiating registry models, look correct and keep the wrapper thin and consistent with the new device API.

fastembed/sparse/minicoil.py (1)

13-121: MiniCOIL Device-aware cuda integration looks consistent

The cuda: bool | Device = Device.AUTO parameter, updated docstring, and propagation of self.cuda into load_onnx_model and _embed_documents all look coherent and in line with the rest of the refactor. No issues found.

fastembed/sparse/bm42.py (1)

12-113: Bm42 cuda parameter refactor is correctly wired

The change to cuda: bool | Device = Device.AUTO, along with storing self.cuda and passing it through to load_onnx_model and _embed_documents, is consistent with the new Device enum usage. The updated docstring matches the signature. Looks good.

fastembed/late_interaction/colbert.py (1)

8-184: Colbert’s Device-aware cuda parameter is consistent with other ONNX text models

The import of Device, the widened cuda: bool | Device = Device.AUTO signature, and its propagation into load_onnx_model and _embed_documents all look correct and match the pattern used in other text embeddings. No additional issues detected.

fastembed/late_interaction_multimodal/colpali.py (1)

8-89: ColPali Device-aware cuda wiring for text and image embeddings looks good

Using cuda: bool | Device = Device.AUTO, storing it on self.cuda, and forwarding it consistently in load_onnx_model, embed_text, and embed_image keeps the multimodal behavior intact while enabling the new Device enum. No issues noticed.

fastembed/parallel_processor.py (1)

13-111: ParallelWorkerPool's cuda default is part of a consistent internal API design – no public API impact

ParallelWorkerPool is an internal class not exported in the public API, and Worker is an abstract base for internal use only. The single internal Worker subclass (Bm25Worker) receives the cuda parameter via **kwargs and forwards it correctly, making it compatible with the Device.AUTO default regardless of parameter type. This change aligns with the library's systematic adoption of Device.AUTO across all public embedding APIs, and poses no breaking changes to external code.

Likely an incorrect or invalid review comment.
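The `**kwargs` forwarding pattern the comment describes can be sketched like this (class and attribute names are illustrative, not the actual fastembed internals):

```python
from enum import Enum


class Device(Enum):
    CPU = "cpu"
    CUDA = "cuda"
    AUTO = "auto"


class Bm25LikeWorker:
    """Toy worker: `cuda` arrives through **kwargs and is stored
    untouched, so bool and Device values are equally acceptable."""

    def __init__(self, **kwargs):
        self.cuda = kwargs.get("cuda", Device.AUTO)


# a bool caller and a no-arg caller both work under the new default
assert Bm25LikeWorker(cuda=True).cuda is True
assert Bm25LikeWorker().cuda is Device.AUTO
```

Since the worker never inspects the value, widening the parameter type upstream cannot break this call chain.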

fastembed/text/onnx_text_model.py (2)

10-10: LGTM! Device import added correctly.

The import statement properly includes the new Device enum type needed for the cuda parameter updates.


55-55: No issue found. The project's minimum Python version is 3.10.0 (python = ">=3.10.0" in pyproject.toml), which fully supports PEP 604 union syntax. The type annotation bool | Device is compatible with the project's Python version requirement.

Likely an incorrect or invalid review comment.

fastembed/image/onnx_image_model.py (2)

11-11: LGTM! Device import added correctly.

The import statement properly includes the new Device enum type. The past review comments suggesting Union are addressed by using the modern PEP 604 syntax (bool | Device).


56-56: LGTM! Type signatures updated consistently.

The cuda parameter type has been updated from bool to bool | Device with a default of Device.AUTO in both _load_onnx_model and _embed_images. The parameters are properly propagated through the call chains.

Also applies to: 100-100

fastembed/image/onnx_embedding.py (3)

4-4: LGTM! Device import added correctly.

The import statement properly includes the new Device enum type needed for the cuda parameter updates.


70-70: LGTM! Type signature updated consistently.

The cuda parameter type has been properly updated to bool | Device with a default of Device.AUTO, consistent with the changes across other modules.


86-90: Excellent docstring updates!

The docstring updates are comprehensive and clearly document:

  • The new Union[bool, Device] type
  • Valid values including Device.AUTO and Device.CUDA
  • Mutual exclusivity with the providers parameter
  • Integration with device_ids parameter

This is a good model for docstring updates in the other files.

@joein joein merged commit 685fd9b into main Dec 10, 2025
13 checks passed
@joein joein deleted the detect-gpu branch December 10, 2025 13:23
kacperlukawski pushed a commit to kacperlukawski/fastembed that referenced this pull request Dec 15, 2025
* new: use cuda if available

* fix: fix warning msg

* fix: add missing import