@LLukas22 (Contributor) commented Nov 9, 2025

Disclaimer

This PR is still a work in progress and needs some polishing.
The main goal of opening it early is to gather feedback on whether this approach is heading in the right direction.


Overview

The primary goal of this PR is to introduce QuantizedDType support.
These data types allow regular tensors to store and operate on quantized data, making it much easier to add new quantization schemes to Candle in the future without relying on custom ops.


What’s Included

1. New candle-macros crates

  • candle-macros: contains the procedural macros that generate the dispatch code for QuantizedDTypes.
  • candle-macros-types: defines the traits that quantized types can implement to provide backend-specific support.

2. QuantizedType trait

Each quantization scheme implements the QuantizedType trait, which defines:

  • a static NAME
  • functions to calculate storage size for the quantized format
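
As a rough illustration, a trait along these lines could look like the following. This is a minimal standalone sketch, not the PR's actual trait: the method name `storage_size_in_bytes`, the toy `Q8Toy` type, and its 32-element block layout are all hypothetical.

```rust
// Hypothetical sketch of a QuantizedType-style trait; the exact
// signatures in this PR may differ.
trait QuantizedType {
    /// Static identifier for the quantization scheme.
    const NAME: &'static str;
    /// Bytes required to store `n_elements` values in this format.
    fn storage_size_in_bytes(n_elements: usize) -> usize;
}

/// Toy block quantization (illustrative only): 32 values per block,
/// each block storing 32 i8 values plus one f32 scale.
struct Q8Toy;

impl QuantizedType for Q8Toy {
    const NAME: &'static str = "q8_toy";
    fn storage_size_in_bytes(n_elements: usize) -> usize {
        let blocks = n_elements.div_ceil(32);
        blocks * (32 + std::mem::size_of::<f32>())
    }
}

fn main() {
    // 64 elements -> 2 blocks of (32 value bytes + 4-byte scale) each.
    assert_eq!(Q8Toy::storage_size_in_bytes(64), 72);
    println!("ok");
}
```

Because storage size depends on the scheme's block layout, it has to be computed by the type itself rather than derived from the element count alone.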

Quantizations can optionally implement one or more of the following backend traits:

  • QuantizedCpuOps
  • QuantizedCudaOps
  • QuantizedMetalOps

These traits define the de/quantization logic and backend-specific matmul implementations (e.g., f32 × quantized).
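
To make the shape of such a backend trait concrete, here is a self-contained toy version. Everything here is hypothetical: the method names, the per-tensor absmax i8 scheme (`I8Absmax`), and the default matmul that falls back to dequantize-then-naive-f32 — a real backend would dispatch to a fused kernel instead.

```rust
// Hypothetical sketch of the idea behind a QuantizedCpuOps-style trait:
// de/quantization plus a mixed-precision (f32 × quantized) matmul.
trait QuantizedCpuOps {
    fn quantize(src: &[f32]) -> Vec<u8>;
    fn dequantize(src: &[u8], n: usize) -> Vec<f32>;

    /// Fallback matmul: dequantize the RHS, then run a naive f32 kernel.
    /// A real backend would provide a fused quantized kernel here.
    fn matmul_f32(lhs: &[f32], rhs: &[u8], m: usize, k: usize, n: usize) -> Vec<f32> {
        let rhs = Self::dequantize(rhs, k * n);
        let mut out = vec![0f32; m * n];
        for i in 0..m {
            for j in 0..n {
                out[i * n + j] = (0..k).map(|p| lhs[i * k + p] * rhs[p * n + j]).sum();
            }
        }
        out
    }
}

/// Toy per-tensor absmax i8 scheme: one little-endian f32 scale,
/// followed by the i8 values.
struct I8Absmax;

impl QuantizedCpuOps for I8Absmax {
    fn quantize(src: &[f32]) -> Vec<u8> {
        let absmax = src.iter().fold(0f32, |a, &x| a.max(x.abs()));
        let scale = if absmax == 0.0 { 1.0 } else { absmax / 127.0 };
        let mut out = scale.to_le_bytes().to_vec();
        out.extend(src.iter().map(|&x| (x / scale).round() as i8 as u8));
        out
    }
    fn dequantize(src: &[u8], n: usize) -> Vec<f32> {
        let scale = f32::from_le_bytes(src[..4].try_into().unwrap());
        src[4..4 + n].iter().map(|&b| (b as i8) as f32 * scale).collect()
    }
}

fn main() {
    let data = [1.0f32, -2.0, 0.5, 127.0];
    let q = I8Absmax::quantize(&data);
    let d = I8Absmax::dequantize(&q, data.len());
    for (a, b) in data.iter().zip(&d) {
        assert!((a - b).abs() <= 0.5, "round-trip error too large");
    }
    println!("ok");
}
```

Splitting the backends into separate optional traits means a scheme can ship CPU support first and add CUDA or Metal later without touching its core definition.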

3. The register_quantized_types! macro

This macro generates:

  • a QuantizedDType enum
  • all required dispatch functions to call backend ops efficiently (minimizing runtime overhead)

The enum is then integrated into Candle Core as a new DType::Quantized(QuantizedDType) variant.
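
To give a feel for the expansion, here is a hand-written approximation of the kind of code such a macro might generate. The variant names, the 32-element block layouts, and the byte counts are illustrative placeholders, not the PR's generated output.

```rust
// Hand-written sketch of what register_quantized_types! might expand
// to: an enum over the registered schemes plus match-based dispatch.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum QuantizedDType {
    Q8_0,
    Q4_0,
}

impl QuantizedDType {
    /// Dispatch via a plain match, so the per-call overhead is minimal
    /// (no trait objects or dynamic lookup).
    fn storage_size_in_bytes(self, n_elements: usize) -> usize {
        match self {
            // Hypothetical layouts: 32-element blocks with one f32 scale,
            // holding 8-bit and 4-bit values respectively.
            QuantizedDType::Q8_0 => n_elements.div_ceil(32) * 36,
            QuantizedDType::Q4_0 => n_elements.div_ceil(32) * 20,
        }
    }
}

fn main() {
    assert_eq!(QuantizedDType::Q8_0.storage_size_in_bytes(64), 72);
    assert_eq!(QuantizedDType::Q4_0.storage_size_in_bytes(64), 40);
    println!("ok");
}
```

Generating a closed enum (rather than boxing trait objects) is what keeps the dispatch cost down to a single `match`.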

Tensors using this type:

  • support most operations through implicit dequantization (currently to f32)
  • dispatch directly to backend-specific matmul implementations when data types match
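
An end-to-end usage sketch, assuming Candle's existing Tensor/Device API plus the new DType::Quantized variant from this PR (method names on the quantized side are illustrative and may not match the final API):

```rust
use candle_core::{DType, Device, Tensor};

fn run() -> candle_core::Result<()> {
    let t = Tensor::randn(0f32, 1f32, (4, 32), &Device::Cpu)?;
    // Quantize via to_dtype — per this PR, currently the only way to
    // create a quantized tensor. QuantizedDType is the PR's new enum.
    let q = t.to_dtype(DType::Quantized(QuantizedDType::Q8_0))?;
    // When the dtypes line up, matmul dispatches to the quantized kernel;
    // other ops implicitly dequantize to f32 first.
    let y = t.matmul(&q.t()?)?;
    Ok(())
}
```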

4. External Quantization Support

A register_external_quantized_type! macro is also included.
This will allow external crates to register their own quantization types without modifying Candle Core directly.
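
Usage from an external crate might look roughly like this (a sketch only — the macro's exact signature comes from the PR and may change):

```rust
// Hypothetical external-crate registration; trait impls elided.
use candle_macros::register_external_quantized_type;

struct MyQ4;
// ... implement QuantizedType + the desired backend traits for MyQ4 ...

register_external_quantized_type!(MyQ4);
```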


Current Limitations

  • Quantized tensors can currently only be created via .to_dtype() — I haven’t yet found a clean way to load them from files.
  • Using f32 as the intermediate type isn’t ideal for some backends (like CUDA) and may need refinement.
  • The Metal backend implementation is not yet complete, as I don’t have access to Apple hardware for testing.
