@@ -176,7 +180,7 @@ Benchmarked on RTX 3090 (24GB), F32 + TF32 tensor cores. Uses a local candle patch.
</details>
-Rust wins at short/medium durations. At longer durations PyTorch's Cutlass tensor-core VAE kernels (cuDNN v9 engine API) give it an edge. Without the candle patch, VAE decode is ~3s (100x slower ConvTranspose1d).
+Rust wins at short/medium durations. At longer durations PyTorch's Cutlass tensor-core VAE kernels (cuDNN v9 engine API) give it an edge. Without the [candle patch](https://github.com/huggingface/candle/pull/3383), VAE decode is ~3s (100x slower ConvTranspose1d).