Replies: 1 comment 8 replies
-
Interesting findings. Which code are you running? I'm wondering if setting the format_options for both PDF and Image could end up loading the models twice. If so, we should investigate and fix it. cc @cau-git
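One way to probe this is to pass a single shared pipeline-options object for both input formats instead of constructing one per format. A minimal configuration sketch, assuming docling's `PdfFormatOption` / `ImageFormatOption` API with a VLM pipeline (whether the loaded model weights are actually shared across formats depends on docling's internals, which is exactly what's in question here):

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.document_converter import (
    DocumentConverter,
    ImageFormatOption,
    PdfFormatOption,
)
from docling.pipeline.vlm_pipeline import VlmPipeline

# One options instance, referenced by both format entries.
pipeline_options = VlmPipelineOptions()

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline, pipeline_options=pipeline_options
        ),
        InputFormat.IMAGE: ImageFormatOption(
            pipeline_cls=VlmPipeline, pipeline_options=pipeline_options
        ),
    }
)
```

If GPU memory use drops when only one format is registered, that would point at the per-format model instantiation as the culprit.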
-
I am trying to execute https://docling-project.github.io/docling/examples/compare_vlm_models/ for https://huggingface.co/mistralai/Pixtral-12B-2409.
But loading this model fails with:
Given that the GPU is an L40 with about 46 GB of RAM, I was hoping to be able to load this 12B model -- I can usually load much larger (but quantized) models (30 billion parameters easily, sometimes even larger).
So far I have not set `PYTORCH_CUDA_ALLOC_CONF`. It looks like this model might be loaded twice -- could this be true?
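The double-load suspicion is consistent with a back-of-envelope estimate. A rough sketch, assuming half-precision (bf16/fp16) weights at 2 bytes per parameter and ignoring activation, KV-cache, and CUDA-context overhead (all numbers here are my assumptions, not from the thread):

```python
# Weights-only VRAM estimate for a 12B-parameter model.
PARAMS = 12e9
BYTES_PER_PARAM = 2  # bf16 / fp16 assumption

one_copy_gib = PARAMS * BYTES_PER_PARAM / 1024**3
two_copies_gib = 2 * one_copy_gib

print(f"one copy:   {one_copy_gib:.1f} GiB")    # ~22.4 GiB
print(f"two copies: {two_copies_gib:.1f} GiB")  # ~44.7 GiB

# On a ~46 GiB L40, one copy fits easily; two copies leave under 2 GiB
# for activations and the CUDA context, so an OOM during load would be
# consistent with the model being held in memory twice.
assert one_copy_gib < 23
assert 44 < two_copies_gib < 45
```

So a single bf16 copy should load comfortably, while two copies would leave almost no headroom.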