Description
Hello,
I am trying to fine-tune the openbmb/MiniCPM-V-2_6 model on a custom handwriting dataset (GNHK) using a single NVIDIA RTX 3060 with 12GB of VRAM.
I am running into a TypeError that seems to be caused by a conflict between QLoRA with a CPU device map and the accelerate library: the fine-tuning script crashes in accelerate's prepare step because the model is not on a GPU device.
Here is the final error log, my configuration, and my environment details.
Final error log:
```
/home/engineeringpc/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:28: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
from pkg_resources import packaging # type: ignore[attr-defined]
2025-11-13 12:52:36.762313: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-11-13 12:52:36.789539: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-11-13 12:52:37.226369: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
torch_dtype is deprecated! Use dtype instead!
Loading checkpoint shards: 100%|████████████████████| 2/2 [00:00<00:00, 2.49it/s]
Currently using LoRA for fine-tuning the MiniCPM-V model.
{'Total': 4676436720, 'Trainable': 682268912}
llm_type=qwen2
Loading data...
/home/engineeringpc/Desktop/OCR_minicpm_v1.2_finetune/MiniCPM-V/finetune/finetune.py:279: FutureWarning: tokenizer is deprecated and will be removed in version 5.0.0 for CPMTrainer.__init__. Use processing_class instead.
trainer = CPMTrainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 151644, 'pad_token_id': 151643}.
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/engineeringpc/Desktop/OCR_minicpm_v1.2_finetune/MiniCPM-V/finetune/finetune.py", line 296, in
[rank0]: train()
[rank0]: File "/home/engineeringpc/Desktop/OCR_minicpm_v1.2_finetune/MiniCPM-V/finetune/finetune.py", line 286, in train
[rank0]: trainer.train()
[rank0]: File "/home/engineeringpc/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2325, in train
[rank0]: return inner_training_loop(
[rank0]: File "/home/engineeringpc/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2480, in _inner_training_loop
[rank0]: model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
[rank0]: File "/home/engineeringpc/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1559, in prepare
[rank0]: result = tuple(
[rank0]: File "/home/engineeringpc/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1560, in
[rank0]: self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
[rank0]: File "/home/engineeringpc/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1402, in _prepare_one
[rank0]: return self.prepare_model(obj, device_placement=device_placement)
[rank0]: File "/home/engineeringpc/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1789, in prepare_model
[rank0]: elif torch.device(current_device_index) != self.device:
[rank0]: TypeError: device() received an invalid combination of arguments - got (NoneType), but expected one of:
[rank0]: * (torch.device device)
[rank0]: didn't match because some of the arguments have invalid types: (NoneType)
[rank0]: * (str type, int index = -1)
E1113 12:52:41.988000 326883 torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 326942) of binary: /usr/bin/python3
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
finetune/finetune.py FAILED
```
My start_qlora.sh script:
```bash
#!/bin/bash
torchrun --nproc_per_node=1 --master_port=6001 finetune/finetune.py \
    --model_name_or_path model \
    --llm_type qwen2 \
    --data_path train.json \
    --eval_data_path test.json \
    --fp16 true \
    --do_train \
    --do_eval \
    --tune_vision true \
    --tune_llm false \
    --use_lora true \
    --q_lora true \
    --model_max_length 2048 \
    --max_steps 10000 \
    --eval_steps 1000 \
    --output_dir output/gnhk_qlora \
    --logging_dir output/gnhk_qlora/logs \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 3 \
    --learning_rate 1e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --gradient_checkpointing true \
    --report_to "tensorboard"
```
My finetune.py edits:
To work around an earlier OutOfMemoryError, I was advised to load the model on the CPU first, so I edited the AutoModel.from_pretrained call in finetune/finetune.py as follows:
```python
model = AutoModel.from_pretrained(
    model_args.model_name_or_path,
    trust_remote_code=True,
    torch_dtype=compute_dtype,
    device_map={"":"cpu"},
)
```
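For reference, my current understanding (which may well be wrong, and is exactly what I would like confirmed) is that for QLoRA the model should instead be quantized to 4-bit and placed directly on the GPU, roughly like the sketch below. The BitsAndBytesConfig values and the device_map here are my own guesses, not something taken from finetune.py:

```python
from transformers import AutoModel, BitsAndBytesConfig
import torch

# Hypothetical alternative (untested): quantize to 4-bit so the model fits in
# 12 GB of VRAM and keep it on GPU 0 instead of the CPU. How this should
# integrate with finetune.py's LoRA setup is what I am asking about.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModel.from_pretrained(
    model_args.model_name_or_path,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map={"": 0},  # place the whole model on the single RTX 3060
)
```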
My Environment (nvidia-smi):
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 Off | N/A |
| 0% 41C P8 18W / 170W | 324MiB / 12288MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+
```

It seems accelerate cannot handle the model being entirely on the CPU during the prepare step: from the traceback, the failing check in prepare_model is torch.device(current_device_index) != self.device, and current_device_index is None when the whole model sits on the CPU.
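Just to confirm where the TypeError itself originates, the same error can be reproduced in isolation; this snippet is only an illustration of the failing call, not part of my setup:

```python
import torch

# accelerate's prepare_model calls torch.device(current_device_index);
# when current_device_index is None, torch raises the exact TypeError above.
torch.device(None)
```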
Could you please provide the correct configuration or code edits to successfully fine-tune with QLoRA on a single 12GB GPU?
Thank you so much for your help.