Error loading optimizer state with torch.load weights_only=True default in PyTorch 2.6 #3539

@luiz0992

System Info

- `Accelerate` version: 1.6.0
- Platform: Linux-5.10.0-34-cloud-amd64-x86_64-with-glibc2.31
- `accelerate` bash location: /xxx/training/.venv/bin/accelerate
- Python version: 3.12.4
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch SDAA available: False
- PyTorch MUSA available: False
- System RAM: 1842.60 GB
- GPU type: NVIDIA H100 80GB HBM3
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: MULTI-GPU
        - mixed_precision: bf16
        - use_cpu: False
        - debug: False
        - num_processes: 8
        - machine_rank: 0
        - num_machines: 1
        - gpu_ids: all
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - enable_cpu_affinity: False
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Error Message:

  [rank7]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
  [rank7]:        (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
  [rank7]:        (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
  [rank7]:        WeightsUnpickler error: Unsupported global: GLOBAL omegaconf.listconfig.ListConfig was not an allowed global by default. Please use `torch.serialization.add_safe_globals([ListConfig])` or the `torch.serialization.safe_globals([ListConfig])` context manager to allowlist this global if you trust this class/function.

Reproduce:

  1. Save a state using Accelerator.save_state() with an optimizer that includes custom objects in its state (like omegaconf.listconfig.ListConfig)
  2. Try to load the state using Accelerator.load_state() on PyTorch 2.6+

Current behavior:

Calling load_state() fails with an unpickling error because, as of PyTorch 2.6, torch.load() defaults to weights_only=True, which refuses to unpickle any class that is not on its allowlist (here, the omegaconf.listconfig.ListConfig stored in the optimizer state).
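
To illustrate the mechanism behind this error, here is a minimal standard-library sketch of an allowlisting unpickler. It is an analogy only, not PyTorch's actual `weights_only` implementation: `AllowlistUnpickler` and `restricted_loads` are hypothetical names, and `fractions.Fraction` stands in for a custom object such as `ListConfig`.

```python
import io
import pickle
from fractions import Fraction


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals found in an explicit allowlist."""

    def __init__(self, file, allowed):
        super().__init__(file)
        # Map "module.name" -> class for every trusted global.
        self._allowed = {f"{c.__module__}.{c.__qualname__}": c for c in allowed}

    def find_class(self, module, name):
        # Called by pickle whenever the stream references a global (class/function).
        key = f"{module}.{name}"
        if key not in self._allowed:
            raise pickle.UnpicklingError(f"Unsupported global: {key}")
        return self._allowed[key]


def restricted_loads(data, allowed=()):
    """Unpickle `data`, rejecting any global that is not in `allowed`."""
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


# A plain dict needs no global, but the Fraction value inside it does.
payload = pickle.dumps({"lr": Fraction(1, 1000)})

try:
    restricted_loads(payload)  # nothing allowlisted: rejected
except pickle.UnpicklingError as err:
    print("rejected:", err)

# Allowlisting the class makes the same payload loadable.
print(restricted_loads(payload, allowed=[Fraction]))
```

The `torch.serialization.add_safe_globals([ListConfig])` call suggested in the error message plays the same role as the `allowed` argument here: it extends the allowlist instead of disabling the check.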

Expected behavior

The load_state() method should provide a way to pass custom parameters to the underlying torch.load() calls, specifically to set weights_only=False or to add safe globals when needed.

Proposed solution

Add parameters to the load_state() method that allow passing keyword arguments to torch.load() for each component:

  • optimizer_load_kwargs: Passed to the optimizer's load function
  • scheduler_load_kwargs: Passed to the scheduler's load function
  • dataloader_load_kwargs: Passed to the dataloader's load function
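
The proposal could look roughly like the sketch below. Everything in it is hypothetical: `load_state_sketch`, the injected `load_fn`, and the file names are illustrative stand-ins, not Accelerate's actual signature or checkpoint layout. In real use `load_fn` would be `torch.load`; here a recording stub is used so the forwarding logic can be seen in isolation.

```python
from typing import Any, Callable, Mapping, Optional


def load_state_sketch(
    paths: Mapping[str, str],
    load_fn: Callable[..., Any],
    *,
    optimizer_load_kwargs: Optional[dict] = None,
    scheduler_load_kwargs: Optional[dict] = None,
    dataloader_load_kwargs: Optional[dict] = None,
) -> dict:
    """Load each component, forwarding its own keyword arguments to load_fn."""
    per_component = {
        "optimizer": optimizer_load_kwargs,
        "scheduler": scheduler_load_kwargs,
        "dataloader": dataloader_load_kwargs,
    }
    loaded = {}
    for name, path in paths.items():
        # Components without explicit kwargs keep the loader's defaults.
        kwargs = per_component.get(name) or {}
        loaded[name] = load_fn(path, **kwargs)
    return loaded


# Stand-in for torch.load that records how it was called.
calls = []


def fake_load(path, **kwargs):
    calls.append((path, kwargs))
    return path


state = load_state_sketch(
    {"optimizer": "optimizer.bin", "scheduler": "scheduler.bin"},  # hypothetical file names
    fake_load,
    optimizer_load_kwargs={"weights_only": False},
)
```

Keeping the kwargs per component means only the optimizer load opts out of the safer default, while the scheduler and dataloader loads stay on `weights_only=True`.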
