
tests/test_accelerator.py: Random weights test flakes (possibly aarch64 specific) #3792

@xanderlent

System Info

Using the system `python-accelerate` v1.10.1 package from Fedora Rawhide (currently Fedora 44) on aarch64/ARM64.

The failure was found by Fedora's Koschei CI system, which periodically rebuilds the package on different architectures and runs some of the tests as part of each build. (Koji build system task 137140142)

python3-accelerate version was 1.10.1-1.fc44 
python3-torch version was 2.8.0-3.fc44
python3-numpy version was 1:2.3.3-1.fc44

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Run the test test_save_load_model_use_safetensors from tests/test_accelerator.py. See also rhbz#2396305, which includes build.log and root.log from the build; these show the build and test output and the dependency versions, respectively.

At least occasionally, the test fails with the following output:

=================================== FAILURES ===================================
____________ AcceleratorTester.test_save_load_model_use_safetensors ____________
a = (<test_accelerator.AcceleratorTester testMethod=test_save_load_model_use_safetensors>,)
kw = {}
    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)
/usr/lib/python3.14/site-packages/parameterized/parameterized.py:620: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <test_accelerator.AcceleratorTester testMethod=test_save_load_model_use_safetensors>
use_safetensors = True, tied_weights = False
    @parameterized.expand([(True, True), (True, False), (False, False)], name_func=parameterized_custom_name_func)
    def test_save_load_model(self, use_safetensors, tied_weights):
        accelerator = Accelerator()
        model, optimizer, scheduler, train_dl, valid_dl = create_components(tied_weights)
        accelerator.prepare(model, optimizer, scheduler, train_dl, valid_dl)
    
        model_signature = get_signature(model)
    
        with tempfile.TemporaryDirectory() as tmpdirname:
            accelerator.save_state(tmpdirname, safe_serialization=use_safetensors)
    
            # make sure random weights don't match
            load_random_weights(model)
>           assert abs(model_signature - get_signature(model)) > 1e-3
E           assert 0.0004030466079711914 > 0.001
E            +  where 0.0004030466079711914 = abs((3.495155453681946 - 3.495558500289917))
E            +    where 3.495558500289917 = get_signature(Linear(in_features=2, out_features=4, bias=True))
tests/test_accelerator.py:275: AssertionError
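
To get a feel for how often this assertion could fail by chance alone, here is a rough Monte Carlo sketch using only the standard library. It rests on assumptions that the traceback does not confirm: that get_signature sums the absolute values of all parameters, that both the original and the reloaded model are freshly initialized Linear(2, 4) layers, and that every weight and bias is drawn uniformly from (-1/sqrt(in_features), 1/sqrt(in_features)). The names random_signature and estimate_collision_rate are hypothetical and are not part of the Accelerate test suite.

```python
import math
import random

IN_FEATURES, OUT_FEATURES = 2, 4  # shape from the traceback: Linear(2, 4)
N_PARAMS = IN_FEATURES * OUT_FEATURES + OUT_FEATURES  # 8 weights + 4 biases
BOUND = 1.0 / math.sqrt(IN_FEATURES)  # ASSUMED init range: U(-BOUND, BOUND)
THRESHOLD = 1e-3  # tolerance used by the failing assertion


def random_signature(rng):
    """ASSUMED signature: sum of |p| over all parameters of a fresh layer."""
    return sum(abs(rng.uniform(-BOUND, BOUND)) for _ in range(N_PARAMS))


def estimate_collision_rate(trials=100_000, seed=0):
    """Fraction of trials where two independent signatures differ by <= THRESHOLD."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        original = random_signature(rng)
        reloaded = random_signature(rng)
        # The test asserts abs(diff) > 1e-3, so it fails when diff <= 1e-3.
        if abs(original - reloaded) <= THRESHOLD:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"estimated failure rate per run: {estimate_collision_rate():.5f}")
```

Under these assumptions the two signatures are independent sums of 12 uniform values, and a rough normal approximation puts the chance of them landing within 1e-3 of each other at about 8e-4 per run. If the assumptions hold, a flake of this kind could come from plain randomness and might not be specific to aarch64.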

Expected behavior

The test test_save_load_model_use_safetensors should always succeed.
