Closed
Labels
bug: Something isn't working
Description
Describe the bug
I am using group offloading to save GPU memory, but the outputs lose precision. On an A800 the cosine similarity is about 0.934, which is lower than expected. On a 4090 it is about 0.78, which is even worse.
Could anyone suggest how to fix the precision?
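For anyone trying to reproduce the numbers, here is a minimal sketch of how such a cosine similarity could be measured. It assumes the score compares the flattened output of a run without offloading against a run with offloading; the issue does not state the baseline, so that is a guess. The helper `cosine_similarity` and the `reference` / `offloaded` lists are hypothetical stand-ins, not part of diffusers. Accumulating in Python floats (float64) keeps the metric itself from adding rounding error on top of fp16/bf16 outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    # math.fsum accumulates in float64 with compensated summation.
    dot = math.fsum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(math.fsum(x * x for x in a))
    norm_b = math.sqrt(math.fsum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for flattened model outputs (not real data).
reference = [0.5, -1.0, 2.0, 0.25]   # e.g. output without group offloading
offloaded = [0.5, -1.0, 2.0, 0.25]   # e.g. output with group offloading
identical_score = cosine_similarity(reference, offloaded)
```

With real tensors, something like `output.flatten().float().tolist()` would produce a suitable input, using the same seed and inputs for both runs so that any gap comes from offloading rather than sampling noise.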
Reproduction
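```python
import torch
from diffusers.hooks import apply_group_offloading

# `transformer` and `self.local_rank` come from the surrounding pipeline class (not shown).
apply_group_offloading(
    transformer,
    onload_device=torch.device(f"cuda:{self.local_rank}"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=1,
    non_blocking=True,
    use_stream=True,
)
```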
### Logs
```shell
```
System Info
I tried diffusers 0.33.1 and 0.34.
Who can help?
No response