Gradient accumulation calculation may be incorrect #20350

@tyler-rt

Bug description

See https://unsloth.ai/blog/gradient for an in-depth explanation, and this PR for how Hugging Face fixed it.

I verified that I see worse performance with gradient accumulation than with multiple devices, so I suspect this bug also applies to Lightning.
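
To illustrate the kind of discrepancy I mean, here is a minimal sketch in plain Python. It is not Lightning or Hugging Face code. It assumes the loss is a token-level mean (as with cross-entropy over non-padded tokens) and that the accumulated loss is computed as the average of per-micro-batch means. The per-token loss values and all function names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical per-token losses for two micro-batches with different
# numbers of non-padded tokens (illustrative values, not measured data).
micro_batches = [
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],  # 6 tokens
    [8.0, 8.0],                      # 2 tokens
]


def full_batch_loss(batches):
    """Token-level mean over all tokens, as if processed in one large batch."""
    tokens = [loss for batch in batches for loss in batch]
    return sum(tokens) / len(tokens)


def naive_accumulated_loss(batches):
    """Mean of per-micro-batch means (each mean divided by the number of steps)."""
    return sum(sum(b) / len(b) for b in batches) / len(batches)


def corrected_accumulated_loss(batches):
    """Sum per-token losses across micro-batches, then divide by the total token count."""
    total_tokens = sum(len(b) for b in batches)
    return sum(sum(b) for b in batches) / total_tokens


reference = full_batch_loss(micro_batches)
naive = naive_accumulated_loss(micro_batches)
corrected = corrected_accumulated_loss(micro_batches)

print(f"full batch: {reference}, naive accumulation: {naive}, corrected: {corrected}")
```

Under these assumptions the naive accumulated loss over-weights tokens in the shorter micro-batch, so it differs from the full-batch value whenever micro-batches contain different numbers of tokens. Normalizing by the total token count recovers the full-batch loss.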

What version are you seeing the problem on?

v2.4

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.4.0):
#- PyTorch Version (e.g., 2.4):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):

More info

No response
