Has anyone seen incorrect backpropagation during training? #4

@jerrywind

Description

Hi all,
I recently spent some time training PaddleOCR-VL-0.9B and found something weird: when gradients are passed back from one transformer layer to the previous one, part of the gradient is lost (maybe due to the flash mask technique) with the transformers family.

How can I resolve this? Does anyone have any ideas?
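
One way to narrow this down might be a gradient check: compare the gradient produced by backpropagation with a finite-difference estimate for a small masked-attention step. The sketch below is a framework-independent toy in pure Python, not PaddleOCR-VL or flash mask code; all function names (`masked_softmax`, `attend`, `analytic_grad`, `numeric_grad`) and the input values are hypothetical. It also shows that masked positions receive a gradient of exactly zero by design, which is worth ruling out before treating missing gradient as a bug.

```python
import math

def masked_softmax(scores, mask):
    """Softmax over positions where mask is True; masked positions get probability 0."""
    kept = [s for s, m in zip(scores, mask) if m]
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, values, mask):
    """Single-query attention output: probability-weighted sum of scalar values."""
    probs = masked_softmax(scores, mask)
    return sum(p * v for p, v in zip(probs, values))

def analytic_grad(scores, values, mask):
    """Hand-derived gradient of the output w.r.t. each score: p_j * (v_j - out)."""
    probs = masked_softmax(scores, mask)
    out = sum(p * v for p, v in zip(probs, values))
    return [p * (v - out) for p, v in zip(probs, values)]

def numeric_grad(scores, values, mask, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grads = []
    for i in range(len(scores)):
        up = list(scores)
        up[i] += eps
        down = list(scores)
        down[i] -= eps
        diff = attend(up, values, mask) - attend(down, values, mask)
        grads.append(diff / (2 * eps))
    return grads

# Toy inputs (assumed values, for illustration only); position 2 is masked out.
scores = [0.5, -1.0, 2.0, 0.3]
values = [1.0, 2.0, 3.0, 4.0]
mask = [True, True, False, True]

g_analytic = analytic_grad(scores, values, mask)
g_numeric = numeric_grad(scores, values, mask)
max_abs_diff = max(abs(a - n) for a, n in zip(g_analytic, g_numeric))

print("analytic:", g_analytic)
print("numeric: ", g_numeric)
print("max abs difference:", max_abs_diff)
```

If the two gradients agree and the only zeros sit at masked positions, the backward pass for that step is behaving as expected. A mismatch at an unmasked position would point to a real problem. In an actual training setup, the same comparison would be run against the framework's own backward pass for one layer.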
