RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 22) #11

@yqwu94

Hi, I ran into the following CUDA runtime error:
RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 22)
I have recently been studying normalizing flows such as Glow, and this SVD problem appears when I try to train Glow from scratch. My guess is that, because Glow contains a “tensor.slogdet()” operation in the affine coupling layer, it may involve an SVD and thus cause the problem above.
Specifically, I start with a small learning rate, such as 1e-6, and the training loss falls slowly at first. However, when the learning rate reaches 0.0004, the training loss suddenly jumps to inf and the error above is raised.
How can I avoid this error when training Glow?
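
To make the situation above concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the actual training code: the linear shape of the warm-up, the helper names `warmup_lr` and `should_skip_update`, and the example loss values are all hypothetical. Only the two learning rates (1e-6 and 0.0004) come from the description. The guard only detects a non-finite loss so that a step could be skipped before it reaches later operations; it does not address why the loss diverges.

```python
import math


def warmup_lr(step, warmup_steps, base_lr=1e-6, peak_lr=4e-4):
    """Linear warm-up from base_lr to peak_lr (the linear shape is an assumption)."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return peak_lr
    frac = step / warmup_steps
    return base_lr + frac * (peak_lr - base_lr)


def should_skip_update(loss_value):
    """Return True when the loss is inf or NaN, so the update can be skipped."""
    return not math.isfinite(loss_value)


# Hypothetical loss trace: finite at first, then diverging as in the report.
losses = [3.2, 2.9, float("inf"), float("nan")]
for step, loss in enumerate(losses):
    lr = warmup_lr(step, warmup_steps=3)
    action = "skip" if should_skip_update(loss) else "update"
    print(f"step={step} lr={lr:.2e} loss={loss} -> {action}")
```

In a real training loop the same check would be applied to the scalar loss before the backward pass and the optimizer step.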
