This repository was archived by the owner on Apr 6, 2025. It is now read-only.

Model Seemingly Refuses to Learn? #17

@bryandam

I've been trying for the better part of a week, but I can't get the model to train properly, and I'm starting to doubt that this code, as is, produced the model that Sean Vasquez included. My last run maxed out at 9100 steps, but at no point did either the training or the validation loss trend downward:
[image: training and validation loss curves]

I got a checkpoint at step 7580, but its output is pure garbage unless you're really into abstract art:
[image: sample output from the step-7580 checkpoint]

Has anyone had any luck getting this thing to train? I'm going to play with the training parameters a bit and see if some magic starts happening above 10k steps, but based on the data above the loss just isn't trending at all, so I'm not confident that twice as many steps would make a difference. Does anyone know if we can extract the training parameters from Sean's included model? Am I just being impatient, or is this a hit-or-miss kind of thing where I have to keep trying until I get a lucky run?
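For anyone who wants to put a number on "not trending" rather than eyeballing the plot, here is a minimal sketch that fits a straight line to logged (step, loss) pairs and checks whether the fitted loss actually falls over the run. The function names and the 5% threshold are hypothetical choices for illustration, not anything from this repository, and the values would have to be pulled out of your own training log by hand.

```python
from statistics import mean


def loss_slope(steps, losses):
    """Least-squares slope of loss vs. step (negative means the loss is falling)."""
    if len(steps) != len(losses) or len(steps) < 2:
        raise ValueError("need at least two (step, loss) pairs of equal length")
    s_mean, l_mean = mean(steps), mean(losses)
    num = sum((s - s_mean) * (l - l_mean) for s, l in zip(steps, losses))
    den = sum((s - s_mean) ** 2 for s in steps)
    if den == 0:
        raise ValueError("all steps are identical")
    return num / den


def is_trending_down(steps, losses, min_drop_fraction=0.05):
    """True if the fitted line drops by at least min_drop_fraction of the
    mean absolute loss across the logged step range.

    min_drop_fraction is an arbitrary illustrative threshold (assumption).
    """
    slope = loss_slope(steps, losses)
    fitted_change = slope * (max(steps) - min(steps))
    # Use the absolute mean so the check also works when the loss is negative.
    scale = abs(mean(losses)) or 1.0
    return fitted_change <= -min_drop_fraction * scale
```

Running `is_trending_down` separately on the training and validation values gives a quick yes/no for each curve. A straight-line fit is a crude summary, so it can miss a loss that drops early and then plateaus; restricting the input to the later steps is one way to check for that.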
