Hi, I'm running RL training with a custom environment that's based on the search tool environment. My current logs look something like this:
====== End Trajectory Group ======
tinker_cookbook.utils.misc_utils:20 [INFO] Starting assemble_training_data
tinker_cookbook.utils.misc_utils:23 [INFO] assemble_training_data took 1.07 seconds
tinker_cookbook.utils.misc_utils:20 [INFO] Starting train
The trajectories are sampled, but the run is stuck on the "Starting train" step: it has been there for hours, even though there are relatively few trajectories (<100). From what I understand, the optimization step runs on the Tinker servers, but I'm not sure what went wrong or how to debug this. Please let me know the best way to fix this! Happy to provide more information.
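One generic way to narrow down a hang like this is to bound the waiting call with a timeout, so the client fails loudly instead of blocking for hours. The following is only a minimal sketch using Python's standard `asyncio.wait_for`, and it assumes the training step is an awaitable. `run_with_timeout` and `fake_train_step` are hypothetical names for illustration and are not part of the Tinker or tinker_cookbook API.

```python
import asyncio


async def run_with_timeout(make_step, timeout_s, label="train"):
    """Await make_step() and raise TimeoutError if it exceeds timeout_s seconds.

    make_step is a zero-argument callable returning an awaitable. Here it is a
    hypothetical stand-in for whatever call the "Starting train" log line wraps.
    """
    try:
        return await asyncio.wait_for(make_step(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Re-raise with a message that names the step that stalled.
        raise TimeoutError(
            f"{label} step did not finish within {timeout_s}s"
        ) from None


async def fake_train_step(delay_s):
    # Hypothetical placeholder for the real (remote) optimization call.
    await asyncio.sleep(delay_s)
    return "done"


if __name__ == "__main__":
    # Completes in time and returns the step's result.
    print(asyncio.run(run_with_timeout(lambda: fake_train_step(0.01), 0.5)))
    # Exceeds the limit and surfaces as an explicit error.
    try:
        asyncio.run(run_with_timeout(lambda: fake_train_step(1.0), 0.05))
    except TimeoutError as exc:
        print(f"timed out: {exc}")
```

A timeout does not explain why the step stalls, but it turns a silent hang into an error with a timestamp, which should be easier to match against server-side information.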