Bug Description
Please provide a detailed description of the issue you encountered.
Environment Information
- Operating System: [e.g., Ubuntu 20.04]
- Python Version: [e.g., 3.10.12]
- GPU: [e.g., NVIDIA A100-80G]
- CUDA Version: [e.g., 12.4]
- Installation Method: [e.g., pypi, source]
- Trinity-RFT Version: [e.g., 0.2.1]
- Other relevant dependencies or configurations you think might be helpful
Actual Behavior
```text
(pid=101782) INFO 10-30 11:10:30 [importing.py:53] Triton module has been replaced with a placeholder. [repeated 31x across cluster]
(pid=101782) INFO 10-30 11:10:30 [init.py:239] Automatically detected platform cuda. [repeated 31x across cluster]
(Explorer pid=89687) INFO 10-30 11:10:38 [explorer.py:337] Log metrics of step 0
(Trainer pid=89688) INFO 10-30 11:10:38 [trainer.py:152] Trainer synchronizing weights at step 0 starting..
(WorkflowRunner pid=101780) Using blocking ray.get inside async actor. This blocks the event loop. Please use await on object ref with asyncio.gather if you want to yield execution to the event loop instead.
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s] [repeated 3x across cluster]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00, 1.42it/s] [repeated 11x across cluster]
(vLLMRolloutModel pid=91482) 2025-10-30 11:09:56,729 INFO worker.py:1694 -- Connecting to existing Ray cluster at address: 10.166.70.58:6379... [repeated 3x across cluster]
(vLLMRolloutModel pid=91476) 2025-10-30 11:09:56,790 INFO worker.py:1879 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265 [repeated 3x across cluster]
(Synchronizer pid=90977) ERROR 10-30 11:30:38 [synchronizer.py:302] Explorer is not ready for model weight sync.
(Trainer pid=89688) ERROR 10-30 11:30:38 [trainer.py:160] Trainer synchronizing weights failed.
(Trainer pid=89688) INFO 10-30 11:30:39 [trainer.py:170] Trainer synchronizing weights at step 0 end.
```
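The WorkflowRunner warning above ("Using blocking ray.get inside async actor") points at a general asyncio pitfall that may be related to the stalled weight sync: a blocking call inside a coroutine freezes the whole event loop, so no other coroutine in that actor can make progress until it returns. The sketch below illustrates the principle with pure asyncio only (no Ray dependency; the function names are illustrative, not from Trinity-RFT):

```python
import asyncio
import time


async def blocking_style():
    # Analogue of calling blocking ray.get() inside an async actor:
    # time.sleep() blocks the entire event loop, so no other coroutine
    # scheduled on it can run in the meantime.
    time.sleep(0.2)
    return "done"


async def awaiting_style():
    # Analogue of `await`-ing the object ref instead: the event loop
    # stays free to schedule other coroutines while this one waits.
    await asyncio.sleep(0.2)
    return "done"


async def main():
    # Two awaiting-style tasks run concurrently via asyncio.gather,
    # so total wall time is roughly one sleep, not two.
    start = time.perf_counter()
    results = await asyncio.gather(awaiting_style(), awaiting_style())
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")


asyncio.run(main())
```

If two `blocking_style()` calls were gathered instead, they would still run back-to-back (~0.4s total), because each one monopolizes the event loop while it sleeps; that is the behavior the Ray warning is flagging.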