System Info
I am using the one-step-off-policy method for multi-machine training, but I encounter the following error during execution:
After investigating, I traced the error to the following code path:
File: ./recipe/one_step_off_policy/distributed_util.py
Line: 61

At this line, the code calls a utility function from vLLM:
File: ./vllm/distributed/utils.py
However, the implementation in vllm/distributed/utils.py only supports IPv4 addresses, so distributed initialization fails when the training environment uses IPv6.
Does this mean the one-step-off-policy distributed training pipeline currently does not support IPv6?
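
For reference, here is a minimal sketch of the kind of IPv6-aware address handling that might avoid the failure. It assumes the problem comes from how the host and port are joined into a tcp:// init URL, which I have not confirmed. The helper names `is_ipv6_literal` and `build_tcp_init_method` are hypothetical and are not part of vLLM or the recipe. The sketch uses only the Python standard library and wraps IPv6 literals in square brackets, as URL syntax requires.

```python
import ipaddress


def is_ipv6_literal(host: str) -> bool:
    """Return True if host is an IPv6 literal such as "::1" or "2001:db8::1"."""
    try:
        return isinstance(ipaddress.ip_address(host), ipaddress.IPv6Address)
    except ValueError:
        # Hostnames (e.g. "node-0") and malformed strings are not IP literals.
        return False


def build_tcp_init_method(host: str, port: int) -> str:
    """Build a tcp:// URL, bracketing the host when it is an IPv6 literal.

    Hypothetical helper for illustration only.
    """
    # Accept hosts that were already bracketed, e.g. "[::1]".
    host = host.strip("[]")
    if is_ipv6_literal(host):
        return f"tcp://[{host}]:{port}"
    return f"tcp://{host}:{port}"
```

With this approach, IPv4 addresses and hostnames pass through unchanged, while an address like 2001:db8::1 becomes tcp://[2001:db8::1]:29500. Whether the underlying process-group setup also needs an IPv6 socket family is a separate question that this sketch does not address.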