Skip to content

RL training: Base64 image decode errors (Incorrect padding / Invalid base64-encoded string) #31

@Ryann-Ran

Description

@Ryann-Ran

Hi, thanks for your great work!

I’m trying to reproduce the RL training and I’m consistently seeing errors like:

  • [ERROR code] Failed to decode image 0: Incorrect padding
  • [ERROR code] Failed to decode image 0: Invalid base64-encoded string: number of data characters (336877) cannot be 1 more than a multiple of 4

My main question is whether these errors correspond to the issue described in the README (i.e., caused by network pressure):

During RL training, transmitting a large number of high-resolution images to a single node in a short period may saturate the bandwidth, leading to retransmissions and timeouts.

I tried launching 3 code servers, each with 4 processes, on a single machine with 8× A800 (80GB), but it didn’t work out. I used commands like:

IMAGE=chenshawn6915/multimodal-ipython-sandbox:oss-v2
docker run -p 18901:18901 -p 18902:18902 -p 18903:18903 -p 18904:18904 $IMAGE
docker run -p 28901:18901 -p 28902:18902 -p 28903:18903 -p 28904:18904 $IMAGE
docker run -p 38901:18901 -p 38902:18902 -p 38903:18903 -p 38904:18904 $IMAGE

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions