Simplifying the Docker Stack #946

bbergeron0 · 2025-08-20T16:41:52Z

This is a follow-up to #926 and the suggestion I made there, since @dxqb asked me to open a PR for it.

I’d like to clarify the current status of the Dockerfiles and discuss whether it would be possible to unify them into a single Dockerfile. That way, my PR could improve all the existing use cases in one go.

I can provide a base image for OneTrainer with a leaner, better-documented Dockerfile, including file ownership synchronization and automatic volumes for user-generated data. However, I’m not sure what “Vast” (added in #894) and “RunPod” are, or how they should be integrated into the upcoming changeset. Also, I would suggest removing NVIDIA-UI.dockerfile, since CUDA dependencies are already bundled with the PyTorch & related Linux wheels—so a dedicated image for that use case doesn’t seem necessary.
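As a rough illustration of what file ownership synchronization and an automatic volume can look like in a Dockerfile, here is a minimal sketch. It is not the content of the planned PR: the base image, the `HOST_UID`/`HOST_GID` argument names, the `onetrainer` user, and the `/OneTrainer/workspace` path are all assumptions made for the example.

```dockerfile
# Hypothetical sketch, not the project's actual Dockerfile.
# Assumed base image: any Debian-based image with Python would do here.
FROM python:3.11-slim-bookworm

# Assumed build arguments: pass the host user's IDs at build time, e.g.
#   docker build --build-arg HOST_UID="$(id -u)" --build-arg HOST_GID="$(id -g)" .
ARG HOST_UID=1000
ARG HOST_GID=1000

# Create a user whose UID/GID match the host user, so files written to
# bind mounts or volumes are owned by that user instead of root.
# The data directory is created and chowned before it is declared a volume,
# so the volume inherits the non-root ownership.
RUN groupadd --gid "${HOST_GID}" onetrainer \
 && useradd --uid "${HOST_UID}" --gid "${HOST_GID}" --create-home onetrainer \
 && mkdir -p /OneTrainer/workspace \
 && chown -R onetrainer:onetrainer /OneTrainer

USER onetrainer
WORKDIR /OneTrainer

# Declaring the path as a volume makes Docker create an anonymous volume
# automatically when the container starts without an explicit mount.
# The "workspace" directory name is a placeholder for user-generated data.
VOLUME ["/OneTrainer/workspace"]
```

The argument names avoid plain `UID`/`GID` because some shells treat `UID` as a read-only variable, which could silently override the build argument inside `RUN`.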

For context, I’m currently working on Comfy-Org/ComfyUI#9305 to implement Docker support for ComfyUI, which has lots of similarities with this project, so this is effectively a two-birds-one-stone effort.

Best case, we can provide a one-size-fits-all image to avoid duplicate maintenance work.

Thoughts? Can someone fill me in on RunPod and Vast?

O-J1 (Collaborator) · 2025-08-20T16:51:04Z

I'll add my points here:

NVIDIA-UI.dockerfile is old and hasn't really been maintained, as far as I am aware. I'm in favour of removing it and also closing #926 ("updated NVIDIA-UI.Dockerfile"), as your solution is cleaner.
OneTrainer supports bare-metal Linux systems through the Cloud tab, as well as RunPod and Vast, which are GPU rental providers.
My Docker knowledge is C-minus, so I will defer to you.
Take note of Install.sh and how we install things; we want to retain that.

@dxqb can provide more insight into the specifics of the RunPod and Vast images.

Links for reference:
https://www.runpod.io/
https://vast.ai/

dxqb (Collaborator) · 2025-08-21T07:06:14Z

I can provide a base image for OneTrainer with a leaner, better-documented Dockerfile, including file ownership synchronization and automatic volumes for user-generated data. However, I’m not sure what “Vast” (added in #894) and “RunPod” are, or how they should be integrated into the upcoming changeset. Also, I would suggest removing NVIDIA-UI.dockerfile, since CUDA dependencies are already bundled with the PyTorch & related Linux wheels—so a dedicated image for that use case doesn’t seem necessary.

Thanks for working on this.
The RunPod and Vast files are for use with RunPod and Vast as cloud providers. For this use case, it is important that our Docker images are based on their Docker images, as in line 6 of OneTrainer/resources/docker/RunPod-NVIDIA-CLI.Dockerfile (commit daae18e):

FROM runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04

This matters because:

they contain some functionality that is unique to cloud use, for example having Jupyter installed so you can edit files remotely
if we use their standard Docker images, those images (about 10 GB) already exist on their clouds and don't have to be downloaded when you start a cloud instance. Only the OneTrainer layer has to be downloaded.

This will probably limit how much you can unify the Dockerfiles, but it might still be possible to a degree?

bbergeron0 (Author) · 2025-08-21T17:12:28Z

Thanks for the explanation. I still see a possibility for unification if we use a build argument for the base image, as long as the OneTrainer layer can be built on top of such an image. The first criteria that come to mind are that the base image is Debian-based and ships Python. I’ve never worked on multi-target Docker setups, but I know OneTrainer isn’t the first project to face this issue, so I can borrow a few practices from other projects.
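To make the build-argument idea concrete, here is a minimal sketch of a single Dockerfile whose base image is chosen at build time. The `BASE_IMAGE` argument name, its default value, and the install step are assumptions for illustration; only the RunPod image tag comes from the existing RunPod-NVIDIA-CLI.Dockerfile.

```dockerfile
# Hypothetical sketch: one Dockerfile, base image selected per target.
# An ARG declared before FROM is only visible to the FROM instruction.
# The default below is an assumed generic Debian-based Python image for local use.
ARG BASE_IMAGE=python:3.11-slim-bookworm
FROM ${BASE_IMAGE}

# Cloud target example (tag taken from the current RunPod Dockerfile):
#   docker build \
#     --build-arg BASE_IMAGE=runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04 \
#     -t onetrainer:runpod .

WORKDIR /OneTrainer
COPY . .

# Placeholder install step. It assumes the base image provides python3 and pip
# and that a requirements.txt sits at the repository root; the real step
# should follow the project's own install script.
RUN python3 -m pip install --no-cache-dir -r requirements.txt
```

Because everything after `FROM` is shared, only the OneTrainer layers differ from the provider's stock image, which keeps the benefit of the base image already being cached on the provider's hosts.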

For now, I’ll focus on updating and cleaning up the local Docker setup, then reducing duplicated configuration between the local, Vast, and RunPod environments. I’ll consider the feasibility of full unification as I go. I'll open a draft PR once my first objectives are complete.

1 reply

O-J1 (Collaborator) · Aug 21, 2025

Sounds like a solid plan, thank you again for lending your expertise :)

wooseopkim · 2025-10-10T04:34:53Z

Just a quick note: it would be great if we could build the image and publish it to GHCR from a GHA workflow.
