Support running a Python script directly in CustomTrainer (e.g., python myscript.py) #47

@jskswamy


What you would like to be added?

I would like the CustomTrainer API in the Kubeflow Trainer SDK to support running a Python script directly by specifying a file path (e.g., python myscript.py), instead of requiring users to provide a Python function.

Proposed API:

CustomTrainer(python_file="run_kubernetes.py", ...)
  • If python_file is provided, the SDK should set the container entrypoint to ["python", <python_file>], e.g. ["python", "run_kubernetes.py"].
  • No function serialization, no wrapper scripts, no subprocesses, and no use of runpy inside another Python process.
  • This should be mutually exclusive with the existing func argument.
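
To make the proposal concrete, here is a rough sketch of the behaviour described above. It is not the SDK's actual code: CustomTrainerSketch and resolve_command are hypothetical names used only for illustration, and only the func and python_file arguments come from this issue.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CustomTrainerSketch:
    """Hypothetical stand-in for CustomTrainer; not the real SDK class."""

    func: Optional[Callable] = None
    python_file: Optional[str] = None

    def __post_init__(self) -> None:
        # func and python_file are mutually exclusive, as proposed above.
        if self.func is not None and self.python_file is not None:
            raise ValueError("'func' and 'python_file' are mutually exclusive.")


def resolve_command(trainer: CustomTrainerSketch) -> Optional[List[str]]:
    """Return the container command for the python_file path.

    Returns None when python_file is not set, meaning the existing
    function-based path would apply unchanged (assumption of this sketch).
    """
    if trainer.python_file is not None:
        # Run the script directly: no serialization, wrapper, or runpy.
        return ["python", trainer.python_file]
    return None
```

With this sketch, resolve_command(CustomTrainerSketch(python_file="run_kubernetes.py")) yields the command list from the first bullet, and passing both func and python_file raises a ValueError.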

Why is this needed?

  • Simplicity & Familiarity: Most users have existing training scripts and expect to run them as python myscript.py, just like they do locally or in YAML-based jobs.
  • Avoids Indirection: The current approach requires wrapping code in a function, which is then serialized, deserialized, and run via a generated entrypoint script. This is convoluted, harder to debug, and not how most ML workflows are structured.
  • Better UX: Direct script execution is more transparent, easier to reason about, and matches user expectations from other ML platforms and Kubernetes YAML jobs.
  • Migration Path: This makes it much easier for users to migrate from script-based workflows (e.g., Slurm, bash, or direct Kubernetes Jobs) to Kubeflow Trainer.
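  • Cleaner Container Lifecycle: Running the script as the container's main process means it receives signals directly and its exit code becomes the container's exit code, which makes signal handling, failure reporting, and resource cleanup more predictable.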

Love this feature?

Give it a 👍 We prioritize the features with the most 👍
