[Core feature] Allow setting metadata (labels&annotations) individually on all K8s task types #6238

@fg91

Motivation: Why do you think this is important?

Today, as a Flyte user, I have the following options to set labels/annotations on the pods/CRD objects of K8s tasks in a Flyte workflow execution:

  1. Set via pod template:

    @task(
        pod_template=PodTemplate(annotations=..., labels=...)
    )

    This sets labels/annotations on the pods of individual tasks.
    For distributed tasks (such as PyTorch or Ray), the metadata is set on the pod template spec inside the CRD object, not on the CRD object itself.

  2. Set via pyflyte run --labels ... --annotations ...:

    This applies the metadata to all K8s objects in a Flyte workflow execution, including task pods and task CRD objects. However, it cannot be scoped to an individual task.

As a Flyte user, I would like to set labels/annotations on individual K8s task CRD objects such as PyTorch jobs, Ray jobs, ... (the same way I already can today for pods via the pod template):

@task(
    task_config=PyTorch(
        num_workers=...,
        ...
        # Proposed addition:
        metadata=ObjectMeta(
            annotations={"kueue.x-k8s.io/queue-name": "queue-name"},
            labels={...}
        )
    )
)

I propose using the same syntax/flyteidl type for all K8s (non-pod) plugins, such as Elastic, TfJob, MpiJob, RayJobConfig, ...
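
To make the intended effect concrete, here is a minimal sketch of how such a metadata field might be merged into a CRD manifest. The `ObjectMeta` dataclass and the `apply_object_meta` helper are hypothetical illustrations, not existing flytekit or flyteplugins APIs, and the precedence rule (task-level values win over keys already present) is an assumption rather than part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ObjectMeta:
    """Hypothetical stand-in for the proposed metadata type (not a flytekit API)."""
    labels: Dict[str, str] = field(default_factory=dict)
    annotations: Dict[str, str] = field(default_factory=dict)


def apply_object_meta(manifest: dict, meta: ObjectMeta) -> dict:
    """Return a copy of a K8s-style manifest with task-level metadata merged in.

    Only the top-level `metadata` of the CRD object is touched; anything
    nested under `spec` (e.g. a pod template) is left unchanged.
    Assumption: task-level values override keys that already exist.
    """
    merged = dict(manifest)
    metadata = dict(manifest.get("metadata", {}))
    metadata["labels"] = {**metadata.get("labels", {}), **meta.labels}
    metadata["annotations"] = {**metadata.get("annotations", {}), **meta.annotations}
    merged["metadata"] = metadata
    return merged


# Illustrative manifest; the name and the existing label are made up.
pytorch_job = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "example-job", "labels": {"execution-id": "abc123"}},
    "spec": {},
}
meta = ObjectMeta(annotations={"kueue.x-k8s.io/queue-name": "queue-name"})
patched = apply_object_meta(pytorch_job, meta)
```

In this sketch the annotation lands on the job object itself while the original manifest stays unmodified, which is the behaviour the pod template route cannot provide for CRD-backed tasks.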


In my concrete case, I want this feature so that I can use Kueue to gang-schedule worker pods for distributed PyTorch training tasks (e.g. as documented here).
This requires setting a queue name annotation on the underlying PyTorchJob CRD object.

There have been previous asks from the community to enable such a feature/integration:

  • Attempts to integrate Yunikorn and Kueue more deeply into flytepropeller, which were not accepted.

    In contrast, the feature I propose lets users opt into Kueue without being a Flyte-Kueue integration. It is a general feature that could serve any other integration that selects workloads by annotations/labels.

  • There have been discussions in Slack about using Kueue that suggested e.g. pyflyte run --labels/--annotations to set the required metadata. However, this is not sufficient because it applies the metadata to all nodes in the graph, while you might want queueing/gang scheduling for only a subset.
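
The scoping difference between the two approaches can be modelled with a toy example. This is only a sketch: the node names and both helper functions are hypothetical and do not correspond to any Flyte API; they merely model "every node gets the metadata" versus "only tasks that declare metadata get it".

```python
from typing import Dict, List

# Hypothetical node names for a three-step workflow.
nodes: List[str] = ["preprocess", "train", "evaluate"]
queue_annotation = {"kueue.x-k8s.io/queue-name": "queue-name"}


def execution_wide(
    nodes: List[str], annotations: Dict[str, str]
) -> Dict[str, Dict[str, str]]:
    """Model of execution-level metadata: every node receives the same annotations."""
    return {node: dict(annotations) for node in nodes}


def per_task(
    nodes: List[str], declared: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, str]]:
    """Model of the proposed behaviour: only tasks that declare metadata receive it."""
    return {node: dict(declared.get(node, {})) for node in nodes}


all_queued = execution_wide(nodes, queue_annotation)
only_train_queued = per_task(nodes, {"train": queue_annotation})

# Nodes that would be picked up by a queueing system selecting on the annotation.
queued_nodes = [name for name, ann in only_train_queued.items() if ann]
```

With execution-wide metadata all three nodes carry the queue annotation, whereas the per-task model queues only the training node.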

Describe alternatives you've considered

Add task kwargs for labels and annotations:

@task(
    # If we added these args ...
    labels={...},
    annotations={...},
    # ... for simple python function tasks this would conflict with this existing arg:
    pod_template=PodTemplate(annotations=...)
)
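
The conflict can be illustrated with a small, self-contained sketch. The helper below is hypothetical and not part of flytekit; it only shows that once labels can be set in two places, some rule is needed for keys that appear in both with different values.

```python
from typing import Dict, Tuple


def conflicting_keys(
    task_level: Dict[str, str], pod_template: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return keys set in both places with different values.

    Each result maps the key to (task-level value, pod-template value).
    """
    return {
        key: (task_level[key], pod_template[key])
        for key in task_level.keys() & pod_template.keys()
        if task_level[key] != pod_template[key]
    }


# Illustrative values only.
task_kwarg_labels = {"team": "ml", "tier": "batch"}
pod_template_labels = {"team": "platform"}

conflicts = conflicting_keys(task_kwarg_labels, pod_template_labels)
```

Any non-empty result would force a precedence decision for plain Python tasks, which is the ambiguity that attaching the metadata to the task config avoids.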

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

Labels: enhancement (New feature or request)

Status: Done