Add utility for TrainJob progress reporting

### What you would like to be added?

Add a utility function to the Kubeflow SDK that allows training scripts to report progress and metrics to the Kubeflow Trainer controller. Training scripts would use it to populate the `trainerStatus` field introduced in [kubeflow/trainer#3227](https://github.com/kubeflow/trainer/pull/3227) and specified in [KEP-2779](https://github.com/kubeflow/trainer/issues/2779).

### Context

The Kubeflow Trainer controller ([PR #3227](https://github.com/kubeflow/trainer/pull/3227)) adds a Progress Plugin that:
1. Injects environment variables into training pods (`KUBEFLOW_TRAINER_STATUS_URL`, `KUBEFLOW_TRAINER_STATUS_TOKEN`, `KUBEFLOW_TRAINER_STATUS_CA_CERT`)
2. Runs a Status Server that accepts progress updates via HTTPS POST
3. Updates `TrainJob.status.trainerStatus` with progress, ETA, and metrics

The SDK needs a simple, safe utility function that training scripts can call to report status.
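
To make the client side of this flow concrete, here is a minimal sketch of how such a utility might build and send the request. Only the three environment variable names come from the controller PR. The function names `build_status_request` and `send_status`, the bearer-token `Authorization` header, the JSON request body, and the reading of `KUBEFLOW_TRAINER_STATUS_CA_CERT` as a path to a CA file are all assumptions made for illustration, not the confirmed protocol.

```python
import json
import os
import ssl
import urllib.request
from typing import Mapping, Optional


def build_status_request(
    payload: dict, env: Mapping[str, str] = os.environ
) -> Optional[urllib.request.Request]:
    """Return a POST request for the Status Server, or None outside Kubeflow."""
    url = env.get("KUBEFLOW_TRAINER_STATUS_URL")
    token = env.get("KUBEFLOW_TRAINER_STATUS_TOKEN")
    if not url or not token:
        return None  # variables not injected: treat reporting as a no-op
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # assumed: JSON body
        headers={
            "Authorization": f"Bearer {token}",  # assumed: bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_status(
    payload: dict, env: Mapping[str, str] = os.environ, timeout: float = 5.0
) -> bool:
    """Best-effort send that never raises, so a failed report cannot stop training."""
    try:
        request = build_status_request(payload, env)
        if request is None:
            return False
        # Assumed: the CA cert variable holds a file path; None falls back
        # to the system trust store.
        context = ssl.create_default_context(
            cafile=env.get("KUBEFLOW_TRAINER_STATUS_CA_CERT")
        )
        with urllib.request.urlopen(
            request, timeout=timeout, context=context
        ) as response:
            return 200 <= response.status < 300
    except Exception:
        return False
```

Returning `False` instead of raising is one way to read "safe" here: a training run should not fail because the Status Server is unreachable or the script runs outside Kubeflow.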

**Note:** A `KubeflowCallback` implementation has been submitted to HuggingFace Transformers ([PR #44487](https://github.com/huggingface/transformers/pull/44487)), and it depends on this SDK utility. The callback auto-activates when running in Kubeflow and can call the `update_runtime_status()` utility to report progress.


### Why is this needed?

**Problem:** AI practitioners running training jobs on Kubernetes have no native way to monitor training progress. They must either:
1. Parse container logs manually
2. Set up external tracking systems (such as MLflow or W&B), which add infrastructure overhead
3. Wait blindly for jobs to complete

**Solution:** The Kubeflow Trainer controller ([PR #3227](https://github.com/kubeflow/trainer/pull/3227)) adds a Status Server that accepts progress updates from training pods. The SDK needs a client-side utility to POST updates to this server.

**User experience improvement:**

Before (no visibility):
```bash
kubectl get trainjob my-job
# NAME     STATUS    AGE
# my-job   Running   2h   ← Is it 10% done? 90% done? No idea.
```

After (with progress tracking):
```bash
kubectl get trainjob my-job -o jsonpath='{.status.trainerStatus}'
# {"progressPercentage": 67, "estimatedRemainingSeconds": 3600, "metrics": [...]}
```

**Dependency:** A `KubeflowCallback` has been submitted to HuggingFace Transformers ([PR #44487](https://github.com/huggingface/transformers/pull/44487), in review), and it depends on this SDK utility. The callback auto-activates when running in Kubeflow and can call the SDK's `update_runtime_status()` utility to report progress.

### Proposed API

```python
from kubeflow.trainer.progress import update_runtime_status

# Basic usage - SDK handles throttling (max 1 update/5s)
update_runtime_status(
    progress_percent=50,
    estimated_time_remaining=120,  # seconds or timedelta
    metrics={"loss": "0.234", "eval_accuracy": "0.89"}
)

# Force update (bypass throttling) - use for start/end
update_runtime_status(progress_percent=0, force=True)   # Training started
update_runtime_status(progress_percent=100, force=True) # Training complete
```
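
The throttling and the "seconds or timedelta" argument described in the comments above could be handled roughly as follows. This is a sketch under stated assumptions, not the SDK's implementation: `UpdateThrottle` and `to_seconds` are hypothetical helper names, and the 5-second default simply mirrors the "max 1 update/5s" comment.

```python
import time
from datetime import timedelta
from typing import Callable, Optional, Union


class UpdateThrottle:
    """Allow at most one update per `min_interval` seconds unless forced."""

    def __init__(
        self,
        min_interval: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_sent: Optional[float] = None

    def should_send(self, force: bool = False) -> bool:
        """Return True if an update may go out now, and record the send time."""
        now = self._clock()
        too_soon = (
            self._last_sent is not None
            and now - self._last_sent < self._min_interval
        )
        if too_soon and not force:
            return False
        self._last_sent = now
        return True


def to_seconds(value: Union[int, float, timedelta, None]) -> Optional[int]:
    """Normalize a remaining-time value to whole seconds (None stays None)."""
    if value is None:
        return None
    if isinstance(value, timedelta):
        return int(value.total_seconds())
    return int(value)
```

In this sketch a forced update also resets the timer, so a regular update issued right after `force=True` is still throttled. Using a monotonic clock keeps the interval check unaffected by wall-clock adjustments on the node.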

### Love this feature?

Give it a 👍. We prioritize the features with the most 👍.
