Merged
32 changes: 32 additions & 0 deletions content/en/events/upcoming-events/gsoc-2026.md
@@ -104,3 +104,35 @@ To participate in GSoC with Kubeflow, you **must** meet the GSoC [eligibility re

---

## Project 3: OptimizationJob CRD for Hyperparameter Optimization

**Components:** [kubeflow/katib](https://www.github.com/kubeflow/katib), [kubeflow/sdk](https://www.github.com/kubeflow/sdk), [kubeflow/trainer](https://www.github.com/kubeflow/trainer)

**Mentors:** [@akshaychitneni](https://github.com/akshaychitneni), [@andreyvelich](https://github.com/andreyvelich)

**Contributor:**

**Details:**

Hyperparameter optimization (HPO) is critical for maximizing model performance in machine learning workflows. Katib currently provides HPO capabilities through the `Experiment` CRD, but that CRD was designed for broad use cases, including Neural Architecture Search (NAS) and arbitrary workloads.

This project aims to design and implement a new **OptimizationJob CRD** (`optimizer.kubeflow.org/v1alpha1`) specifically focused on hyperparameter optimization for TrainJobs. The new CRD will provide:

- **Tighter TrainJob Integration**: Replace unstructured trial specifications with typed TrainJob templates, enabling strong validation
- **Shared Initialization**: Implement a common initializer pattern that runs once and shares model/dataset artifacts across all trials, reducing trial startup time and storage costs
- **Simplified API**: Focus exclusively on HPO use cases
- **Modern Metrics Collection**: Support push-based metrics reporting via the Kubeflow SDK
- **SDK Alignment**: Integrate with the `OptimizerClient` API from [KEP-46: Hyperparameter Optimization in Kubeflow SDK](https://github.com/kubeflow/sdk/blob/main/docs/proposals/46-hyperparameter-optimization/README.md)
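
To illustrate what "typed templates with strong validation" from the list above might mean in practice, here is a minimal Python sketch. It is not the proposed CRD schema and not an existing Katib or Kubeflow SDK API: every class and field name (`SearchParameter`, `TrainJobTemplate`, `OptimizationJobSpec`, `objective_metric`, `max_trials`, and so on) is hypothetical, chosen only to show how a typed spec can reject malformed input up front instead of at trial runtime.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SearchParameter:
    """Hypothetical search-space entry: a named float range."""
    name: str
    min_value: float
    max_value: float

    def __post_init__(self):
        if not self.name:
            raise ValueError("parameter name must be non-empty")
        if self.min_value >= self.max_value:
            raise ValueError(f"{self.name}: min_value must be < max_value")


@dataclass(frozen=True)
class TrainJobTemplate:
    """Hypothetical typed stand-in for a TrainJob template."""
    runtime: str
    num_nodes: int = 1

    def __post_init__(self):
        if not self.runtime:
            raise ValueError("runtime must be non-empty")
        if self.num_nodes < 1:
            raise ValueError("num_nodes must be >= 1")


@dataclass(frozen=True)
class OptimizationJobSpec:
    """Hypothetical spec: one typed template reused by every trial."""
    objective_metric: str
    max_trials: int
    template: TrainJobTemplate
    parameters: Tuple[SearchParameter, ...]

    def __post_init__(self):
        if not self.objective_metric:
            raise ValueError("objective_metric must be non-empty")
        if self.max_trials < 1:
            raise ValueError("max_trials must be >= 1")
        if not self.parameters:
            raise ValueError("at least one search parameter is required")
        names = [p.name for p in self.parameters]
        if len(names) != len(set(names)):
            raise ValueError("search parameter names must be unique")


# Illustrative values only; "example-runtime" is a placeholder name.
spec = OptimizationJobSpec(
    objective_metric="accuracy",
    max_trials=10,
    template=TrainJobTemplate(runtime="example-runtime", num_nodes=2),
    parameters=(SearchParameter("learning_rate", 1e-4, 1e-1),),
)
```

In a real controller, the equivalent checks would more likely live in the CRD's OpenAPI schema or an admission webhook; the sketch only shows the shape of the idea.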

Tracking issue: [kubeflow/katib#2605](https://github.com/kubeflow/katib/issues/2605)

**Difficulty:** Hard

**Size:** 350 hours (Large)

**Skills Required/Preferred:**
* Go
* Python
* Familiarity with Kubernetes controllers and CRDs
* Basic understanding of machine learning training workflows
* Experience with HPO frameworks