Commit 3b058f2

akshaychitneni and Akshay Chitneni authored
gsoc: OptimizationJob CRD for Hyperparameter Optimization (#4291)
Signed-off-by: Akshay Chitneni <[email protected]>
Co-authored-by: Akshay Chitneni <[email protected]>
1 parent 13f057d commit 3b058f2

1 file changed, +32 -0 lines changed

content/en/events/upcoming-events/gsoc-2026.md

Lines changed: 32 additions & 0 deletions
@@ -131,3 +131,35 @@ The GSoC contributor is building the bedrock layer that these future innovations

---

## Project 3: OptimizationJob CRD for Hyperparameter Optimization

**Components:** [kubeflow/katib](https://www.github.com/kubeflow/katib), [kubeflow/sdk](https://www.github.com/kubeflow/sdk), [kubeflow/trainer](https://www.github.com/kubeflow/trainer)

**Mentors:** [@akshaychitneni](https://github.com/akshaychitneni), [@andreyvelich](https://github.com/andreyvelich)

**Contributor:**

**Details:**

Hyperparameter optimization (HPO) is critical for maximizing model performance in machine learning workflows. While Katib currently provides HPO capabilities through the `Experiment` CRD, it was designed for broad use cases including Neural Architecture Search (NAS) and arbitrary workloads.

This project aims to design and implement a new **OptimizationJob CRD** (`optimizer.kubeflow.org/v1alpha1`) specifically focused on hyperparameter optimization for TrainJobs. The new CRD will provide:

- **Tighter TrainJob Integration**: Replace unstructured trial specifications with typed TrainJob templates, enabling strong validation
- **Shared Initialization**: Implement a common initializer pattern that runs once and shares model/dataset artifacts across all trials, reducing trial startup time and storage costs
- **Simplified API**: Focus exclusively on HPO use cases
- **Modern Metrics Collection**: Support push-based metrics reporting via the Kubeflow SDK
- **SDK Alignment**: Integrate with `OptimizerClient` API from [KEP-46: Hyperparameter Optimization in Kubeflow SDK](https://github.com/kubeflow/sdk/blob/main/docs/proposals/46-hyperparameter-optimization/README.md)
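
The first two bullets can be pictured with a small sketch. It is purely illustrative and is not taken from the proposal, Katib, or the Kubeflow SDK: `TrainJobTemplate`, `OptimizationJobSketch`, and the fields `image`, `num_nodes`, and `search_space` are all hypothetical names, and plain grid enumeration stands in for whatever search algorithm the real CRD would use. It assumes only that a typed template can be validated before any trial starts and that initialization happens once for all trials.

```python
from dataclasses import dataclass
from itertools import product


# Every name in this block is hypothetical. It does not mirror the real
# OptimizationJob schema, which this project is meant to design.
@dataclass
class TrainJobTemplate:
    """Typed stand-in for a TrainJob template (hypothetical fields)."""

    image: str
    num_nodes: int = 1

    def validate(self) -> None:
        # Typed fields make it possible to reject bad specs up front.
        if not self.image:
            raise ValueError("image must be set")
        if self.num_nodes < 1:
            raise ValueError("num_nodes must be >= 1")


@dataclass
class OptimizationJobSketch:
    """Toy model of an HPO job: one template, many trials."""

    template: TrainJobTemplate
    search_space: dict  # parameter name -> list of candidate values
    initialized: bool = False
    init_runs: int = 0

    def initialize(self) -> None:
        # Shared initializer: does its work at most once, however many
        # trials are generated afterwards.
        if not self.initialized:
            self.init_runs += 1
            self.initialized = True

    def trials(self):
        # Validate the typed template before producing any trial.
        self.template.validate()
        self.initialize()
        names = sorted(self.search_space)
        # Grid enumeration is used here only as a simple placeholder.
        for values in product(*(self.search_space[n] for n in names)):
            yield dict(zip(names, values))


job = OptimizationJobSketch(
    template=TrainJobTemplate(image="example/train:latest"),
    search_space={"lr": [0.1, 0.01], "batch_size": [16, 32]},
)
for params in job.trials():
    print(params)
```

In this toy model, an invalid template fails before any trial is produced, and `init_runs` stays at 1 no matter how many trials are enumerated, which is the property the shared-initialization bullet describes.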

Tracking issue: [kubeflow/katib#2605](https://github.com/kubeflow/katib/issues/2605)

**Difficulty:** Hard

**Size:** 350 hours (Large)

**Skills Required/Preferred:**
* Go
* Python
* Familiarity with Kubernetes controllers and CRDs
* Basic understanding of machine learning training workflows
* Experience with HPO frameworks
