diff --git a/content/en/events/upcoming-events/gsoc-2026.md b/content/en/events/upcoming-events/gsoc-2026.md
index 33cbc0e8e0..cd35949cd6 100644
--- a/content/en/events/upcoming-events/gsoc-2026.md
+++ b/content/en/events/upcoming-events/gsoc-2026.md
@@ -104,3 +104,35 @@ To participate in GSoC with Kubeflow, you **must** meet the GSoC [eligibility re
 
 ---
 
+## Project 3: OptimizationJob CRD for Hyperparameter Optimization
+
+**Components:** [kubeflow/katib](https://www.github.com/kubeflow/katib), [kubeflow/sdk](https://www.github.com/kubeflow/sdk), [kubeflow/trainer](https://www.github.com/kubeflow/trainer)
+
+**Mentors:** [@akshaychitneni](https://github.com/akshaychitneni), [@andreyvelich](https://github.com/andreyvelich)
+
+**Contributor:**
+
+**Details:**
+
+Hyperparameter optimization (HPO) is critical for maximizing model performance in machine learning workflows. While Katib currently provides HPO capabilities through the `Experiment` CRD, it was designed for broad use cases including Neural Architecture Search (NAS) and arbitrary workloads.
+
+This project aims to design and implement a new **OptimizationJob CRD** (`optimizer.kubeflow.org/v1alpha1`) specifically focused on hyperparameter optimization for TrainJobs. The new CRD will provide:
+
+- **Tighter TrainJob Integration**: Replace unstructured trial specifications with typed TrainJob templates, enabling strong validation
+- **Shared Initialization**: Implement a common initializer pattern that runs once and shares model/dataset artifacts across all trials, reducing trial startup time and storage costs
+- **Simplified API**: Focus exclusively on HPO use cases
+- **Modern Metrics Collection**: Support push-based metrics reporting via the Kubeflow SDK
+- **SDK Alignment**: Integrate with the `OptimizerClient` API from [KEP-46: Hyperparameter Optimization in Kubeflow SDK](https://github.com/kubeflow/sdk/blob/main/docs/proposals/46-hyperparameter-optimization/README.md)
+
+Tracking issue: [kubeflow/katib#2605](https://github.com/kubeflow/katib/issues/2605)
+
+**Difficulty:** Hard
+
+**Size:** 350 hours (Large)
+
+**Skills Required/Preferred:**
+* Go
+* Python
+* Familiarity with Kubernetes controllers and CRDs
+* Basic understanding of machine learning training workflows
+* Experience with HPO frameworks
\ No newline at end of file