Conversation

harshad16
Member

@harshad16 harshad16 commented Feb 28, 2025

https://issues.redhat.com/browse/RHOAIENG-4148

Description

Re-adjust the statefulset, route, service based on generateName

Open Data Hub allows workbenches to be spun up via:

  • Data Science Project
  • Jupyter Tile (application>enabled)

How workbenches are named in each method:

  • Data Science Project: users choose to name it
  • Jupyter Tile: jupyter-nb-<encoded-username>

Length issue on Jupyter Tile:

  • Depending on the username, the dashboard encodes special characters, because Kubernetes names do not accept special characters.
    • Encoding can lengthen the username by a few characters.
  • The dashboard prepends jupyter-nb-, which is 11 characters long.

Explanation:

In ODH, usernames longer than 36 characters were failing. Generating the Route name with the Kubernetes generateName method fixes the route sub-domain issue.
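
The 36-character figure is consistent with the arithmetic sketched below. This is only an illustration under two assumptions that the description does not state: that the Route's default host label has the form `<route-name>-<namespace>`, and that a single DNS label is limited to 63 characters. The namespace `rhods-notebooks` is the default quoted later in this conversation; the function and constant names are made up for the example and are not the controller's code.

```go
package main

import "fmt"

const (
	dnsLabelMax = 63            // maximum length of a single DNS label
	tilePrefix  = "jupyter-nb-" // prefix the dashboard adds for Jupyter Tile notebooks
)

// maxUsernameForRoute returns the longest encoded username for which
// "<prefix><username>-<namespace>" still fits into one DNS label.
// Assumption: the Route host label is built as "<route-name>-<namespace>".
func maxUsernameForRoute(namespace string) int {
	return dnsLabelMax - len(tilePrefix) - len("-") - len(namespace)
}

func main() {
	fmt.Println(maxUsernameForRoute("rhods-notebooks")) // prints 36
}
```

With a longer or shorter namespace the limit shifts accordingly, which is one reason a generated Route name is more robust than a name derived from the username.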

Further exploration showed that label values are limited to 63 characters.

The ControllerRevision adds a hash of 8-10 characters which, together with the separating hyphen, appends up to 11 characters to a label of the StatefulSet (controller-revision-hash).
For a workbench created from the Jupyter Tile, that allows:
username: 63 - (length of prefix jupyter-nb-, i.e. 11) - (length of hash suffix, i.e. 11) = 41
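
The same calculation as a small Go sketch. It assumes the worst case from the description (up to 11 characters appended to the StatefulSet name in the label value); the helper names are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	labelValueMax = 63 // Kubernetes label values are limited to 63 characters
	hashSuffixMax = 11 // worst case per the description: "-" plus a hash of up to 10 characters
)

// maxUsernameForRevisionLabel returns the longest username for which
// "<prefix><username>" plus the hash suffix still fits into a label value.
func maxUsernameForRevisionLabel(prefix string) int {
	return labelValueMax - len(prefix) - hashSuffixMax
}

// revisionLabelFits reports whether a StatefulSet name leaves enough room
// for the worst-case hash suffix within the label value limit.
func revisionLabelFits(statefulSetName string) bool {
	return len(statefulSetName)+hashSuffixMax <= labelValueMax
}

func main() {
	fmt.Println(maxUsernameForRevisionLabel("jupyter-nb-")) // prints 41

	tooLong := "jupyter-nb-" + strings.Repeat("u", 42)
	fmt.Println(revisionLabelFits(tooLong)) // prints false
}
```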

In ODH, usernames between 36 and 41 characters were failing at the StatefulSet creation stage. Generating the StatefulSet name with the Kubernetes generateName method fixes the controller-revision-hash label.

With this, the username can be up to 52 characters (63 minus the 11-character jupyter-nb- prefix).
It cannot go beyond that, because other labels (app, notebook-name) would then exceed the 63-character limit.

End Result:
Old username limit: max 36
New username limit: max 52
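
A minimal sketch of the naming decision, based on two details that appear later in this conversation: the switch to generateName happens when the notebook name is longer than 52 characters, and the generated prefix is "nb-". The `objectMeta` struct is a local stand-in for the two `metav1.ObjectMeta` fields involved, so the example compiles without Kubernetes dependencies; `statefulSetMeta` is a hypothetical helper, not the function in the PR.

```go
package main

import "fmt"

// objectMeta is a local stand-in for the Name and GenerateName fields
// of metav1.ObjectMeta (assumption made to keep the sketch self-contained).
type objectMeta struct {
	Name         string
	GenerateName string
}

// maxStatefulSetNameLen leaves room for the hash suffix inside the
// 63-character label value limit (63 - 11 = 52).
const maxStatefulSetNameLen = 52

// statefulSetMeta keeps the notebook name when it is short enough and
// falls back to a server-generated name otherwise.
func statefulSetMeta(notebookName string) objectMeta {
	if len(notebookName) > maxStatefulSetNameLen {
		return objectMeta{GenerateName: "nb-"}
	}
	return objectMeta{Name: notebookName}
}

func main() {
	fmt.Printf("%+v\n", statefulSetMeta("jupyter-nb-alice"))
}
```

Short names keep their readable form, so only the long Jupyter Tile names lose the direct name-to-user mapping.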

Related-to:
#179
https://issues.redhat.com/browse/RHOAIENG-4148

How Has This Been Tested?

TBD

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work


openshift-ci bot commented Feb 28, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@codecov-commenter

codecov-commenter commented Feb 28, 2025

Codecov Report

Attention: Patch coverage is 51.72414% with 70 lines in your changes missing coverage. Please review.

Project coverage is 55.28%. Comparing base (690694f) to head (c156792).

Files with missing lines Patch % Lines
...-notebook-controller/controllers/notebook_route.go 42.55% 49 Missing and 5 partials ⚠️
...book-controller/controllers/notebook_controller.go 53.33% 11 Missing and 3 partials ⚠️
...-notebook-controller/controllers/notebook_oauth.go 90.47% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #539      +/-   ##
==========================================
- Coverage   56.47%   55.28%   -1.20%     
==========================================
  Files          10       10              
  Lines        2686     2780      +94     
==========================================
+ Hits         1517     1537      +20     
- Misses       1054     1120      +66     
- Partials      115      123       +8     

@openshift-ci openshift-ci bot added size/l and removed size/l labels Feb 28, 2025
@harshad16 harshad16 force-pushed the long-route-rhoai-4148 branch from 04facee to 32f2412 Compare March 13, 2025 13:09
@openshift-ci openshift-ci bot added size/l and removed size/l labels Mar 13, 2025
@harshad16 harshad16 marked this pull request as ready for review March 13, 2025 13:09
@openshift-ci openshift-ci bot requested review from andyatmiami and jstourac March 13, 2025 13:10
@harshad16 harshad16 force-pushed the long-route-rhoai-4148 branch from 32f2412 to fe7b9cb Compare March 17, 2025 06:17
@openshift-ci openshift-ci bot added size/l and removed size/l labels Mar 17, 2025
@jiridanek
Member

jiridanek commented Mar 17, 2025

This is good work, but I worry it does not really help much with fixing https://issues.redhat.com/browse/RHOAIENG-4148

For odh/kubeflow, what would help with the customer issue is rejecting Notebook CR creation immediately, instead of getting stuck creating the route, or whatever, later.

Most of the weight lies on the Dashboard, which simply cannot put the username into the Notebook CR name, because usernames can be long.

It is a good change, but it will not resolve RHOAIENG-4148 alone.

Also, it is getting us towards

which is probably a good direction to move towards.

@harshad16 harshad16 force-pushed the long-route-rhoai-4148 branch from fe7b9cb to c788a91 Compare March 26, 2025 17:32
@openshift-ci openshift-ci bot added size/l and removed size/l labels Mar 26, 2025
@jiridanek
Member

Ftr, there's a new feature in rhods operator to let admin change the Jupyterlab namespace, https://issues.redhat.com/browse/RHOAIENG-22096

Default still is

 ${DEFAULT_WORKBENCHES_NAMESPACE}=    rhods-notebooks

@harshad16
Member Author

Ftr, there's a new feature in rhods operator to let admin change the Jupyterlab namespace, https://issues.redhat.com/browse/RHOAIENG-22096

Default still is

 ${DEFAULT_WORKBENCHES_NAMESPACE}=    rhods-notebooks

Great point, noticing that, I was trying to change the logic for making it more generic 👍

@harshad16 harshad16 changed the title WIP: RHOAI-4148: Re-adjust the statefulset, route, service based on generateName RHOAI-4148: Re-adjust the statefulset, route, service based on generateName Apr 1, 2025
@jiridanek
Member

/lgtm

I've been thinking about this and while I still believe what I said/thought before (the change does not really fix customer issue, change will move our logic further from upstream kubeflow notebooks, the branching logic for the route takes time to understand), I don't see anything wrong with this.

In the interest of making further progress, I think we should merge this. It will fix some scenarios, upstream kubeflow is not really making any changes in the 1.x notebook controller codebase, and the logic is not too bad and worst case chatgpt can help explain if I ever stumble with it.

Ship it!

@harshad16 harshad16 force-pushed the long-route-rhoai-4148 branch from c788a91 to c156792 Compare April 24, 2025 12:48
@openshift-ci openshift-ci bot removed the lgtm label Apr 24, 2025

openshift-ci bot commented Apr 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jiridanek. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added size/l and removed size/l labels Apr 24, 2025
@jstourac
Member

Do we want to do something with the failing CI code static analysis check?

prefix=components/odh-notebook-controller --timeout=5m] in [/home/runner/work/kubeflow/kubeflow/components/odh-notebook-controller] ...
  Error: components/odh-notebook-controller/controllers/notebook_route.go:106:21: ST1023: should omit type bool from declaration; it will be inferred from the right-hand side (staticcheck)
  	var isGenerateName bool = false
  	                   ^
  Error: components/odh-notebook-controller/controllers/notebook_route.go:236:5: QF1008: could remove embedded field "ObjectMeta" from selector (staticcheck)
  	sa.ObjectMeta.Annotations["serviceaccounts.openshift.io/oauth-redirectreference.second"] = "" +
  	   ^
  2 issues:
  * staticcheck: 2

AFAIK it's not failing in main?
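
For reference, a small sketch of what the two staticcheck findings ask for. The `objectMeta` and `serviceAccount` types are local stand-ins for the Kubernetes types (an assumption made so the example is self-contained), and `setRedirectReference` is a hypothetical helper; only the annotation key is taken from the log above.

```go
package main

import "fmt"

// objectMeta and serviceAccount mimic the embedding of ObjectMeta in a
// Kubernetes object; they are stand-ins, not the real API types.
type objectMeta struct {
	Annotations map[string]string
}

type serviceAccount struct {
	objectMeta // embedded, so its fields are promoted to serviceAccount
}

const redirectKey = "serviceaccounts.openshift.io/oauth-redirectreference.second"

// setRedirectReference writes the annotation through the promoted field.
// QF1008: sa.Annotations is enough, sa.objectMeta.Annotations is redundant.
func setRedirectReference(sa *serviceAccount, value string) {
	if sa.Annotations == nil {
		sa.Annotations = map[string]string{}
	}
	sa.Annotations[redirectKey] = value
}

func main() {
	// ST1023: instead of `var isGenerateName bool = false`, let the type
	// be inferred from the right-hand side.
	isGenerateName := false

	sa := serviceAccount{}
	setRedirectReference(&sa, "example")
	fmt.Println(isGenerateName, sa.Annotations[redirectKey])
}
```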

}
if isGenerateName {
	ssObjectMeta = metav1.ObjectMeta{
		GenerateName: "nb-",
Member

How will this look in the real world? Isn't it too generic? I mean, would it make sense to include at least some portion of the original instance.Name here, so it's at least somewhat traceable to the original user via the oc get ... command?

Member Author

How will this look like in the real world? isn't it too generic?

This change only takes effect when isGenerateName is set, which requires the notebook name to be longer than 52 characters. In the ODH scenario that is only possible via the Jupyter Tile, as a Data Science Project truncates notebook names above 30 characters.
In the Jupyter Tile, users don't get to set the notebook name in the UI, so on the console it would show up as nb-
Users can query them via the CLI using:
oc get notebook --all-namespaces -l opendatahub.io/user={username}
or
oc get statefulset --all-namespaces -l statefulset={notebook.name}

We could include some part of instance.Name; I wasn't sure which slice it should be, the prefix or the suffix of instance.Name.
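
One possible shape for such a prefix, purely as a sketch of the idea discussed here and not something the PR implements: keep the leading part of the instance name and cut it so that the generated name still stays within 52 characters. It assumes that Kubernetes appends a 5-character random suffix to generateName; `generateNamePrefix` and its constants are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	basePrefix      = "nb-"
	randomSuffixLen = 5  // assumption: characters Kubernetes appends to generateName
	maxNameLen      = 52 // leaves room for the hash suffix in the 63-character label
)

// generateNamePrefix builds "nb-<leading part of instanceName>-" so that
// prefix plus random suffix never exceeds maxNameLen. Names are assumed to
// be ASCII (as Kubernetes names are), so byte slicing is safe here.
func generateNamePrefix(instanceName string) string {
	budget := maxNameLen - randomSuffixLen - len(basePrefix) - len("-")
	part := instanceName
	if len(part) > budget {
		part = part[:budget]
	}
	// Avoid a doubled separator if the cut lands on "-" or ".".
	part = strings.TrimRight(part, "-.")
	if part == "" {
		return basePrefix
	}
	return basePrefix + part + "-"
}

func main() {
	fmt.Println(generateNamePrefix("jupyter-nb-alice")) // prints nb-jupyter-nb-alice-
}
```

Using the prefix of the instance name keeps the shared jupyter-nb- part, so a suffix slice might distinguish users better; either choice fits the same length budget.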

Member

I understand. My intention here was to make it easier to distinguish also on the pods level - when one lists the pods from rhods-notebooks namespace (Jupyter tile), then with this change it will not be possible to match relevant pods based on their name to the users anymore at all.

I don't insist. The solution using the label may be good enough. I was just trying to be sure that introducing this change we're not making this a bit more complex for eventual customer automation they could have.

But yeah, I don't really insist.

Member Author

Absolutely, I agree with you.
With this case being exclusive to the Jupyter Tile, keeping the name sequence could be one way. Perhaps we can do this in a next iteration.

@daniellutz

I understand and agree with @jiridanek's comments.

Since some of these are mostly due to OCP's own limitations and requirements, this PR will be a great fix to these issues already. Tests are present, only some linter checks appearing on CI, but everything looks good!

/lgtm

- ST1023: should omit type bool from declaration; it will be inferred from the right-hand side (staticcheck)
- QF1008: could remove embedded field "ObjectMeta" from selector (staticcheck)

Signed-off-by: Harshad Reddy Nalla <[email protected]>
@openshift-ci openshift-ci bot added size/l and removed size/l labels May 1, 2025
@jstourac
Member

/lgtm

@harshad16
Member Author

/retest-required

@harshad16
Member Author

Thank you for all the reviews.
Getting this in.

@harshad16 harshad16 merged commit 3408b8c into opendatahub-io:main May 19, 2025
19 of 22 checks passed
@openshift-ci openshift-ci bot added size/l and removed size/l labels Sep 4, 2025
5 participants