Bump to helm v4 #7886

Open
yuvipanda wants to merge 5 commits into 2i2c-org:main from yuvipanda:helm-v4

@yuvipanda
Member

Tested locally, and it seems fine. This also bumps the kubectl version.

It also removes a lot of comments that were superfluous or had drifted out of date.

Once this is merged, we will need to make sure everyone upgrades their local install of helm to v4 as well.

Ref #7470

@github-actions
Contributor

github-actions bot commented Mar 10, 2026

Merging this PR will trigger the following deployment actions.

Support deployments

No support upgrades will be triggered

Staging deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
|---|---|---|---|
| gcp | 2i2c | staging | Core infrastructure has been modified |
| gcp | 2i2c | dask-staging | Core infrastructure has been modified |
| aws | smithsonian | staging | Core infrastructure has been modified |
| aws | 2i2c-aws-us | staging | Core infrastructure has been modified |
| aws | nasa-veda | staging | Core infrastructure has been modified |
| aws | disasters | staging | Core infrastructure has been modified |
| aws | opensci | staging | Core infrastructure has been modified |
| gcp | hhmi | staging | Core infrastructure has been modified |
| aws | maap | staging | Core infrastructure has been modified |
| gcp | awi-ciroh | staging | Core infrastructure has been modified |
| aws | jupyter-health | staging | Core infrastructure has been modified |
| aws | strudel | staging | Core infrastructure has been modified |
| gcp | cloudbank | staging | Core infrastructure has been modified |
| aws | nmfs-openscapes | staging | Core infrastructure has been modified |
| aws | projectpythia | staging | Core infrastructure has been modified |
| kubeconfig | 2i2c-jetstream2 | staging | Core infrastructure has been modified |
| aws | reflective | staging | Core infrastructure has been modified |
| aws | temple | staging | Core infrastructure has been modified |
| gcp | 2i2c-uk | staging | Core infrastructure has been modified |
| aws | berkeley-geojupyter | staging | Core infrastructure has been modified |
| kubeconfig | utoronto | staging | Core infrastructure has been modified |
| kubeconfig | utoronto | r-staging | Core infrastructure has been modified |
| aws | bnext-bio | staging | Core infrastructure has been modified |
| gcp | leap | staging | Core infrastructure has been modified |
| aws | victor | staging | Core infrastructure has been modified |
| aws | ucmerced | staging | Core infrastructure has been modified |
| aws | nasa-cryo | staging | Core infrastructure has been modified |
| aws | nasa-ghg-hub | staging | Core infrastructure has been modified |
| aws | openscapeshub | staging | Core infrastructure has been modified |
| aws | earthscope | staging | Core infrastructure has been modified |
| aws | aimatx-2i2c-hub | staging | Core infrastructure has been modified |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
|---|---|---|---|
| gcp | 2i2c | mtu | Core infrastructure has been modified |
| aws | smithsonian | prod | Core infrastructure has been modified |
| aws | 2i2c-aws-us | showcase | Core infrastructure has been modified |
| aws | nasa-veda | prod | Core infrastructure has been modified |
| aws | nasa-veda | binder | Core infrastructure has been modified |
| aws | disasters | prod | Core infrastructure has been modified |
| aws | opensci | sciencecore | Core infrastructure has been modified |
| aws | opensci | climaterisk | Core infrastructure has been modified |
| aws | opensci | small-binder | Core infrastructure has been modified |
| aws | opensci | big-binder | Core infrastructure has been modified |
| gcp | hhmi | spyglass | Core infrastructure has been modified |
| gcp | hhmi | binder | Core infrastructure has been modified |
| aws | maap | prod | Core infrastructure has been modified |
| gcp | awi-ciroh | prod | Core infrastructure has been modified |
| aws | jupyter-health | prod | Core infrastructure has been modified |
| aws | strudel | prod | Core infrastructure has been modified |
| aws | strudel | workshop | Core infrastructure has been modified |
| gcp | cloudbank | ahs | Core infrastructure has been modified |
| gcp | cloudbank | authoring | Core infrastructure has been modified |
| gcp | cloudbank | bcc | Core infrastructure has been modified |
| gcp | cloudbank | bmcc | Core infrastructure has been modified |
| gcp | cloudbank | chaffey | Core infrastructure has been modified |
| gcp | cloudbank | ccsf | Core infrastructure has been modified |
| gcp | cloudbank | chabot | Core infrastructure has been modified |
| gcp | cloudbank | chicagostate | Core infrastructure has been modified |
| gcp | cloudbank | cmu | Core infrastructure has been modified |
| gcp | cloudbank | cra | Core infrastructure has been modified |
| gcp | cloudbank | csm | Core infrastructure has been modified |
| gcp | cloudbank | csum | Core infrastructure has been modified |
| gcp | cloudbank | deanza | Core infrastructure has been modified |
| gcp | cloudbank | demo | Core infrastructure has been modified |
| gcp | cloudbank | dvc | Core infrastructure has been modified |
| gcp | cloudbank | elac | Core infrastructure has been modified |
| gcp | cloudbank | elcamino | Core infrastructure has been modified |
| gcp | cloudbank | evc | Core infrastructure has been modified |
| gcp | cloudbank | etsu | Core infrastructure has been modified |
| gcp | cloudbank | fresno | Core infrastructure has been modified |
| gcp | cloudbank | foothill | Core infrastructure has been modified |
| gcp | cloudbank | glendale | Core infrastructure has been modified |
| gcp | cloudbank | golden | Core infrastructure has been modified |
| gcp | cloudbank | gwu | Core infrastructure has been modified |
| gcp | cloudbank | gpu-demo | Core infrastructure has been modified |
| gcp | cloudbank | high | Core infrastructure has been modified |
| gcp | cloudbank | hmc | Core infrastructure has been modified |
| gcp | cloudbank | humboldt | Core infrastructure has been modified |
| gcp | cloudbank | kean | Core infrastructure has been modified |
| gcp | cloudbank | lacc | Core infrastructure has been modified |
| gcp | cloudbank | lahc | Core infrastructure has been modified |
| gcp | cloudbank | laney | Core infrastructure has been modified |
| gcp | cloudbank | lavc | Core infrastructure has been modified |
| gcp | cloudbank | lbcc | Core infrastructure has been modified |
| gcp | cloudbank | mendocino | Core infrastructure has been modified |
| gcp | cloudbank | merced | Core infrastructure has been modified |
| gcp | cloudbank | merritt | Core infrastructure has been modified |
| gcp | cloudbank | mmc | Core infrastructure has been modified |
| gcp | cloudbank | miracosta | Core infrastructure has been modified |
| gcp | cloudbank | mission | Core infrastructure has been modified |
| gcp | cloudbank | moreno | Core infrastructure has been modified |
| gcp | cloudbank | norco | Core infrastructure has been modified |
| gcp | cloudbank | ocu | Core infrastructure has been modified |
| gcp | cloudbank | palomar | Core infrastructure has been modified |
| gcp | cloudbank | pasadena | Core infrastructure has been modified |
| gcp | cloudbank | redwoods | Core infrastructure has been modified |
| gcp | cloudbank | reedley | Core infrastructure has been modified |
| gcp | cloudbank | riohondo | Core infrastructure has been modified |
| gcp | cloudbank | saddleback | Core infrastructure has been modified |
| gcp | cloudbank | santiago | Core infrastructure has been modified |
| gcp | cloudbank | sbcc | Core infrastructure has been modified |
| gcp | cloudbank | sbcc-dev | Core infrastructure has been modified |
| gcp | cloudbank | sierra | Core infrastructure has been modified |
| gcp | cloudbank | sjcc | Core infrastructure has been modified |
| gcp | cloudbank | sjsu | Core infrastructure has been modified |
| gcp | cloudbank | skyline | Core infrastructure has been modified |
| gcp | cloudbank | sou | Core infrastructure has been modified |
| gcp | cloudbank | stanford | Core infrastructure has been modified |
| gcp | cloudbank | spelman | Core infrastructure has been modified |
| gcp | cloudbank | srjc | Core infrastructure has been modified |
| gcp | cloudbank | tuskegee | Core infrastructure has been modified |
| gcp | cloudbank | ucsc | Core infrastructure has been modified |
| gcp | cloudbank | uchicago | Core infrastructure has been modified |
| gcp | cloudbank | umd | Core infrastructure has been modified |
| gcp | cloudbank | und | Core infrastructure has been modified |
| gcp | cloudbank | virginia | Core infrastructure has been modified |
| gcp | cloudbank | wlac | Core infrastructure has been modified |
| aws | nmfs-openscapes | prod | Core infrastructure has been modified |
| aws | nmfs-openscapes | workshop | Core infrastructure has been modified |
| aws | nmfs-openscapes | noaa-only | Core infrastructure has been modified |
| aws | projectpythia | prod | Core infrastructure has been modified |
| aws | projectpythia | pythia-binder | Core infrastructure has been modified |
| aws | reflective | prod | Core infrastructure has been modified |
| aws | reflective | workshop | Core infrastructure has been modified |
| aws | temple | prod | Core infrastructure has been modified |
| aws | temple | advanced | Core infrastructure has been modified |
| aws | temple | research | Core infrastructure has been modified |
| gcp | 2i2c-uk | lis | Core infrastructure has been modified |
| aws | berkeley-geojupyter | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | r-prod | Core infrastructure has been modified |
| kubeconfig | utoronto | highmem | Core infrastructure has been modified |
| kubeconfig | projectpythia-binder | binderhub | Core infrastructure has been modified |
| aws | bnext-bio | prod | Core infrastructure has been modified |
| gcp | leap | prod | Core infrastructure has been modified |
| gcp | leap | public | Core infrastructure has been modified |
| aws | victor | prod | Core infrastructure has been modified |
| aws | ucmerced | prod | Core infrastructure has been modified |
| aws | nasa-cryo | prod | Core infrastructure has been modified |
| aws | nasa-ghg-hub | prod | Core infrastructure has been modified |
| aws | nasa-ghg-hub | binder | Core infrastructure has been modified |
| aws | openscapeshub | prod | Core infrastructure has been modified |
| aws | openscapeshub | workshop | Core infrastructure has been modified |
| gcp | dubois | ephemeral | Core infrastructure has been modified |
| aws | earthscope | prod | Core infrastructure has been modified |
| aws | earthscope | binder | Core infrastructure has been modified |
| aws | aimatx-2i2c-hub | prod | Core infrastructure has been modified |

@agoose77
Contributor

agoose77 commented Mar 10, 2026

FYI we've had two outages recently due to regressions in Helm null handling.

I noticed it first with Project Pythia (incident report), and @GeorgianaElena noticed it in Earthscope.

Although merging that PR has meant that earthscope no longer has any null problems, we still see regressions for Project Pythia:

# Render the basehub chart with one Helm version and list every node-purpose selector.
function test-helm() {
  local version="${1:?need arg}"
  # Start from a clean output directory on every run.
  rm -rf outputs
  mkdir outputs
  echo "Testing $version"
  podman run --rm -it -v "$PWD/outputs:/outputs" -v "$PWD:/app" -w /app "alpine/helm:$version" template helm-charts/basehub --values=config/clusters/projectpythia-binder/binderhub.values.yaml --output-dir /outputs >/dev/null
  # Recursive grep instead of outputs/**/*.yaml, which bash only expands with globstar enabled.
  grep -r --include='*.yaml' 'hub\.jupyter\.org/node-purpose' outputs
}
versions=("3.17.0" "3.17.1" "3.20.0" "4.1.1")

for version in "${versions[@]}"; do
  test-helm "$version"
  printf "\n\n"
done

I dug into this, and it's caused by changes to sibling merging (of values set in a child chart). Here's a reproducer: https://github.com/agoose77/reproducer-helm-merge-changes

I've opened a bug report here: helm/helm#31919. Anecdotally, it feels like Helm has been grappling with these bugs for a while.


So what do we do?

It seems like this is only a bug in this particular merging scenario. We could either update the binderhub chart to filter out these nulls, or opt to set the value only once via e.g. jsonnet.

Alternatively, we wait for Helm to fix this ... but that will likely take quite a while. I'm happy to dig in and figure out the fix for Helm, but I'm not sure if that's the best use of our time.

@yuvipanda
Member Author

@agoose77 would upgrading to helm v4 cause this regression to come back? Or is the concern that there will be other bugs that we run into?

@agoose77
Contributor

I think merging via helm 4 would break at least one cluster atm given the open bug report.

@yuvipanda
Member Author

@agoose77 I looked into that, and I agree. I also think we can't fix this with extraConfig, because we also need to set this for the core pods themselves, which can't be done with extraConfig. Is that right?

@GeorgianaElena
Member

@agoose77, I believe the only place where this breaks is where we set the selector to null here, right?

dockerApi:
  nodeSelector:
    hub.jupyter.org/node-purpose:

As a workaround, until helm fixes this, why not stop setting this node purpose selector in the basehub:

nodeSelector:
  hub.jupyter.org/node-purpose: user

And instead set it in each hub's config?

I know it's extra work, but we can write a script that does it, and it shouldn't take us much time.
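
As a rough starting point for such a script, the sketch below only lists the hub values files that do not yet mention the selector key, so we can see how many would need it added. `list_hubs_missing_selector` is a hypothetical helper, and the `*.values.yaml` naming under `config/clusters` is an assumption based on the one path quoted earlier in this thread. It changes nothing on disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every values file under a config root that never
# mentions the node-purpose selector key.
# Assumption: hub config lives in files named *.values.yaml under config/clusters.
list_hubs_missing_selector() {
  local root="${1:-config/clusters}"
  find "$root" -type f -name '*.values.yaml' | sort | while IFS= read -r f; do
    # -F treats the key as a literal string, so the dots are not regex wildcards.
    grep -q -F -e 'hub.jupyter.org/node-purpose' "$f" || printf '%s\n' "$f"
  done
}
```

Running `list_hubs_missing_selector` from the repo root would print one path per line. Actually adding the selector to those files would still need a YAML-aware tool, which this sketch does not attempt.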

@agoose77
Contributor

agoose77 commented Mar 18, 2026

@GeorgianaElena I'm a bit nervous because it might happen to other hubs. We'd need to validate that each hub is currently not broken by the upgrade. Future hub changes could also trigger these bugs, but we'd be as likely to catch them in production as in local development. (It's not ideal to know about a Helm bug that we might step on at any time, but it's manageable.)

I haven't taken the view "we're going to do this, how do we do it safely" — let me put that hat on now.
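
One way to do that validation, as a minimal sketch: render each hub with the current Helm version and with v4 into separate directories (for example with the `podman run ... template ... --output-dir` invocation above), then compare the two trees. `compare_renders` and the directory names are hypothetical, and nothing like this exists in the repo today.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two directories of rendered manifests and
# report whether they differ. Returns non-zero when they do.
compare_renders() {
  local old_dir="${1:?need old render dir}"
  local new_dir="${2:?need new render dir}"
  # -r walks subdirectories; -q only reports which files differ, not the diff itself.
  if diff -r -q "$old_dir" "$new_dir" >/dev/null; then
    echo "identical: $old_dir $new_dir"
  else
    echo "DIFFERENT: $old_dir $new_dir"
    return 1
  fi
}
```

For example, `compare_renders outputs-3.20.0 outputs-4.1.1` (assumed directory names) would flag a hub whose rendered manifests change between versions. A flagged hub still needs a manual `diff -r` to tell a real regression from a harmless rendering change.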

@yuvipanda
Member Author

I see this as basically us having found an issue in helm, and @agoose77 opening the issue upstream clearly helped - there's a fix coming in helm/helm#31946. We could just wait for that to land, and then deploy it.

And yes, this is something we have to do, and sooner rather than later :)

@yuvipanda
Member Author

Ok, I'm not sure we can wait for helm/helm#31946. It looks LLM-generated, it comes from someone who recently opened a few hundred PRs in a few hundred other repos, and it is their first PR in helm. I do believe this will eventually get fixed in helm, but that PR may not be it.

Another option for us to consider is to move it to basehub/values.jsonnet, and selectively apply it based on the name of the cluster. Normally I'd not suggest doing that, but given this is a time-limited regression, I'm ok with that. And we'd need to add a note to the docs for any jetstream cluster saying that it needs an exception written in there.
