Skip to content

Conversation

@jakenotjay
Copy link
Contributor

  • Default config values for ngpus, scheduler_ngpus and worker_ngpus are now converted to integers before being checked to be greater than the value of zero.
  • Beforehand, these values were assumed to be integers, causing a type error for all instantiations of GCPCluster, creating a breaking change
  • I would suggest a new release of dask cloud provider (after tests) as this currently breaks the GCPCluster

… now converted to integers before being checked to be greater than the value of zero. Beforehand, these values were assumed to be integers, causing a type error for all instantiations of GCPCluster, creating a breaking change
@jakenotjay
Copy link
Contributor Author

@jacobtomlinson apologies for tagging, but this one is a bit more urgent with the fact that GCP cluster is currently broken when no value is passed for n_gpus

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the effort you've been putting in around here lately, much appreciated! As far as I can see there are no issues reporting this problem, so I expect this bug is not widespread, so don't worry about the urgency too much. If it's blocking you personally then at least you now have a fix 😄.

In future if you could open an issue describing the problem, with a code reproducer and a full traceback it would be much appreciated! It makes reviewing PRs much easier when I have a good understanding of the bug being fixed.

I'm guessing the change you're referring to is #451. I agree there is a bug in that PR where the default values have been set to "" in the config, when instead they should be null (YAML equivalent of None). It looks like the bug predates that PR, but we never ended up going down a branch that triggered it, with the added complexity of functionality it's now possible for one of these default values to remain set when configuring GPUs. We might want to think about better type checking in the Dask config system more broadly.

I think we need to do a couple more things to get this PR merged:

  1. Add a test that highlights triggers the bug, so that we can be confident we have fixed it and safeguard us from future regressions
  2. Update the default config values from "" to null

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants