@AdheipSingh AdheipSingh commented Jan 7, 2026

Description

Summary of Working Validation:

  1. CREATE: Child queues cannot exceed parent quotas
  2. UPDATE: Warnings when parent quota is reduced below children
  3. DELETE: Parent queues cannot be deleted if they have children
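The three rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Resources`, `Queue`, `ValidateCreate`, `ValidateUpdate`, and `ValidateDelete` are hypothetical names, the comparison against the total of all children is an assumption taken from the diff comment "Validate total children quotas don't exceed parent", and the sample values mirror the test manifests in Additional Notes.

```go
package main

import (
	"errors"
	"fmt"
)

// Resources and Queue are hypothetical, simplified stand-ins for the Queue CRD.
type Resources struct{ CPU, GPU, Memory float64 }

type Queue struct {
	Name   string
	Parent string // empty for a top-level queue
	Quota  Resources
}

// childTotal sums the quotas of every queue whose Parent is name
// and reports how many such children exist.
func childTotal(all []Queue, name string) (sum Resources, n int) {
	for _, q := range all {
		if q.Parent == name {
			sum.CPU += q.Quota.CPU
			sum.GPU += q.Quota.GPU
			sum.Memory += q.Quota.Memory
			n++
		}
	}
	return sum, n
}

// exceeds reports whether any resource in a is larger than in b.
func exceeds(a, b Resources) bool {
	return a.CPU > b.CPU || a.GPU > b.GPU || a.Memory > b.Memory
}

// ValidateCreate rejects a new child if the children's total would exceed the parent.
func ValidateCreate(all []Queue, q Queue) error {
	if q.Parent == "" {
		return nil
	}
	for _, p := range all {
		if p.Name != q.Parent {
			continue
		}
		total, _ := childTotal(all, p.Name)
		total.CPU += q.Quota.CPU
		total.GPU += q.Quota.GPU
		total.Memory += q.Quota.Memory
		if exceeds(total, p.Quota) {
			return fmt.Errorf("queue %q: children would exceed quota of parent %q", q.Name, p.Name)
		}
		return nil
	}
	// Assumption: a missing parent is treated as an error.
	return fmt.Errorf("queue %q: parent %q not found", q.Name, q.Parent)
}

// ValidateUpdate returns a warning, not an error, when the updated parent
// quota falls below the total of its children.
func ValidateUpdate(all []Queue, updated Queue) []string {
	total, n := childTotal(all, updated.Name)
	if n > 0 && exceeds(total, updated.Quota) {
		return []string{fmt.Sprintf("queue %q: quota is now below the total of its %d child queue(s)", updated.Name, n)}
	}
	return nil
}

// ValidateDelete rejects deleting a queue that still has children.
func ValidateDelete(all []Queue, name string) error {
	if _, n := childTotal(all, name); n > 0 {
		return errors.New("queue " + name + " still has child queues")
	}
	return nil
}

func main() {
	all := []Queue{{Name: "parent-queue", Quota: Resources{CPU: 2000, GPU: 2, Memory: 4096}}}
	valid := Queue{Name: "valid-child-queue", Parent: "parent-queue", Quota: Resources{CPU: 1000, GPU: 1, Memory: 2048}}
	invalid := Queue{Name: "invalid-child-queue", Parent: "parent-queue", Quota: Resources{CPU: 3000, GPU: 1, Memory: 2048}}

	fmt.Println("create valid:", ValidateCreate(all, valid))
	fmt.Println("create invalid:", ValidateCreate(all, invalid))
}
```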

Related Issues

Fixes #81

Checklist

Note: Ensure your PR title follows the Conventional Commits format (e.g., feat(scheduler): add new feature)

  • Self-reviewed
  • Added/updated tests (if needed)
  • Updated documentation (if needed)

Breaking Changes

Additional Notes

Screenshot 2026-01-08 at 3 37 49 AM
  1. test-parent-queue.yaml
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: parent-queue
spec:
  resources:
    cpu:
      quota: 2000
    gpu:
      quota: 2
    memory:
      quota: 4096
  2. test-valid-child.yaml
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: valid-child-queue
spec:
  parentQueue: parent-queue
  resources:
    cpu:
      quota: 1000
    gpu:
      quota: 1
    memory:
      quota: 2048
  3. test-invalid-child.yaml
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: invalid-child-queue
spec:
  parentQueue: parent-queue
  resources:
    cpu:
      quota: 3000  # This exceeds parent's 2000
    gpu:
      quota: 1
    memory:
      quota: 2048

cc @enoodle

}
}

// Validate total children quotas don't exceed parent

Collaborator

It's actually not an invalid state to have over-provisioning of quotas in child queues.
The scheduler enforces all the quota restrictions on both levels, so admins sometimes over-provision child queues under a parent queue, on the assumption that not all child queues are fully utilized all the time. If the child queues ARE requesting more resources than the parent queue allows, the scheduler simply resolves it within that queue.

Author

Thanks for explaining.
In that case, please correct me if I'm wrong: child CPU/GPU/memory quotas > parent quota can be a WARNING instead of an ERROR.
cc @enoodle

Collaborator

I agree with @itsomri: it is a valid state to have a queue limiting the quota of all of its sub-queues. I would also not warn that it "may cause pod scheduling failures", because it might be the intention.
For example, I may have a queue for many researchers to use for interactive jobs, and I might want to limit their total GPU request to 5 while each one can also request up to 5. The pods will not fail to schedule; they will just have to wait for their turn.
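The researcher example can be illustrated with a toy model. This is a sketch under simplifying assumptions and not how the scheduler is implemented: `queue`, `tryAdmit`, and `release` are hypothetical names, and each queue is reduced to a single GPU cap that is checked at every level up to the root.

```go
package main

import "fmt"

// queue is a hypothetical, simplified model: a GPU cap, current usage,
// and an optional parent queue.
type queue struct {
	name   string
	limit  int
	used   int
	parent *queue
}

// tryAdmit allocates gpus only if every queue on the path to the root
// stays within its cap. It returns false when the job has to keep waiting.
func tryAdmit(q *queue, gpus int) bool {
	for cur := q; cur != nil; cur = cur.parent {
		if cur.used+gpus > cur.limit {
			return false
		}
	}
	for cur := q; cur != nil; cur = cur.parent {
		cur.used += gpus
	}
	return true
}

// release returns gpus to the queue and all of its ancestors.
func release(q *queue, gpus int) {
	for cur := q; cur != nil; cur = cur.parent {
		cur.used -= gpus
	}
}

func main() {
	// The children's caps add up to 10, but the parent caps the total at 5.
	team := &queue{name: "interactive", limit: 5}
	alice := &queue{name: "alice", limit: 5, parent: team}
	bob := &queue{name: "bob", limit: 5, parent: team}

	fmt.Println(tryAdmit(alice, 3)) // true
	fmt.Println(tryAdmit(bob, 3))   // false: the parent would be at 6 of 5, so bob waits
	release(alice, 3)
	fmt.Println(tryAdmit(bob, 3)) // true: bob's job runs once alice's GPUs are freed
}
```

Nothing in the over-provisioned configuration is rejected; the second request is simply deferred until capacity under the parent frees up.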

Collaborator

I think this whole validation should be behind an opt-in feature flag.
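One possible shape for such an opt-in gate, as a minimal sketch: the flag name `enable-queue-quota-validation` and the helpers `parseEnabled` and `validateChildQuota` are hypothetical and do not come from this PR. Only the standard library `flag` package is used.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// parseEnabled reads the hypothetical opt-in flag; validation is off by default.
func parseEnabled(args []string) (bool, error) {
	fs := flag.NewFlagSet("queue-webhook", flag.ContinueOnError)
	enabled := fs.Bool("enable-queue-quota-validation", false,
		"opt in to parent/child quota validation (hypothetical flag name)")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enabled, nil
}

// validateChildQuota applies the parent/child check only when the flag is on.
func validateChildQuota(enabled bool, childQuota, parentQuota float64) error {
	if !enabled {
		return nil // default behavior: over-provisioning is allowed
	}
	if childQuota > parentQuota {
		return errors.New("child quota exceeds parent quota")
	}
	return nil
}

func main() {
	// Run the same check once with defaults and once with the flag set.
	for _, args := range [][]string{nil, {"--enable-queue-quota-validation"}} {
		enabled, err := parseEnabled(args)
		if err != nil {
			fmt.Println("flag error:", err)
			continue
		}
		fmt.Printf("enabled=%v result=%v\n", enabled, validateChildQuota(enabled, 3000, 2000))
	}
}
```

With the default off, existing clusters that over-provision child queues on purpose keep working unchanged, and only admins who ask for the stricter check get rejections.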

@AdheipSingh AdheipSingh requested a review from itsomri January 8, 2026 19:41
}
return nil, nil
}
// TODO: Remove after QueueValidator is fully integrated

Collaborator

What will be missing after this PR for it to be fully integrated?

Development

Successfully merging this pull request may close these issues.

When the CPU value in the test queue is larger than that in the default queue, the queue can be created normally.