
Conversation

Adir111 (Contributor) commented Aug 25, 2025

  • Fixed a bug in the CPU/Memory graphs where 'reserved' could be negative.
  • Fixed a bug in the CPU/Memory graphs that caused wrong calculations for algorithms (worker requirements were not taken into account).
  • Fixed a bug in the CPU/Memory graphs where 'free' could be calculated incorrectly.
  • The graphs now show only actively requested values, which is a much better indicator of environment usage (previously, limits could be used as well when env.USE_RESOURCE_LIMITS was true, which did not give truthful information).
  • Fixed a bug in jobCreator that could create a spec with null resources, which would have failed job creation. Now only the custom-configured resources are written (this could happen with workerCustomResources); see the sketch after the notes below.
  • Fixed a bug in the etcd data where worker stats under nodes were counted incorrectly. They are now counted correctly.
  • Fixed a resource-calculation bug in task-executor: when useResourceLimits (env.USE_RESOURCE_LIMITS) was true and a pod had no limit, the pod was counted as 0. It now falls back to the pod's 'request' when there is no limit, for the purpose of scheduling an algorithm request (see the sketch after this list).
  • Fixed a bug where the check for enough resources for job creation did not take the worker's request resources into account when they came from the environment default (it previously counted them as 0).
  • Added JSDoc to some methods and made small refactors.
  • Added additional unit tests for the normalizeResources method.
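
For context, here is a minimal sketch of the fallback described in the last two fix items, in plain JavaScript. The function name resolveResourceValue, its shapes, and the defaultRequest parameter are illustrative assumptions, not the actual hkube implementation:

```js
/**
 * Resolve the per-pod value used for scheduling calculations.
 * Hypothetical sketch, not the actual task-executor code.
 * @param {{request?: number, limit?: number}} resource - cpu (cores) or memory (bytes)
 * @param {boolean} useResourceLimits - mirrors env.USE_RESOURCE_LIMITS
 * @param {number} [defaultRequest=0] - assumed environment/LimitRange default
 * @returns {number}
 */
const resolveResourceValue = ({ request, limit }, useResourceLimits, defaultRequest = 0) => {
    // a worker with no explicit request falls back to the environment default
    // instead of 0 (the pre-fix behavior that under-counted workers)
    const effectiveRequest = request ?? defaultRequest;
    if (!useResourceLimits || limit == null) {
        // missing limit: count the request rather than treating the pod as 0
        return effectiveRequest;
    }
    // some pods have no limit, so a raw sum of limits can fall below the sum
    // of requests; taking the max per pod keeps the aggregate consistent
    return Math.max(effectiveRequest, limit);
};

// e.g. resolveResourceValue({ request: 0.5 }, true) -> 0.5, not 0
```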

IMPORTANT - This PR cannot be merged until kube-HPC/kubernetes-client.hkube#39 is merged and the package-lock.json file is updated with the new version of our k8s client. In addition, #2224 should be merged before this one.

Related issue - #2185
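
As a sketch of the jobCreator fix above (dropping null or empty fields before they reach the job spec, since a null resource value fails the k8s job and an empty object is just noise), assuming a hypothetical mergeDefined helper; this is not the actual merge method from the PR:

```js
/**
 * Merge resource objects while skipping null/undefined and empty-object
 * values, so the resulting job spec only contains resources that were
 * actually configured (e.g. via workerCustomResources). Illustrative only.
 * @param {...Object} sources
 * @returns {Object}
 */
const mergeDefined = (...sources) => {
    const result = {};
    sources.forEach((source) => {
        Object.entries(source || {}).forEach(([key, value]) => {
            const isEmptyObject = typeof value === 'object' && value !== null
                && Object.keys(value).length === 0;
            if (value != null && !isEmptyObject) {
                result[key] = value;
            }
        });
    });
    return result;
};

// mergeDefined({ cpu: '500m', memory: null }, { gpu: {} }) -> { cpu: '500m' }
```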



…yResources is true, plus fixed the merge method to not include fields that are empty or null, since null will fail the k8s job and empty would just be odd in the job spec; better off without it.
…limits if useResourceLimits is on (some pods have no limit, which can cause limits to be lower than requests, which is impossible).
golanha previously approved these changes Aug 31, 2025

golanha (Member) left a comment


@golanha reviewed 14 of 14 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Adir111)

Adir111 merged commit 8979a58 into master Aug 31, 2025
4 of 5 checks passed
Adir111 deleted the incorrect_resources_allocation branch August 31, 2025 12:29
hkube-ci pushed a commit that referenced this pull request Aug 31, 2025
* refactored names and added jsdoc

* fixed bug where worker stats under nodes were not calculated correctly

* fixed bug - bad calculation: as long as 'free' is negative, it causes 'reserved' to always be negative (see the sketch after this list).

* added new method to get default values for resources, via LimitRange

* added stub data for tests

* fixed bug - should apply workerResourceRequests only if settings.applyResources is true; also fixed the merge method to not include fields that are empty or null, since null will fail the k8s job and empty would just be odd in the job spec - better off without it.

* refactored a bit, and changed behavior to take the max of requests and limits if useResourceLimits is on (some pods have no limit, which can cause limits to be lower than requests, which is impossible).

* added jsdoc and a small refactor

* now taking worker resources into account when calculating resources per algorithm

* extracted default container resources out of the reconcile; now handled in data preparation

* fixed all graph calculation errors

* now takes the maximum of request and limit when the useResourceLimits flag is true, since many pods have no limit, which causes the limit to be lower than the request (which is impossible)

* passing default resources, so that when calculating a worker without any special resource definition, its default resource is used instead, for a more accurate calculation of requirements

* now always sending the actual request for 'other', while keeping the requests used for calculations dependent on the requests, or on limits & requests (when a limit is missing)

* fixed 'free' & 'other' calculation after changes in task-executor

* .

* added unit tests covering the method after its changes.

* update k8s client version .... bump version [skip ci]
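
The 'free is negative drags reserved negative' commit above suggests the graph split derives these values per node. Below is a minimal sketch of the corrected math, where nodeGraphValues and its formulas are assumptions rather than the exact task-executor code:

```js
/**
 * Hypothetical sketch of the per-node graph split; 'requested' is the sum of
 * active pod requests (see the PR description) and 'used' is the measured
 * consumption on the node. Names and formulas are assumptions.
 * @param {number} total - node capacity (cpu cores or memory bytes)
 * @param {number} requested - sum of active requests on the node
 * @param {number} used - measured usage on the node
 * @returns {{used: number, reserved: number, free: number}}
 */
const nodeGraphValues = (total, requested, used) => {
    // clamp at zero so over-commitment can no longer push 'free' negative,
    // which previously dragged 'reserved' negative as well
    const free = Math.max(0, total - requested);
    const reserved = Math.max(0, requested - used);
    return { used, reserved, free };
};
```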
hkube-ci pushed a commit that referenced this pull request Aug 31, 2025 (same commit message as above)
RonShvarz pushed a commit that referenced this pull request Sep 7, 2025 (same commit message as above)
RonShvarz pushed a commit that referenced this pull request Sep 7, 2025 (same commit message as above)
RonShvarz pushed a commit that referenced this pull request Sep 7, 2025 (same commit message as above)
RonShvarz added a commit that referenced this pull request Sep 8, 2025
* Incorrect resources allocation (#2224) (the commit message above, repeated three times)

* moved size into condition (should verify requestedAlgorithm) (#2231)

* moved size into condition (should verify requestedAlgorithm) (#2231) .... bump version [skip ci]

* Trigger CI

---------

Co-authored-by: Adir David <[email protected]>
Co-authored-by: GitHub Action <[email protected]>