Skip to content

Conversation

@sanderegg
Copy link
Member

@sanderegg sanderegg commented Sep 19, 2025

What do these changes do?

  • remove distributed Variable that can timeout after 30 seconds and bring us in a undefined state and send tasks to the underlying dask backend without writing it to the DB
  • removes concurrent access to the dask-scheduler as this defeats the purpose of limiting its access via a semaphore (capacity is 20, with concurrent access set to 10, we get potentially 200 concurrent calls that the dask-scheduler does not handle)
  • removes concurrent access to the DB as many warnings showing DB access approach maximum were spotted.

Related issue/s

How to test

Dev-ops

@sanderegg sanderegg added this to the Cheops milestone Sep 19, 2025
@sanderegg sanderegg self-assigned this Sep 19, 2025
@sanderegg sanderegg added a:director-v2 issue related with the director-v2 service a:computational clusters labels Sep 19, 2025
@codecov
Copy link

codecov bot commented Sep 19, 2025

Codecov Report

❌ Patch coverage is 80.64516% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.62%. Comparing base (a26142b) to head (2bde407).
⚠️ Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (a26142b) and HEAD (2bde407). Click for more details.

HEAD has 31 uploads less than BASE
Flag BASE (a26142b) HEAD (2bde407)
unittests 32 1
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #8397       +/-   ##
===========================================
- Coverage   87.90%   68.62%   -19.29%     
===========================================
  Files        1951      761     -1190     
  Lines       75951    35038    -40913     
  Branches     1336      175     -1161     
===========================================
- Hits        66763    24044    -42719     
- Misses       8788    10937     +2149     
+ Partials      400       57      -343     
Flag Coverage Δ
integrationtests 63.92% <67.74%> (-0.04%) ⬇️
unittests 84.47% <74.19%> (-2.11%) ⬇️
Components Coverage Δ
pkg_aws_library ∅ <ø> (∅)
pkg_celery_library ∅ <ø> (∅)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library ∅ <ø> (∅)
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 76.69% <ø> (-8.30%) ⬇️
agent ∅ <ø> (∅)
api_server ∅ <ø> (∅)
autoscaling ∅ <ø> (∅)
catalog ∅ <ø> (∅)
clusters_keeper ∅ <ø> (∅)
dask_sidecar ∅ <ø> (∅)
datcore_adapter ∅ <ø> (∅)
director ∅ <ø> (∅)
director_v2 90.91% <80.64%> (-0.06%) ⬇️
dynamic_scheduler ∅ <ø> (∅)
dynamic_sidecar 81.87% <ø> (-8.59%) ⬇️
efs_guardian ∅ <ø> (∅)
invitations ∅ <ø> (∅)
payments ∅ <ø> (∅)
resource_usage_tracker ∅ <ø> (∅)
storage ∅ <ø> (∅)
webclient ∅ <ø> (∅)
webserver 58.78% <ø> (-29.20%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a26142b...2bde407. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sanderegg sanderegg force-pushed the computational-backend/performnace-improvements-step6 branch from f40b358 to 51a5010 Compare September 19, 2025 14:18
@mergify
Copy link
Contributor

mergify bot commented Sep 19, 2025

🧪 CI Insights

Here's what we observed from your CI run for 2bde407.

✅ Passed Jobs With Interesting Signals

Pipeline Job Signal Health on master Retries 🔍 CI Insights 📄 Logs
CI integration-tests Base branch is broken, but the job passed. Looks like this might be a real fix 💪 Broken 0 View View
unit-tests Base branch is broken, but the job passed. Looks like this might be a real fix 💪 Broken 0 View View

@sanderegg sanderegg requested a review from Copilot September 19, 2025 14:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes the distributed.Variable mechanism from the Dask client module to address timeout issues that could lead to undefined states. The changes eliminate the use of distributed Variables for storing task futures and simplify the task processing pipeline by removing concurrent execution limits in favor of sequential processing.

  • Removes distributed.Variable usage for storing and releasing task futures
  • Replaces concurrent task processing with sequential execution to reduce scheduler load
  • Removes unused imports and constants related to pipeline concurrency limits

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
dask_client.py Removes distributed.Variable operations for task future storage and cleanup
_scheduler_dask.py Converts concurrent task operations to sequential execution and adjusts concurrency limits
_scheduler_base.py Replaces asyncio.gather with limited_gather for better concurrency control

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@sonarqubecloud
Copy link

@sanderegg sanderegg marked this pull request as ready for review September 19, 2025 15:03
Copy link
Contributor

@bisgaard-itis bisgaard-itis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Copy link
Contributor

@wvangeit wvangeit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@sanderegg sanderegg merged commit 8fafa40 into ITISFoundation:master Sep 19, 2025
91 of 95 checks passed
@sanderegg sanderegg deleted the computational-backend/performnace-improvements-step6 branch September 19, 2025 15:22
@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Sep 24, 2025
65 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

a:computational clusters a:director-v2 issue related with the director-v2 service

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants