Skip to content

Conversation

@sanderegg
Copy link
Member

@sanderegg sanderegg commented Jun 3, 2025

What do these changes do?

As part of the issues found by @wvangeit where a service would define file_to_key mapping for input and pass in a zip file.
image

Having a service input port defining a file_to_key mapping and at the same time passing a zip file at runtime produces confusing results:

  1. the dask-sidecar downloads the input file which is a zip file,
  2. renames the zip file to the key map (here values.json)
  3. then since it's a zip file unzips the file which happened to also contain a values.json file which overrides the original values.json zip file
  4. the dask-sidecar then removes the original zip file, which means deletes the values.json file
    --> the jsonifier service would actually run with an empty input data and will not complain as it can run with no inputs.

This PR improves the dask-sidecar to raise an exception if it detects:

  • that an input port defines a file_to_key mapping
  • and that the user passes a zip file and does not set the file mime type explicitely to application/zip

On the other hand the unzipping process is enhanced by:

  • in case extraction is needed
  • downloads the zip file to a temporary folder
  • and extracts the zip file to the destination path

Related issue/s

How to test

Dev-ops

@sanderegg sanderegg added this to the Bazinga! milestone Jun 3, 2025
@sanderegg sanderegg self-assigned this Jun 3, 2025
@sanderegg sanderegg added t:maintenance Some planned maintenance work a:dask-service Any of the dask services: dask-scheduler/sidecar or worker a:computational clusters labels Jun 3, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the robustness of the dask-sidecar by introducing a fast fail when a file-to-key mapping is used with zip data and by improving the unzipping process using a temporary directory. Key changes include:

  • Adding a new error (ServiceInputsUseFileToKeyMapButReceivesZipDataError) to trigger a fail in case of mismatched file mapping and zip input.
  • Enhancing the unzipping process with a dedicated function (check_need_unzipping) and temporary directory usage.
  • Adjusting log levels and updating test fixtures to cover these scenarios.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
services/dask-sidecar/tests/unit/test_computational_sidecar_tasks.py Updated tests and fixtures to validate the new error behavior with file-to-key mapping and zip input data.
services/dask-sidecar/src/simcore_service_dask_sidecar/utils/files.py Introduced check_need_unzipping and adjusted the extraction logic using a temporary directory.
services/dask-sidecar/src/simcore_service_dask_sidecar/computational_sidecar/task_shared_volume.py Modified log level from debug to info when creating the input folder.
services/dask-sidecar/src/simcore_service_dask_sidecar/computational_sidecar/core.py Added logic to raise the new error when a file mapping is present along with zip data that needs extraction.
packages/dask-task-models-library/src/dask_task_models_library/container_tasks/errors.py Defined a new error class for the file-to-key mapping and zip data mismatch scenario.

@sanderegg sanderegg changed the title 🎨Computational backend: Fast fail in case of malformed input syntax and improve unzipping 🎨Computational backend: Fail fast in case of malformed input syntax and improve unzipping Jun 3, 2025
@codecov
Copy link

codecov bot commented Jun 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.05%. Comparing base (c3e1573) to head (2a13ab7).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7804      +/-   ##
==========================================
+ Coverage   86.73%   88.05%   +1.31%     
==========================================
  Files        1850     1269     -581     
  Lines       71875    54189   -17686     
  Branches     1215      189    -1026     
==========================================
- Hits        62338    47714   -14624     
+ Misses       9196     6416    -2780     
+ Partials      341       59     -282     
Flag Coverage Δ
integrationtests 64.21% <ø> (+0.02%) ⬆️
unittests 87.74% <100.00%> (+1.26%) ⬆️
Components Coverage Δ
api ∅ <ø> (∅)
pkg_aws_library ∅ <ø> (∅)
pkg_dask_task_models_library 79.62% <100.00%> (+0.07%) ⬆️
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library ∅ <ø> (∅)
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 85.09% <ø> (+0.05%) ⬆️
agent 96.29% <ø> (ø)
api_server 91.69% <ø> (ø)
autoscaling 96.03% <ø> (ø)
catalog 92.29% <ø> (ø)
clusters_keeper 99.13% <ø> (ø)
dask_sidecar 91.79% <100.00%> (+0.12%) ⬆️
datcore_adapter 97.94% <ø> (ø)
director 76.73% <ø> (-0.10%) ⬇️
director_v2 91.02% <ø> (-0.09%) ⬇️
dynamic_scheduler 96.69% <ø> (ø)
dynamic_sidecar 90.08% <ø> (ø)
efs_guardian 89.65% <ø> (ø)
invitations 93.00% <ø> (ø)
payments 92.57% <ø> (ø)
resource_usage_tracker 88.98% <ø> (ø)
storage 87.46% <ø> (-0.11%) ⬇️
webclient ∅ <ø> (∅)
webserver 83.76% <ø> (+0.01%) ⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c3e1573...2a13ab7. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Collaborator

@matusdrobuliak66 matusdrobuliak66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@sanderegg sanderegg force-pushed the improvements/download-zip-file-to-temporary-folder branch from 1951113 to f2da9ef Compare June 3, 2025 17:12
@sanderegg sanderegg added the 🤖-automerge marks PR as ready to be merged for Mergify label Jun 3, 2025
@sanderegg
Copy link
Member Author

@mergify queue

@mergify
Copy link
Contributor

mergify bot commented Jun 3, 2025

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at dc35a5e

Copy link
Member

@odeimaiz odeimaiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jun 3, 2025

@mergify mergify bot merged commit dc35a5e into ITISFoundation:master Jun 3, 2025
96 checks passed
@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Jun 6, 2025
92 tasks
@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Aug 5, 2025
88 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🤖-automerge marks PR as ready to be merged for Mergify a:computational clusters a:dask-service Any of the dask services: dask-scheduler/sidecar or worker t:maintenance Some planned maintenance work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants