Skip to content

Conversation

@sanderegg
Copy link
Member

@sanderegg sanderegg commented Jul 7, 2025

What do these changes do?

This very noisy PR introduces asynchronous logging facilities into our services (with the exceptions of the dask-sidecar and storage workers).

Why?

Since our services are heavily relying on asynchronous python code via asyncio, the default python logging that we were using is synchronous, which blocks the event loop everytime a log it written into stderr.
Here are some references about that fact:

How is it done?

  • a Queue is used and a secondary thread takes care of writing to the stream now: see logging_utils.py in servicelib
  • the logging_utils.py was refactored and now the following happens:
    • all handlers are removed from the loggers
    • all loggers are set to propagate to their ancestor
    • only the root logger get our handler
    • reduction of logging noise is centralised in logging_utils.py as well --> reduced copy/pasting
  • for fastapi-based application a helper was created in fastapi/logging_lifespan.py in servicelib which brings a setup_logging_lifespan and a backwards-compatible setup_logging_shutdown_event method
  • for the webserver aiohttp-based application the implementation is directly in the webserver, see log.py and cli.py in the webserver service

Testing

  • a new pytest-simcore fixture was added preserve_caplog_for_async_logging which is in autouse mode. Pytest normally attaches special log handlers when tests are running and these are removed by the reworked logging_utils. This fixture ensures that the handlers are added again.

Additional changes

  • alignment of boot.sh for fastapi-based application: now uses an application factory (there is no real reason for this, but it aligns better with the web server)
  • all entrypoint.sh now also print if UV is installed
  • STORAGE_TRACING ENV variable was missing from docker-compose.yml

Related issue/s

How to test

Dev-ops

@sanderegg sanderegg added this to the Engage milestone Jul 7, 2025
@sanderegg sanderegg self-assigned this Jul 7, 2025
@sanderegg sanderegg added the t:maintenance Some planned maintenance work label Jul 7, 2025
@codecov
Copy link

codecov bot commented Jul 7, 2025

Codecov Report

Attention: Patch coverage is 67.87330% with 142 lines in your changes missing coverage. Please review.

Project coverage is 88.19%. Comparing base (1ec6ab7) to head (fae4400).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8064      +/-   ##
==========================================
- Coverage   88.26%   88.19%   -0.08%     
==========================================
  Files        1879     1878       -1     
  Lines       72085    72242     +157     
  Branches     1269     1272       +3     
==========================================
+ Hits        63624    63712      +88     
- Misses       8094     8155      +61     
- Partials      367      375       +8     
Flag Coverage Δ
integrationtests 64.31% <43.93%> (-0.04%) ⬇️
unittests 86.80% <67.87%> (-0.06%) ⬇️
Components Coverage Δ
pkg_aws_library 93.93% <ø> (ø)
pkg_celery_library 87.34% <ø> (ø)
pkg_dask_task_models_library 79.62% <ø> (ø)
pkg_models_library 93.24% <ø> (ø)
pkg_notifications_library 85.26% <ø> (ø)
pkg_postgres_database 88.22% <ø> (ø)
pkg_service_integration 70.17% <ø> (ø)
pkg_service_library 71.76% <80.95%> (+0.58%) ⬆️
pkg_settings_library 90.39% <ø> (ø)
pkg_simcore_sdk 84.99% <ø> (-0.18%) ⬇️
agent 93.81% <28.57%> (-2.48%) ⬇️
api_server 92.85% <100.00%> (+0.01%) ⬆️
autoscaling 95.88% <100.00%> (ø)
catalog 92.34% <45.00%> (-0.25%) ⬇️
clusters_keeper 99.13% <100.00%> (+<0.01%) ⬆️
dask_sidecar 92.37% <100.00%> (+0.01%) ⬆️
datcore_adapter 97.94% <100.00%> (ø)
director 76.23% <27.77%> (-0.55%) ⬇️
director_v2 91.04% <65.51%> (-0.04%) ⬇️
dynamic_scheduler 96.27% <45.45%> (-0.42%) ⬇️
dynamic_sidecar 90.07% <81.81%> (-0.03%) ⬇️
efs_guardian 89.76% <100.00%> (+0.11%) ⬆️
invitations 91.44% <18.75%> (-2.16%) ⬇️
payments 92.60% <100.00%> (+0.03%) ⬆️
resource_usage_tracker 92.55% <100.00%> (-0.14%) ⬇️
storage 86.33% <0.00%> (-0.18%) ⬇️
webclient ∅ <ø> (∅)
webserver 88.46% <50.00%> (-0.09%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1ec6ab7...fae4400. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sanderegg sanderegg force-pushed the maintenance/async-logger-facilities branch 4 times, most recently from 40ee833 to 476d08d Compare July 9, 2025 07:33
@sanderegg sanderegg changed the title ⚗️Make logging asynchronous ⚗️Introduce asynchronous logging facilities Jul 9, 2025
@sanderegg sanderegg requested a review from Copilot July 9, 2025 08:03
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces non-blocking asynchronous logging support across services by replacing the old synchronous config_all_loggers approach with a new setup_loggers function and an async_loggers_lifespan context manager. It also updates entrypoint scripts to surface the UV binary path and bumps PostgreSQL images in compose files.

  • Refactor logging setup: replace config_all_loggers with setup_loggers and offer an async logging lifespan
  • Add tests for async logging utilities (async_loggers_lifespan)
  • Update Docker entrypoints to echo the UV executable and bump Postgres image versions

Reviewed Changes

Copilot reviewed 43 out of 44 changed files in this pull request and generated no comments.

Show a summary per file
File Description
services/web/server/src/simcore_service_webserver/log.py Removed sync logger config and updated setup_logging signature
services/web/server/src/simcore_service_webserver/cli.py Added async_loggers_lifespan integration via AsyncExitStack
packages/service-library/src/servicelib/logging_utils.py Implemented setup_loggers and async_loggers_lifespan, deprecated old config function
packages/service-library/tests/test_logging_utils.py Added async logging tests, including retry-based assertions
services/*/docker/entrypoint.sh (multiple services) Echo UV path in entrypoint scripts
services/*/docker-compose-extra.yml (Postgres services) Bumped Postgres images from 14.5 to 14.8 in compose files
Comments suppressed due to low confidence (3)

services/web/server/docker/entrypoint.sh:30

  • [nitpick] The label "UV" is ambiguous and the uv executable may not exist. Consider renaming to "uvicorn" and using command -v uvicorn to verify the correct binary.
echo "$INFO" "UV : $(command -v uv)"

services/director-v2/docker-compose-extra.yml:3

  • [nitpick] The image tag is wrapped in quotes while others are unquoted. Remove the surrounding quotes for consistency across compose files.
    image: "postgres:14.8-alpine@sha256:150dd39ccb7ae6c7ba6130c3582c39a30bb5d3d22cb08ad0ba37001e3f829abc"

packages/service-library/tests/test_logging_utils.py:351

  • The use of parentheses around multiple context managers will be interpreted as a single tuple context and raise a syntax error. Change to with suppress(ValueError), log_exceptions(...) without the outer parentheses.
    with (

@sanderegg sanderegg force-pushed the maintenance/async-logger-facilities branch 7 times, most recently from 9186396 to 77f8894 Compare July 10, 2025 15:24
@sanderegg sanderegg changed the title ⚗️Introduce asynchronous logging facilities ⚗️Introduce asynchronous logging facilities (🚨) Jul 10, 2025
@sanderegg sanderegg marked this pull request as ready for review July 10, 2025 15:46
@sanderegg sanderegg added the 🤖-automerge marks PR as ready to be merged for Mergify label Jul 10, 2025
@sanderegg
Copy link
Member Author

@mergify queue

@mergify
Copy link
Contributor

mergify bot commented Jul 10, 2025

queue

🟠 Waiting for conditions to match

  • -closed [📌 queue requirement]
  • any of: [🔀 queue conditions]
    • all of: [📌 queue conditions of queue default]
      • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
      • #review-threads-unresolved=0
      • #approved-reviews-by >= 2 [🛡 GitHub branch protection]
      • #approved-reviews-by>=2
      • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
      • #changes-requested-reviews-by=0
      • -conflict
      • -draft
      • base=master
      • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
      • label!=🤖-do-not-merge
      • label=🤖-automerge
      • any of: [🛡 GitHub branch protection]
        • check-skipped = deploy to dockerhub
        • check-neutral = deploy to dockerhub
        • check-success = deploy to dockerhub
      • any of: [🛡 GitHub branch protection]
        • check-success = system-tests
        • check-neutral = system-tests
        • check-skipped = system-tests
      • any of: [🛡 GitHub branch protection]
        • check-success = unit-tests
        • check-neutral = unit-tests
        • check-skipped = unit-tests
      • any of: [🛡 GitHub branch protection]
        • check-success = check OAS' are up to date
        • check-neutral = check OAS' are up to date
        • check-skipped = check OAS' are up to date
      • any of: [🛡 GitHub branch protection]
        • check-success = integration-tests
        • check-neutral = integration-tests
        • check-skipped = integration-tests
      • any of: [🛡 GitHub branch protection]
        • check-success = build-test-images (frontend) / build-test-images
        • check-neutral = build-test-images (frontend) / build-test-images
        • check-skipped = build-test-images (frontend) / build-test-images
  • -conflict [📌 queue requirement]
  • -draft [📌 queue requirement]
  • any of: [📌 queue -> configuration change requirements]
    • -mergify-configuration-changed
    • check-success = Configuration changed

return _logging_lifespan


def setup_logging_shutdown_event(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really do not care when the action is taken, the function should not reflect this. Others don't

Suggested change
def setup_logging_shutdown_event(
def setup_logging(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GitHK please read the documentation of that helper function.
It is used for old style fastapi applications and is to be used as a shutdown event.

Since we have things half way done, with some stuff going to lifespans, some others not as usual, there is a need for the 2 systems.. That does not make things simpler.
Also I don't get your point, why is the other one supposed to be _lifespan suffix and not this one? this makes no sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yes, I see what you mean.
I would maybe call this function get_setup_logging_shutdown_event since it sounds better to me.
All good though.

@sanderegg sanderegg force-pushed the maintenance/async-logger-facilities branch from b649bb9 to fae4400 Compare July 15, 2025 06:17
@sonarqubecloud
Copy link

@sanderegg sanderegg merged commit a9d9415 into ITISFoundation:master Jul 15, 2025
94 of 96 checks passed
@sanderegg sanderegg deleted the maintenance/async-logger-facilities branch July 15, 2025 09:08
@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Aug 5, 2025
88 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🤖-automerge marks PR as ready to be merged for Mergify t:maintenance Some planned maintenance work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants