Reduce the autoscaled pool logs at INFO level #1591

@vdusek

Current logs:

$ uv run python run_crawler.py
[crawlee.crawlers._playwright._playwright_crawler] INFO  Crawled 0/1 pages, 0 failed requests, desired concurrency 1.
[crawlee._autoscaling.autoscaled_pool] INFO  current_concurrency = 0; desired_concurrency = 2; cpu = 0.0; mem = 0.0; event_loop = 0.0; client_info = 0.0
[crawlee.crawlers._playwright._playwright_crawler] INFO  Crawled 16/52 pages, 0 failed requests, desired concurrency 2.
[crawlee.crawlers._playwright._playwright_crawler] INFO  Crawled 42/87 pages, 0 failed requests, desired concurrency 3.
[crawlee.crawlers._playwright._playwright_crawler] INFO  Crawled 75/100 pages, 0 failed requests, desired concurrency 4.
[crawlee._autoscaling.autoscaled_pool] INFO  Waiting for remaining tasks to finish
[crawlee.crawlers._playwright._playwright_crawler] INFO  Final request statistics:
┌───────────────────────────────┬───────────┐
│ requests_finished             │ 100       │
│ requests_failed               │ 0         │
│ retry_histogram               │ [100]     │
│ request_avg_failed_duration   │ None      │
│ request_avg_finished_duration │ 1.28s     │
│ requests_finished_per_minute  │ 153       │
│ requests_failed_per_minute    │ 0         │
│ request_total_duration        │ 2min 8.4s │
│ requests_total                │ 100       │
│ crawler_runtime               │ 39.27s    │
└───────────────────────────────┴───────────┘

IMO the current autoscaled pool logs are too internal compared to the higher-level crawler progress logs. They expose internal metrics (CPU, memory, event loop, and client utilization) and the "Waiting for remaining tasks to finish" message, neither of which seems useful at the INFO level during normal runs.

I suggest lowering these autoscaled pool logs from INFO to DEBUG. Alternatively, we could keep only the current (and/or desired) concurrency values at INFO, since those are likely the only values that are useful during normal runs.
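
To make the second option concrete, here is a minimal sketch of how the split could look, using only the standard `logging` module. It is not the actual Crawlee implementation: `log_pool_state` and its parameters are hypothetical names, and the logger name is simply copied from the prefix in the output above.

```python
import logging

# Logger name copied from the log prefix above; assumed to be a standard `logging` logger.
logger = logging.getLogger('crawlee._autoscaling.autoscaled_pool')


def log_pool_state(
    current_concurrency: int,
    desired_concurrency: int,
    metrics: dict[str, float],
) -> None:
    """Hypothetical helper: concurrency at INFO, internal metrics at DEBUG."""
    # Short line that stays visible during normal runs.
    logger.info(
        'current_concurrency = %d; desired_concurrency = %d',
        current_concurrency,
        desired_concurrency,
    )
    # Internal utilization metrics, emitted only when DEBUG is enabled.
    if logger.isEnabledFor(logging.DEBUG):
        details = '; '.join(f'{name} = {value}' for name, value in metrics.items())
        logger.debug('%s', details)
```

With the logger at INFO, a call such as `log_pool_state(0, 2, {'cpu': 0.0, 'mem': 0.0})` would emit only the concurrency line; the metrics line would appear once the logger is set to DEBUG.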

Labels: t-tooling
