
fix: prevent memory leak by closing unused context #1640

Closed
Martichou wants to merge 6 commits into unclecode:develop from Martichou:fix/leaks

Conversation

@Martichou

Summary

When scraping many URLs continuously, browser contexts accumulate in memory and are never cleaned up. The existing cleanup mechanism only runs when browsers go idle, which never happens under continuous load. This causes memory to grow unbounded until the process crashes or becomes unresponsive.

Fixes #943

Small note: I'm not used to Python, I won't lie, Claude helped me a bit here, but I've checked what it did and tested it. So this is not just yet another piece of AI slop :)

List of files changed and why

  • browser_manager.py: Add _context_refcounts tracking, cleanup_contexts(), and release_context() methods
  • async_crawler_strategy.py: Release context ref in finally block after crawl
  • deploy/docker/api.py: Trigger context cleanup after each request
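
For reviewers skimming the diff, here is a minimal, self-contained sketch of the pattern these three changes implement (names mirror the list above; the acquire/create_context helpers are hypothetical stand-ins, not the real browser_manager.py API):

import asyncio
from collections import defaultdict

class ContextPool:
    """Simplified sketch of the refcounted context tracking added here."""

    def __init__(self):
        self.contexts_by_config = {}                 # signature -> browser context
        self._context_refcounts = defaultdict(int)   # signature -> active crawls
        self._lock = asyncio.Lock()

    async def acquire(self, signature, create_context):
        # get_page() side: reuse or create the context, bump its refcount.
        async with self._lock:
            if signature not in self.contexts_by_config:
                self.contexts_by_config[signature] = await create_context()
            self._context_refcounts[signature] += 1
            return self.contexts_by_config[signature]

    async def release_context(self, signature):
        # Crawl's finally block: drop the refcount, never below zero.
        async with self._lock:
            self._context_refcounts[signature] = max(0, self._context_refcounts[signature] - 1)

    async def cleanup_contexts(self):
        # After each request: close contexts that no crawl is using anymore.
        async with self._lock:
            idle = [sig for sig, count in self._context_refcounts.items() if count == 0]
            for sig in idle:
                ctx = self.contexts_by_config.pop(sig, None)
                del self._context_refcounts[sig]
                if ctx is not None:
                    await ctx.close()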

How Has This Been Tested?

This has been tested locally by running the following script and comparing the before/after memory usage of the master version and the patched version, both running through Docker Compose.

The script simply performs 100 scrapes with a concurrency of 8 and reports the status code distribution:
https://gist.github.com/Martichou/27555055d130d1c65f6a8457fbeb2a22
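
For context, the shape of that script is roughly the following (a hedged sketch, not the gist itself: the endpoint path, payload, and URL list are placeholders to adapt to your deployment):

import asyncio
from collections import Counter

import httpx

CRAWL_ENDPOINT = "http://localhost:11235/crawl"   # placeholder for the Docker API endpoint
URLS = [f"https://example.com/page/{i}" for i in range(100)]
CONCURRENCY = 8

async def scrape(client, sem, url, statuses):
    async with sem:
        try:
            resp = await client.post(CRAWL_ENDPOINT, json={"urls": [url]}, timeout=120)
            statuses[resp.status_code] += 1
        except httpx.HTTPError:
            statuses["error"] += 1

async def main():
    sem = asyncio.Semaphore(CONCURRENCY)
    statuses = Counter()
    async with httpx.AsyncClient() as client:
        await asyncio.gather(*(scrape(client, sem, url, statuses) for url in URLS))
    print(dict(statuses))   # status code distribution

asyncio.run(main())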

Result of the test:

Unpatched version:

Baseline memory usage: 4.5%
End of first test run using unpatched version: 23.4%
End of second test run using unpatched version: 27.6%
End of third test run using unpatched version: 32.8%

Patched version:

Baseline memory usage: 5.7%
End of first test run using patched version: 11.2%
End of second test run using patched version: 12.3%
End of third test run using patched version: 13.4%

It may not have eliminated every leak (there's an ~1% increase between runs for unknown reasons), but closing the browser using the kill-browser endpoint makes the memory go back to 10%.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@ntohidi ntohidi changed the base branch from main to develop November 26, 2025 08:18
aravindkarnam and others added 4 commits December 23, 2025 16:28
When scraping many URLs continuously, browser contexts accumulated in
memory and were never cleaned up. The existing cleanup only ran when
browsers went idle, which never happened under continuous load.
See: #943.

Key changes:
- browser_manager.py: Add _context_refcounts tracking, cleanup_contexts(),
  and release_context() methods
- async_crawler_strategy.py: Release context ref in finally block after crawl
- deploy/docker/api.py: Trigger context cleanup after each request

This fixes, or at least drastically improves, the memory leaks in my testing.
Copilot AI review requested due to automatic review settings January 1, 2026 18:48
Signed-off-by: Martichou <m@rtin.fyi>

Copilot AI left a comment


Pull request overview

This pull request addresses a memory leak issue where browser contexts accumulate in memory and are never cleaned up under continuous load. The fix introduces reference counting for contexts and adds periodic cleanup mechanisms.

  • Implements reference counting to track active usage of browser contexts
  • Adds cleanup_contexts() method to periodically close idle contexts
  • Triggers context cleanup after each API request to prevent unbounded memory growth

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

  • crawl4ai/browser_manager.py: Adds reference counting system (_context_refcounts), cleanup_contexts() method for closing idle contexts, and release_context() method for decrementing refcounts
  • crawl4ai/async_crawler_strategy.py: Adds release_context() call in finally block to decrement refcount when crawl completes
  • deploy/docker/api.py: Triggers cleanup_contexts() after each request to limit context accumulation, with whitespace cleanup
  • README.md: Adds new sponsor (Thor Data) - unrelated to memory leak fix


Comment on lines +1056 to +1062
# Release the context reference so cleanup can work
if not self.browser_config.use_managed_browser:
    try:
        config_signature = self.browser_manager._make_config_signature(config)
        await self.browser_manager.release_context(config_signature)
    except Exception:
        pass  # Don't fail on cleanup

Copilot AI Jan 1, 2026


The release_context call here creates a reference counting imbalance when using session_id. Looking at browser_manager.py get_page(), when a session_id is provided and already exists, the function returns early (line 1063-1066) without incrementing the refcount. However, this release_context will still be called, decrementing a counter that was never incremented. This will cause the refcount to go negative (though clamped to 0 by the max() call in release_context), potentially allowing contexts to be cleaned up while still in use by sessions. The condition should also check that no session_id is being used, similar to: if not self.browser_config.use_managed_browser and not config.session_id:

Comment on lines +1206 to +1209
# If force=True and we still have too many, close active ones too
if force and len(self.contexts_by_config) - len(contexts_to_close) > max_contexts:
    remaining_excess = len(self.contexts_by_config) - len(contexts_to_close) - max_contexts
    contexts_to_close.extend(active_contexts[:remaining_excess])

Copilot AI Jan 1, 2026


The documentation states that force will "close contexts even if they have pages (but never if refcount > 0)", but the implementation at lines 1207-1209 will actually close active contexts when force=True, and active_contexts includes contexts with refcount > 0 (added at line 1171-1173). This means force=True can close contexts that are actively being used by requests, contradicting the docstring and potentially causing "Target closed" errors during active crawls. The condition should filter out contexts with refcount > 0 from active_contexts before extending contexts_to_close.

Suggested change
-# If force=True and we still have too many, close active ones too
-if force and len(self.contexts_by_config) - len(contexts_to_close) > max_contexts:
-    remaining_excess = len(self.contexts_by_config) - len(contexts_to_close) - max_contexts
-    contexts_to_close.extend(active_contexts[:remaining_excess])
+# If force=True and we still have too many, close additional contexts
+# but never close contexts with refcount > 0 (they may be in active use).
+if force and len(self.contexts_by_config) - len(contexts_to_close) > max_contexts:
+    remaining_excess = len(self.contexts_by_config) - len(contexts_to_close) - max_contexts
+    # From active_contexts, only consider those whose refcount is 0 for forced closure
+    force_closable_active = [
+        (sig, ctx)
+        for sig, ctx in active_contexts
+        if self._context_refcounts.get(sig, 0) == 0
+    ]
+    contexts_to_close.extend(force_closable_active[:remaining_excess])

Comment on lines +1219 to +1220
except Exception:
    pass

Copilot AI Jan 1, 2026


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
-except Exception:
-    pass
+except Exception as e:
+    # Ignore individual page close failures but record them for diagnostics
+    self.logger.warning(
+        message="Error closing page during context cleanup: {error}",
+        tag="WARNING",
+        params={"error": str(e)}
+    )

Signed-off-by: Martichou <m@rtin.fyi>
@unclecode
Owner

Hi @Martichou — thanks for this PR and the detailed memory profiling data. Your diagnosis of the problem was spot on: browser contexts in contexts_by_config were never cleaned up under continuous load, causing unbounded memory growth.

We ended up implementing the fix directly on develop with a slightly different (but inspired by your work) approach that accounts for recent changes to browser_manager.py (global page tracking, CDP connection caching, etc.) that would have caused heavy merge conflicts with this PR. Here's what we landed:

1. Config signature shrink (the bigger win)

The root cause of excessive context creation was that _make_config_signature() was hashing ~60+ fields from CrawlerRunConfig, but only 7 fields actually affect the browser context (proxy_config, locale, timezone_id, geolocation, override_navigator, simulate_user, magic). Fields like word_count_threshold, css_selector, screenshot, verbose, etc. were producing phantom-different signatures for browser-identical contexts. We switched from a blacklist to a whitelist approach.
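
A hedged sketch of that whitelist idea (the seven field names come from the paragraph above; the hashing details are illustrative, not the exact code on develop):

import hashlib
import json

# Only these CrawlerRunConfig fields actually influence the browser context.
_CONTEXT_FIELDS = (
    "proxy_config", "locale", "timezone_id", "geolocation",
    "override_navigator", "simulate_user", "magic",
)

def make_config_signature(config) -> str:
    # Whitelist approach: hash only context-relevant fields, so changes to fields
    # like word_count_threshold or screenshot reuse the existing context.
    relevant = {name: repr(getattr(config, name, None)) for name in _CONTEXT_FIELDS}
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()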

2. Refcount tracking + LRU eviction (your core idea)

Following your reference-counting concept, we added:

  • _context_refcounts — tracks active crawls per context (increment in get_page() under lock, decrement in release_page_with_context() under lock)
  • _context_last_used — monotonic timestamps for LRU ordering
  • _evict_lru_context_locked() — when contexts exceed _max_contexts (default 20), evicts the oldest context with refcount == 0
  • Eviction only targets idle contexts (never evicts mid-crawl), and context close happens outside the lock to avoid blocking
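
A minimal sketch of the eviction step described in the bullets above (attribute names follow the comment; locking and the actual close call, which happens outside the lock, are left to the caller):

def _evict_lru_context_locked(self):
    # Sketch only: assumes self.contexts_by_config, self._context_refcounts,
    # self._context_last_used (monotonic timestamps) and self._max_contexts.
    if len(self.contexts_by_config) <= self._max_contexts:
        return None
    idle = [
        sig for sig in self.contexts_by_config
        if self._context_refcounts.get(sig, 0) == 0
    ]
    if not idle:
        return None   # never evict a context that is mid-crawl
    victim = min(idle, key=lambda sig: self._context_last_used.get(sig, 0.0))
    self._context_refcounts.pop(victim, None)
    self._context_last_used.pop(victim, None)
    return self.contexts_by_config.pop(victim)   # caller closes it outside the lock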

3. Storage state path leak fix

Also fixed a separate leak where the storage_state code path created a temporary context for cloning that was never closed.
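
A sketch of the shape of that fix (clone_runtime_state is named in the commit below; the surrounding function and its signature are assumptions for illustration):

async def _apply_storage_state(browser, storage_state, target_context):
    # The leak: this temporary context used to outlive the request.
    temp_context = await browser.new_context(storage_state=storage_state)
    try:
        await clone_runtime_state(temp_context, target_context)   # assumed helper
    finally:
        await temp_context.close()   # the fix: always close the clone source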

Test coverage

Added 20 tests (8 signature, 5 eviction logic, 7 real-browser integration) — all passing. The real-browser tests verify: same context reused for non-context config changes, refcounts drop to 0 after crawl, LRU eviction caps contexts, concurrent crawls track correctly.

Your PR was the catalyst for this investigation — really appreciate the contribution and the benchmarking script. Closing in favor of the direct implementation on develop.

@unclecode unclecode closed this Feb 1, 2026
@Martichou Martichou deleted the fix/leaks branch February 1, 2026 13:17
unclecode added a commit that referenced this pull request Feb 1, 2026
contexts_by_config accumulated browser contexts unboundedly in long-running
crawlers (Docker API). Two root causes fixed:

1. _make_config_signature() hashed ~60 CrawlerRunConfig fields but only 7
   affect the browser context (proxy_config, locale, timezone_id, geolocation,
   override_navigator, simulate_user, magic). Switched from blacklist to
   whitelist — non-context fields like word_count_threshold, css_selector,
   screenshot, verbose no longer cause unnecessary context creation.

2. No eviction mechanism existed between close() calls. Added refcount
   tracking (_context_refcounts, incremented under _contexts_lock in
   get_page, decremented in release_page_with_context) and LRU eviction
   (_evict_lru_context_locked) that caps contexts at _max_contexts=20,
   evicting only idle contexts (refcount==0) oldest-first.

Also fixed: storage_state path leaked a temporary context every request
(now explicitly closed after clone_runtime_state).

Closes #943. Credit to @Martichou for the investigation in #1640.
@unclecode
Owner

@Martichou Thanks for the detailed memory analysis — really helpful.

I've pushed additional fixes to the develop branch that build on top of your findings:

  1. Memory-saving Chrome flags — disabled unused Chrome features (OptimizationHints, MediaRouter, component updates, domain reliability) by default, plus an opt-in memory_saving_mode for aggressive cache discard and V8 heap cap (512MB)

  2. Browser recycling — new max_pages_before_recycle config option that automatically restarts the browser process after N pages to reclaim leaked Chromium memory. Blocks new crawls during recycle, wakes them when done. Recommended: 500-1000 for long-running crawlers

  3. CDP session leak fix — cdp.detach() after viewport adjustment to prevent orphaned DevTools sessions from accumulating

  4. Session kill fix — context is now only closed when its refcount drops to 0, preventing use-after-close errors for shared contexts

Usage:

config = BrowserConfig(
    memory_saving_mode=True,           # aggressive cache + V8 heap cap
    max_pages_before_recycle=500,      # recycle browser every 500 pages
)

Could you pull the develop branch and re-run your memory tests to see how these changes affect the growth pattern? Especially interested in whether the recycling + CDP detach combo brings the curve back down.

@Martichou
Author

Martichou commented Feb 4, 2026

@unclecode Hey, thanks for following-up!

I've just tested the new max_pages_before_recycle. A couple of things to say: I think this is exactly what's needed, so kudos for that! But it should be more aggressive; when going through my backlog of links to scrape, the Crawl4AI instance performs more than 20 concurrent requests, which means the recycle will never be performed due to the various checks for no in-flight crawls.

In my testing, doing crawls one by one manually allows the recycle to trigger, but with 20 concurrent requests there's no chance.
I've set max_pages_before_recycle to 5 and am already at more than 189 hits without a recycle.

What would you think of the following options for when a recycle should happen? We either:

  • block all new incoming requests until there are no more in-flight crawls so the recycle can trigger (with a max timeout and a force kill if needed);
  • instead of blocking new incoming requests, force-create a new browser instance to be used from now on for this config, letting the previous one be recycled/killed;
  • keep the recycle as is, but add a new option like "new browser after X"; or this could simply be a parameter in browser_config specifying a "version", so the caller can decide when they want a new instance to be created, with that version included in the config hash used to pick which browser to use.

Lemme know what you think, and thank you for your help here :)

@unclecode
Owner

@Martichou Thanks for testing and the great feedback! You were right — the old approach never triggered under sustained concurrent load because it waited for total_active == 0.

I've pushed a completely new approach based on your suggestion (option 2 — spin up a new browser while old one drains):

How it works now:

Instead of waiting for a quiet moment, we bump a version number and let old browsers drain naturally:

  1. _browser_version is now part of the config signature
  2. When threshold is hit → bump version → reset counter
  3. New requests automatically get new contexts (different version = different signature)
  4. Old contexts stay in _pending_cleanup and drain naturally
  5. When old context's refcount drops to 0, it gets closed
  6. Safety cap: max 3 old browsers draining at once (blocks if exceeded)

No blocking, no waiting — old and new browsers coexist briefly during transitions. This means recycling now works under any load pattern.
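
As a reader's sketch of steps 1-6 above (names follow the comment; counters and locking are simplified and not the actual develop code):

import asyncio

class BrowserRecycler:
    # Sketch: recycling by bumping a version that feeds into the config signature.

    def __init__(self, max_pages_before_recycle=500):
        self._browser_version = 0
        self._pages_since_recycle = 0
        self._max_pages = max_pages_before_recycle
        self._pending_cleanup = set()   # old signatures draining to refcount 0
        self._lock = asyncio.Lock()

    def signature_suffix(self):
        # Mixed into the config signature, so a bump yields brand-new contexts.
        return f"v{self._browser_version}"

    async def note_page_crawled(self, live_signatures):
        async with self._lock:
            self._pages_since_recycle += 1
            if self._pages_since_recycle >= self._max_pages:
                # Threshold hit: bump version, reset counter, let old contexts drain.
                self._browser_version += 1
                self._pages_since_recycle = 0
                self._pending_cleanup.update(live_signatures)

    async def on_release(self, signature, refcount, close_context):
        # When an old context's refcount drops to 0, it finally gets closed.
        async with self._lock:
            if signature in self._pending_cleanup and refcount == 0:
                self._pending_cleanup.discard(signature)
                await close_context(signature)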

Usage (unchanged):

config = BrowserConfig(
    memory_saving_mode=True,
    max_pages_before_recycle=500,
)

Could you pull develop again and re-run your memory tests? This time with 20+ concurrent crawls, you should see the version bumps happening in the logs and memory being reclaimed as old contexts drain.

@Martichou
Author

Hey @unclecode, thanks for the follow-up again.

I've just tested the develop branch (with the newest changes) and it's not behaving as expected. I'm not sure why; I can't really make sense of all the logs, but from what I see and understand, the old contexts never get closed because the requests seem to never complete(?).
Maybe unrelated, but there are also quite a lot of errors in the logs.

From what I see in the UI: more than 90 active requests, active for 900+ seconds, when it's usually between 10s and 90s, no more (timeout of 60s and hard timeout of 2 minutes). My system is doing at most 24 concurrent requests.

I guess something's wrong in the cleanup / counting or something? I wish I could help more here, but this is kinda beyond my Python understanding right now (and the inner workings of Crawl4AI).

Helped with AI (disclosing because this may be a stupid question):

Can release_page_with_context run before _maybe_bump_browser_version adds the sig to _pending_cleanup? Because then the sig would enter pending with refcount already at 0, and no future release would trigger cleanup.

Suggested fix by Claude: after adding sigs to _pending_cleanup, immediately check whether any already have refcount 0 and clean them up right there.
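
Purely as an illustration of that suggestion (names taken from this thread; not actual Crawl4AI code):

async def _maybe_bump_browser_version(self):
    async with self._contexts_lock:
        self._browser_version += 1
        self._pending_cleanup.update(self.contexts_by_config.keys())
        # Suggested sweep: signatures whose refcount already hit 0 (released
        # before the bump) would otherwise be stranded in _pending_cleanup.
        already_idle = [
            sig for sig in self._pending_cleanup
            if self._context_refcounts.get(sig, 0) == 0
        ]
    for sig in already_idle:
        await self._close_pending_context(sig)   # assumed close helper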

I'll let you decide and see what you want to do, but I highly appreciate your time and your effort in solving these issues once and for all! Kudos for that 🫶

Attachment: hanged.log

