feat: async engine with aiohttp for 3-5x performance improvement #2797

@matthew6s

Description


Problem

Sherlock currently uses requests-futures with a ThreadPoolExecutor capped at 20 workers, so a scan of 400+ sites proceeds in batches of 20, with added overhead from thread context switching and the GIL. A full scan typically takes 45-90 seconds.

Proposal

Add a new async_engine.py module that uses asyncio + aiohttp as a drop-in replacement for the synchronous sherlock() function:

  • aiohttp.ClientSession with TCPConnector for connection pooling
  • asyncio.Semaphore for configurable concurrency (default 100)
  • limit_per_host=3 to stay polite and avoid rate-limiting
  • DNS caching to reduce lookup overhead on repeated scans
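The concurrency model above can be sketched with plain asyncio. This is an illustrative sketch, not the actual implementation: the network call is stubbed out, and all function and variable names are hypothetical. The real engine would issue an aiohttp GET inside `probe()` via an `aiohttp.ClientSession` backed by a `TCPConnector(limit_per_host=3, ttl_dns_cache=...)`.

```python
import asyncio

# Hypothetical sketch of the proposed engine's control flow: a Semaphore
# caps in-flight checks at --workers (default 100). The real code would
# perform an aiohttp request here; asyncio.sleep stands in for it so the
# sketch runs standalone.

async def probe(username: str, site: str, url: str) -> tuple[str, bool]:
    await asyncio.sleep(0)  # stand-in for: async with session.get(url.format(username)) ...
    return site, True       # real code would map the HTTP response to a QueryResult

async def scan(username: str, sites: dict[str, str], workers: int = 100) -> dict[str, bool]:
    sem = asyncio.Semaphore(workers)  # bounds concurrent requests

    async def bounded(site: str, url: str) -> tuple[str, bool]:
        async with sem:
            return await probe(username, site, url)

    # Launch every check at once; the semaphore throttles actual concurrency.
    results = await asyncio.gather(*(bounded(s, u) for s, u in sites.items()))
    return dict(results)

sites = {"GitHub": "https://github.com/{}", "Reddit": "https://reddit.com/user/{}"}
found = asyncio.run(scan("alice", sites))
```

Because every coroutine is created up front and only gated by the semaphore, there is no batching: as soon as one request finishes, the next starts, which is where the speedup over fixed groups of 20 comes from.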

New CLI flags

  • --workers N — max concurrent requests (default: 100)
  • --sync — fall back to the legacy synchronous engine
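Wiring the two flags into the existing argparse-based CLI would look roughly like this (a sketch; the parser object and help strings are illustrative, not Sherlock's actual code):

```python
import argparse

parser = argparse.ArgumentParser(prog="sherlock")
parser.add_argument("--workers", type=int, default=100,
                    help="maximum number of concurrent requests (async engine)")
parser.add_argument("--sync", action="store_true",
                    help="fall back to the legacy synchronous engine")

args = parser.parse_args(["--workers", "50"])
```

Here `args.workers` is 50 and `args.sync` is False, so the async engine remains the default unless --sync is passed.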

Backwards compatibility

  • Return value is identical (same dict structure, same QueryResult objects)
  • All existing CLI flags work unchanged
  • Default behavior switches to async, with --sync to opt out
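One way to keep the synchronous entry point unchanged is a thin wrapper that drives the async engine with asyncio.run. Everything below is a hypothetical sketch: the function names and the placeholder result shape are assumptions for illustration, not Sherlock's actual API.

```python
import asyncio

async def sherlock_async(username: str, sites: list[str]) -> dict:
    # Hypothetical async engine entry point; the placeholder dict stands in
    # for the real per-site QueryResult structure, which would be unchanged.
    return {site: {"status": "claimed"} for site in sites}

def sherlock(username: str, sites: list[str]) -> dict:
    # Legacy-compatible wrapper: existing callers keep a blocking call with
    # the same return value, while the work happens on an event loop inside.
    return asyncio.run(sherlock_async(username, sites))

res = sherlock("alice", ["GitHub"])
```

Existing code that calls sherlock() synchronously would therefore not need to change.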

Expected performance

| Scan type             | Current (20 threads) | Async (100 concurrent) |
| --------------------- | -------------------- | ---------------------- |
| Full scan (478 sites) | ~45-90s              | ~15-25s                |
| Targeted (50 sites)   | ~15-20s              | ~5-8s                  |

New dependency

  • aiohttp ^3.9.0

I have a working implementation ready to PR if there's interest. Happy to adjust the approach based on feedback.

Code of Conduct

  • I agree to follow this project's Code of Conduct
