Skip to content

Releases: spider-rs/spider

v2.45.28

02 Mar 15:28

Choose a tag to compare

Agent Hardening

  • Cap LLM-controlled durations (Wait, ClickHold, SetViewport, OpenPage)
  • Add js_escape() for safe JS string interpolation in action handlers
  • Wrap Navigate and screenshot calls with timeouts
  • Use PageWaitStrategy::Load for WaitForNavigation instead of fixed sleep
  • Replace eval_with_timeout for Fill/Type/Clear actions with error propagation
  • Improve semaphore and logging diagnostics on error paths

v2.45.27

02 Mar 14:57

Choose a tag to compare

Changes since vv1.7.5

  • feat(agent): eval timeouts, spawn concurrency limiter, PageWaitStrategy, panic hardening
  • feat(agent): action feedback, partial result preservation, and JS error capture
  • style: fix rustfmt formatting in smart_vs_chrome tests
  • chore(release): 2.45.25
  • fix: use final redirect URL as base for relative link resolution (#364)
  • chore: ignore transitive audit advisories with no direct fix
  • chore: ignore RUSTSEC-2025-0134 (rustls-pemfile via warp)
  • chore: ignore RUSTSEC-2026-0002 (lru 0.13.0 via wreq)
  • docs: backfill CHANGELOG v2.45.1-v2.45.22 and add .editorconfig
  • chore: remove cargo dependabot, keep github-actions only
  • chore: fix clippy warnings and formatting across workspace
  • chore(deps): bump flexbuffers 2->25, async-openai 0.32->0.33, clap 4.5.59->4.5.60, actions
  • docs: rewrite README, add issue/PR templates, security policy, CI improvements
  • docs: add Contributor Covenant v2.1 Code of Conduct
  • docs: rewrite CONTRIBUTING.md and add zero-config quick start to README
  • fix(cache): proper HTTP staleness for Chrome-cached pages
  • fix(cache): Period policy bypasses HTTP is_stale for Chrome-rendered pages
  • chore: bump all workspace crates to v2.45.22
  • fix(cache): enable cache_chrome_hybrid_mem feature for Chrome cache writes
  • chore: bump all crates to 2.45.21, chromey 2.38.4
  • chore(agent): fix model router import
  • docs(spider_agent): add SKILLS.md for engine skill integration reference
  • chore(chrome): 2.45.19 root sandbox default launch
  • fix: add no_sandbox() to headless BrowserConfig (#354)
  • test: comprehensive crawler-test.com integration suite (302 tests, 408 URLs)
  • feat: per-round model pool routing for cost-optimized automation + bump 2.45.18
  • perf: skip model scoring for pools ≤ 2 models + bump 2.45.17
  • test: comprehensive multi-LLM router reliability tests + bump 2.45.16
  • release(cli): 2.45.15 wait_for
  • feat(cli): add wait-for capabilities to spider_cli (#352)
  • fix: Chrome mode honors wait_for config for networkIdle before HTML extraction
  • fix: smart mode lifecycle waiting — match Chrome coverage without re-fetch
  • fix: auto-retry www. URLs on SSL protocol errors — strip www prefix
  • fix: reject empty HTML from all cache and seeded resource paths
  • fix: invalidate empty HTML shell responses on cache read path
  • feat: deferred Chrome — cache-only crawl phase before browser launch
  • feat: cache-first fast path — skip browser/HTTP when cache has data
  • fix: add Website::with_hedge() forwarding method + bump to v2.45.7
  • feat: add work-stealing (hedged requests) for slow crawl requests
  • fix: skip HTML-specific heuristics in cache empty check for non-HTML content
  • chore: bump all workspace crates to v2.45.4
  • fix: skip caching empty/near-empty HTML responses
  • chore: bump all workspace crates to v2.45.3
  • chore: bump spider to v2.45.3
  • fix(uring_fs): fix compile errors when io_uring feature is enabled
  • perf(io_uring): add StreamingWriter for streaming file I/O, bump v2.45.2
  • feat(io_uring): expand io_uring integration for file I/O, bump v2.45.1
  • refactor(agent): extract types and HTML cleaning into spider_agent_types and spider_agent_html crates
  • chore(deps): major version bumps, bump v2.45.0
  • chore: bump v2.44.42
  • perf(website): use take() instead of clone() for subdomain base URL
  • fix(website): add tests for subdomain relative URL resolution, bump v2.44.41
  • fix(website): use page's own URL for relative link resolution on subdomains
  • chore: clarify network event collection comment (#350)
  • feat(agent): integrate llm_models_spider v0.1.9 with smart model selection
  • docs: add missing feature flags, Spider Cloud example, and integration docs
  • refactor(agent): move all skill content to spider_skills crate
  • feat(agent): add L15-L48 skills, NEST/CIRCLE engine loops, haiku benchmark
  • fix: add should_use_chrome_ai to stub RemoteMultimodalConfigs
  • feat(agent): add L8-L14 skills, WAM engine loop, and Chrome AI improvements
  • chore(agent): add improved steps
  • test(website): fix rotation cases
  • feat(agent): optimize text-only rounds and bump workspace crates
  • chore: fix lint warnings and feature-combo build regressions
  • fix: improve agent concurrency and ensure all-feature builds
  • feat(cache): optimize automation caching and bump crates to 2.44.33
  • Add crawl_smart cache parity test and bump crates to 2.44.32
  • Improve cache skip-browser flows and add remote cache e2e example
  • feat: add spider cloud end-to-end examples and bump crates to 2.44.30
  • Update Cargo.lock for 2.44.29
  • Improve remote multimodal automation reliability and release 2.44.29
  • Expose optional automation reasoning in metadata and bump 2.44.28
  • release: bump workspace crates to 2.44.27
  • feat(spider_cli): add runtime http/headless mode controls and release 2.44.26
  • chore(client): add fallback agent declaring
  • docs: update dual-model example with per-endpoint URL and API key configuration
  • feat: dual-model routing examples and comprehensive tests
  • feat: extraction-only mode optimization for single-round data extraction
  • docs: add Spider Cloud reliability integration section to README
  • fix: feature flag compilation issues across multiple feature combos
  • test(ignore): add test_crawl_subdomains to ignore
  • fix: feature flag compilation issues across wreq, agent_full, and cache combos
  • chore: bump to v2.44.20 for release
  • fix: broken chrome feature — missing relevant field and cfg gate on detect_cf_turnstyle
  • chore: bump to v2.44.19 for release
  • feat: integrate spider_skills crate for skill types and web challenge definitions
  • feat(agent): add URL-level relevance pre-filter for crawling, bump to v2.44.18
  • perf: fast-path trie protocol detection, remove redundant char boundary check
  • perf: optimize trie, robot parser, and URL preparation hot paths, bump to v2.44.17
  • feat(agent): add relevance gate for remote multimodal crawling
  • perf: optimize core pipeline hot paths, add benchmarks, bump to v2.44.15
  • chore: bump versions to 2.44.14
  • fix(agent): use CDP mouse methods for all click actions to maintain position tracking
  • docs: simplify root README intro
  • docs: add spider_cloud to root README, improve API response handling
  • chore: bump versions to 2.44.13
  • feat(spider): add spider_cloud integration and S3 skills loading
  • chore: bump versions to 2.44.12
  • feat(spider_agent): add dual-model routing and long-term experience memory
  • fix(spider): fix broken glob feature compilation and logic errors
  • chore: bump versions to 2.44.10
  • feat(spider_agent): improve skill triggers and board reading for web challenges
  • feat(spider_agent): improve skill strategies for real browser interaction
  • feat: sync skill content with improved efficiency directives
  • chore: bump versions to 2.44.9
  • feat(spider_agent): add Claude-optimized automation features
  • chore: bump versions to 2.44.8
  • feat(spider_agent): add concurrent page spawning with OpenPage action
  • fix(spider_agent): update system_prompt_compiled test for locked prompt
  • chore: bump versions to 2.44.7
  • feat(spider_agent): integrate llm_models_spider for auto-updated vision detection
  • feat(spider_agent): add inline vision model detection
  • chore: bump versions to 2.44.6
  • feat(spider): enable HTTP extraction without agent_chrome feature
  • feat(agent): enhance CAPTCHA handling and lock system prompt
  • feat(agent): improve web automation for CAPTCHAs and visual challenges
  • docs(readme): add spider_agent section and link
  • chore(examples): add remote_multimodal_multi example
  • feat(agent): consolidate automation into spider_agent with seamless feature integration
  • chore(example): add extraction example multi
  • chore(website): fix scrape subscription hang
  • chore(examples): add remote_multimodal
  • chore: bump all crates to v2.44.3 for version parity
  • fix(spider): keep spider_agent optional, add agent_webdriver feature
  • feat(spider): make spider_agent a required dependency
  • feat: granular usage tracking in AutomationUsage
  • chore(spider): bump to v2.43.23, update spider_agent to 0.6
  • feat(spider_agent): add usage limits, custom tools, and granular tracking
  • feat(automation): add api_calls tracking and HTTP extraction support
  • chore(spider): bump to v2.43.21, update spider_agent to 0.5
  • feat(spider_agent): add complete WebAutomation parity and PromptUrlGate
  • docs: expand v2 changelog with comprehensive feature documentation
  • docs: update changelog for v2.43.20 and spider_agent releases
  • chore(spider): update spider_agent dependency to 0.4
  • feat(spider_agent): add performance optimizations for automation
  • feat(spider_agent): add comprehensive automation module
  • chore(spider_agent): bump version to 0.2.0
  • feat(spider_agent): enhance memory with URL/action/extraction history
  • feat(spider_agent): add webdriver support via thirtyfour
  • feat(spider_agent): add chrome browser and temp storage support
  • feat(spider_agent): add usage tracking and multimodal example
  • fix(spider): fix doctest and update chromey for adblock compatibility
  • feat(spider_agent): add standalone multimodal agent crate
  • fix(search): use reqwest::Client directly for cache feature compatibility
  • chore(docs): fix rustdoc broken intra-doc links
  • feat(search): add search_and_extract and research methods
  • feat(search): add web search integration with multiple providers
  • perf(automation): byte-size-based smart cleaning for best performance
  • feat(cache): add skip browser mode + smart HTML cleaning
  • fix(automation): correct lol_html API usage and add intelligent screenshot detection
  • feat(automation): add Phase 3 agentic features - autonomous agent, action chaining, error recovery
  • feat(automation): add Phase 2 agentic features - selector cache, structured outputs, map API
  • feat(automation): add simplified agentic APIs - act(), observe(), extract()
  • feat(automation): add agentic memory for multi-round automation
  • feat(automation): add prompt-based website configuration
  • perf(website): remove unnecessary clones and allocations
    -...
Read more

v2.45.24

21 Feb 12:54

Choose a tag to compare

What's New

Performance

  • Cache-first fast path — skip browser/HTTP entirely when cache has data (~5-50ms vs 1-3s)
  • Deferred Chrome — process multi-page crawls from cache before launching a browser
  • Work-stealing (hedged requests) — parallel retry for slow crawl requests
  • io_uring — StreamingWriter for high-throughput file I/O on Linux

Agent

  • Per-round model pool routing — route cheap rounds to fast models, complex rounds to capable ones
  • Comprehensive router tests — 30+ multi-LLM reliability tests

Fixes

  • Chrome mode honors wait_for config (e.g. idle_network0) before HTML extraction
  • Smart mode lifecycle waiting matches Chrome coverage without re-fetch
  • Auto-retry www. URLs on SSL protocol errors
  • Reject empty HTML from all cache paths
  • Default no_sandbox() for headless Chrome (#354)

CLI

  • --wait-for flag for page load strategies (#352)

Housekeeping

  • Clippy warnings and formatting cleanup across workspace
  • Rewritten README, CONTRIBUTING.md, CHANGELOG backfill
  • Issue/PR templates, security policy, CI workflows (lint, audit, release)

v2.45.20

05 Feb 21:04

Choose a tag to compare

What's New

Relevance Gate for Remote Multimodal Crawling

Added a relevance_gate config that instructs the LLM to return a "relevant": true|false field in its JSON response. When a page is deemed irrelevant, its wildcard budget credit is refunded so the crawler discovers more relevant content.

New config fields:

  • relevance_gate: bool — enables the feature
  • relevance_prompt: Option<String> — optional custom relevance criteria

How it works:

  1. When enabled, the system prompt instructs the LLM to include "relevant": true|false
  2. If the model returns false, a budget credit is atomically accumulated
  3. Credits are drained in the crawl loop to restore the wildcard budget
  4. Default fallback is true (assume relevant) if the model omits the field

Example:

let cfgs = RemoteMultimodalConfigs::new(api_url, model)
    .with_relevance_gate(Some("Only pages about Rust programming".into()));

Full Changelog

  • feat(agent): add relevance_gate and relevance_prompt to RemoteMultimodalConfig
  • feat(agent): add atomic relevance_credits counter to RemoteMultimodalConfigs
  • feat(agent): add relevant: Option<bool> to AutomationResult and AutomationResults
  • feat(agent): extend system prompt and extraction with relevance gate instructions
  • feat(spider): add restore_wildcard_budget() for budget refund
  • feat(spider): drain relevance credits in crawl loop dequeue

v2.44.13

05 Feb 19:07

Choose a tag to compare

What's New

  • Spider Cloud integration (spider_cloud feature) — optional proxy rotation, anti-bot bypass, and intelligent fallback via spider.cloud
    • Modes: Proxy, Api, Unblocker, Fallback, Smart
    • Smart mode auto-detects Cloudflare challenges, CAPTCHAs, and bot protection then retries via /unblocker
  • S3 skills loading (skills_s3 feature) — load agent skills from S3-compatible storage (AWS, MinIO, R2)
  • CLI: --spider-cloud-key and --spider-cloud-mode flags

Crates

  • spider v2.44.13
  • spider_agent v2.44.13
  • spider_cli v2.44.13
  • spider_utils v2.44.13
  • spider_worker v2.44.13

spider v2.43.20

03 Feb 14:37

Choose a tag to compare

Spider v2.43.20

Changes

  • fix(spider): Fix doctest and update chromey for adblock compatibility
  • fix(search): Use reqwest::Client directly for cache feature compatibility
  • chore(spider): Update spider_agent dependency to 0.4

spider_agent Integration

The agent feature now uses spider_agent v0.4.0, which includes:

  • Smart caching with size-aware LRU eviction
  • High-performance chain execution with parallel step support
  • Batch processing for multiple items
  • Prefetch management for predictive page loading
  • Smart model routing based on task complexity

Full Changelog

v2.43.19...v2.43.20

spider_agent v0.4.0

03 Feb 14:34

Choose a tag to compare

Spider Agent v0.4.0

Performance Optimizations

This release adds several performance optimizations for automation workflows:

Smart Caching

  • SmartCache: Size-aware LRU cache with automatic cleanup
    • Bounded memory usage with configurable limits
    • TTL-based expiration
    • Automatic cleanup on memory pressure
    • Statistics tracking (hits, misses, evictions)

High-Performance Execution

  • ChainExecutor: Parallel step execution for automation chains

    • Analyzes dependencies for optimal parallelization
    • Response caching with TTL
    • Configurable concurrency limits
    • Step timeout support
  • BatchExecutor: Efficient batch processing

    • Process multiple items with configurable batch sizes
    • Parallel execution within batches
    • Index-aware processing option
  • PrefetchManager: Predictive page loading

    • Prefetch URLs in the background
    • Automatic cache management
    • Concurrent prefetch limits

Smart Model Routing

  • ModelRouter: Intelligent model selection based on task complexity
    • Task analysis for complexity scoring
    • User-configurable model policies
    • Cost tier constraints (Low/Medium/High)
    • Latency-aware routing

Other Changes

  • Added MessageContent helper methods: as_text(), full_text(), is_text(), has_images()
  • Default ModelPolicy now allows High tier routing
  • Fixed compilation warnings

Full Changelog

spider_agent-v0.3.0...spider_agent-v0.4.0

v2.43.18 - Web Search Integration

02 Feb 23:25

Choose a tag to compare

Features

Web Search Integration

Add web search capabilities to Spider's RemoteMultimodalEngine with support for multiple search providers.

Supported Providers

  • Serper (search_serper) - Google SERP API
  • Brave (search_brave) - Privacy-focused search
  • Bing (search_bing) - Microsoft Bing Web Search
  • Tavily (search_tavily) - AI-optimized search

New Methods

  • search() - Search the web and return structured results
  • search_and_extract() - Search + fetch pages + LLM extraction
  • research() - Search + extract + synthesize findings into summary

Setup

Cargo.toml

[dependencies]
spider = { version = "2.43.18", features = ["search_serper"] }

Configuration

use spider::configuration::{SearchConfig, SearchProviderType};
use spider::features::automation::RemoteMultimodalEngine;

let mut engine = RemoteMultimodalEngine::new(api_url, model, None);
engine.with_search_config(Some(
    SearchConfig::new(SearchProviderType::Serper, "your-api-key")
        // Optional: custom API endpoint
        .with_api_url("https://custom.api.com/search")
));

// Simple search
let results = engine.search("rust web crawler", None, None).await?;

// Search + extract
let data = engine.search_and_extract(
    "best rust frameworks",
    "Extract name and description",
    None,
    None,
).await?;

// Research with synthesis
use spider::features::automation::ResearchOptions;
let research = engine.research(
    "How do async runtimes work?",
    ResearchOptions::new().with_max_pages(5).with_synthesis(),
    None,
).await?;
println!("Summary: {}", research.summary.unwrap());

Custom API Endpoints

All providers support custom API URLs for self-hosted or alternative endpoints:

SearchConfig::new(SearchProviderType::Brave, "api-key")
    .with_api_url("https://my-brave-proxy.example.com/search")

Full Changelog

v2.43.17...v2.43.18

v2.43.13 - Advanced Agentic Automation

02 Feb 19:29

Choose a tag to compare

🤖 Advanced Agentic Automation Features

This release adds comprehensive agentic automation capabilities to spider, making it a powerful tool for autonomous web interactions.

Phase 1: Simplified Agentic APIs

  • act(page, instruction) - Execute single actions with natural language
  • observe(page) - Analyze page state and get structured observations
  • extract_page(page, prompt, schema) - Extract structured data from pages
  • AutomationMemory - In-memory state management for multi-round automation
  • run_with_memory() - Stateful automation with persistent context

Phase 2: Self-Healing & Discovery

  • SelectorCache - Self-healing selector cache with LRU eviction
  • act_cached(page, instruction, cache) - Actions with automatic selector caching
  • StructuredOutputConfig - Native JSON schema enforcement for reliable outputs
  • extract_structured(page, prompt, config) - Schema-validated data extraction
  • map(page, prompt) - AI-powered URL discovery and categorization
  • MapResult / DiscoveredUrl - Relevance-scored URL discovery

Phase 3: Autonomous Agent Execution

  • execute(page, config) - Full autonomous goal-oriented execution
  • agent(page, goal) - Simple goal execution with defaults
  • agent_extract(page, goal, prompt) - Goal execution with data extraction
  • chain(page, steps) - Sequential action composition with conditions
  • AgentConfig - Comprehensive agent configuration (max_steps, timeout, recovery, etc.)
  • RecoveryStrategy - Error handling strategies (Retry, Alternative, Skip, Abort)
  • ChainStep / ChainCondition - Conditional action execution
  • AgentEvent - Real-time progress tracking events
  • AgentResult / ChainResult - Detailed execution results with history

Example Usage

// Autonomous agent
let config = AgentConfig::new("Find and add the cheapest laptop to cart")
    .with_max_steps(30)
    .with_success_url("/cart")
    .with_extraction("Extract cart total");

let result = engine.execute(&page, config).await?;

// Action chaining
let steps = vec![
    ChainStep::new("click Login"),
    ChainStep::new("type email").when(ChainCondition::ElementExists("#email")),
    ChainStep::new("click Submit").then_extract("Extract any errors"),
];
let result = engine.chain(&page, steps).await?;

// Self-healing cache
let mut cache = SelectorCache::new();
engine.act_cached(&page, "click submit", &mut cache).await?;

Full Changelog

  • feat(automation): add Phase 3 agentic features - autonomous agent, action chaining, error recovery
  • feat(automation): add Phase 2 agentic features - selector cache, structured outputs, map API
  • feat(automation): add simplified agentic APIs - act(), observe(), extract()
  • feat(automation): add agentic memory for multi-round automation

v2.43.3

02 Feb 02:19

Choose a tag to compare

Bug Fix

  • fix(automation): Improve best_effort_parse_json_object parsing to handle LLM responses with reasoning text before JSON code blocks
    • Find ```json blocks anywhere in response (not just at boundaries)
    • Support JSON arrays in addition to objects
    • Better fallback parsing for various LLM response formats

Full Changelog: v2.43.2...v2.43.3