Releases: spider-rs/spider
v2.45.28
Agent Hardening
- Cap LLM-controlled durations (Wait, ClickHold, SetViewport, OpenPage)
- Add
js_escape()for safe JS string interpolation in action handlers - Wrap
Navigateand screenshot calls with timeouts - Use
PageWaitStrategy::LoadforWaitForNavigationinstead of fixed sleep - Replace
eval_with_timeoutfor Fill/Type/Clear actions with error propagation - Improve semaphore and logging diagnostics on error paths
v2.45.27
Changes since vv1.7.5
- feat(agent): eval timeouts, spawn concurrency limiter, PageWaitStrategy, panic hardening
- feat(agent): action feedback, partial result preservation, and JS error capture
- style: fix rustfmt formatting in smart_vs_chrome tests
- chore(release): 2.45.25
- fix: use final redirect URL as base for relative link resolution (#364)
- chore: ignore transitive audit advisories with no direct fix
- chore: ignore RUSTSEC-2025-0134 (rustls-pemfile via warp)
- chore: ignore RUSTSEC-2026-0002 (lru 0.13.0 via wreq)
- docs: backfill CHANGELOG v2.45.1-v2.45.22 and add .editorconfig
- chore: remove cargo dependabot, keep github-actions only
- chore: fix clippy warnings and formatting across workspace
- chore(deps): bump flexbuffers 2->25, async-openai 0.32->0.33, clap 4.5.59->4.5.60, actions
- docs: rewrite README, add issue/PR templates, security policy, CI improvements
- docs: add Contributor Covenant v2.1 Code of Conduct
- docs: rewrite CONTRIBUTING.md and add zero-config quick start to README
- fix(cache): proper HTTP staleness for Chrome-cached pages
- fix(cache): Period policy bypasses HTTP is_stale for Chrome-rendered pages
- chore: bump all workspace crates to v2.45.22
- fix(cache): enable cache_chrome_hybrid_mem feature for Chrome cache writes
- chore: bump all crates to 2.45.21, chromey 2.38.4
- chore(agent): fix model router import
- docs(spider_agent): add SKILLS.md for engine skill integration reference
- chore(chrome): 2.45.19 root sandbox default launch
- fix: add no_sandbox() to headless BrowserConfig (#354)
- test: comprehensive crawler-test.com integration suite (302 tests, 408 URLs)
- feat: per-round model pool routing for cost-optimized automation + bump 2.45.18
- perf: skip model scoring for pools ≤ 2 models + bump 2.45.17
- test: comprehensive multi-LLM router reliability tests + bump 2.45.16
- release(cli): 2.45.15 wait_for
- feat(cli): add wait-for capabilities to spider_cli (#352)
- fix: Chrome mode honors wait_for config for networkIdle before HTML extraction
- fix: smart mode lifecycle waiting — match Chrome coverage without re-fetch
- fix: auto-retry www. URLs on SSL protocol errors — strip www prefix
- fix: reject empty HTML from all cache and seeded resource paths
- fix: invalidate empty HTML shell responses on cache read path
- feat: deferred Chrome — cache-only crawl phase before browser launch
- feat: cache-first fast path — skip browser/HTTP when cache has data
- fix: add Website::with_hedge() forwarding method + bump to v2.45.7
- feat: add work-stealing (hedged requests) for slow crawl requests
- fix: skip HTML-specific heuristics in cache empty check for non-HTML content
- chore: bump all workspace crates to v2.45.4
- fix: skip caching empty/near-empty HTML responses
- chore: bump all workspace crates to v2.45.3
- chore: bump spider to v2.45.3
- fix(uring_fs): fix compile errors when io_uring feature is enabled
- perf(io_uring): add StreamingWriter for streaming file I/O, bump v2.45.2
- feat(io_uring): expand io_uring integration for file I/O, bump v2.45.1
- refactor(agent): extract types and HTML cleaning into spider_agent_types and spider_agent_html crates
- chore(deps): major version bumps, bump v2.45.0
- chore: bump v2.44.42
- perf(website): use take() instead of clone() for subdomain base URL
- fix(website): add tests for subdomain relative URL resolution, bump v2.44.41
- fix(website): use page's own URL for relative link resolution on subdomains
- chore: clarify network event collection comment (#350)
- feat(agent): integrate llm_models_spider v0.1.9 with smart model selection
- docs: add missing feature flags, Spider Cloud example, and integration docs
- refactor(agent): move all skill content to spider_skills crate
- feat(agent): add L15-L48 skills, NEST/CIRCLE engine loops, haiku benchmark
- fix: add should_use_chrome_ai to stub RemoteMultimodalConfigs
- feat(agent): add L8-L14 skills, WAM engine loop, and Chrome AI improvements
- chore(agent): add improved steps
- test(website): fix rotation cases
- feat(agent): optimize text-only rounds and bump workspace crates
- chore: fix lint warnings and feature-combo build regressions
- fix: improve agent concurrency and ensure all-feature builds
- feat(cache): optimize automation caching and bump crates to 2.44.33
- Add crawl_smart cache parity test and bump crates to 2.44.32
- Improve cache skip-browser flows and add remote cache e2e example
- feat: add spider cloud end-to-end examples and bump crates to 2.44.30
- Update Cargo.lock for 2.44.29
- Improve remote multimodal automation reliability and release 2.44.29
- Expose optional automation reasoning in metadata and bump 2.44.28
- release: bump workspace crates to 2.44.27
- feat(spider_cli): add runtime http/headless mode controls and release 2.44.26
- chore(client): add fallback agent declaring
- docs: update dual-model example with per-endpoint URL and API key configuration
- feat: dual-model routing examples and comprehensive tests
- feat: extraction-only mode optimization for single-round data extraction
- docs: add Spider Cloud reliability integration section to README
- fix: feature flag compilation issues across multiple feature combos
- test(ignore): add test_crawl_subdomains to ignore
- fix: feature flag compilation issues across wreq, agent_full, and cache combos
- chore: bump to v2.44.20 for release
- fix: broken chrome feature — missing
relevantfield and cfg gate on detect_cf_turnstyle - chore: bump to v2.44.19 for release
- feat: integrate spider_skills crate for skill types and web challenge definitions
- feat(agent): add URL-level relevance pre-filter for crawling, bump to v2.44.18
- perf: fast-path trie protocol detection, remove redundant char boundary check
- perf: optimize trie, robot parser, and URL preparation hot paths, bump to v2.44.17
- feat(agent): add relevance gate for remote multimodal crawling
- perf: optimize core pipeline hot paths, add benchmarks, bump to v2.44.15
- chore: bump versions to 2.44.14
- fix(agent): use CDP mouse methods for all click actions to maintain position tracking
- docs: simplify root README intro
- docs: add spider_cloud to root README, improve API response handling
- chore: bump versions to 2.44.13
- feat(spider): add spider_cloud integration and S3 skills loading
- chore: bump versions to 2.44.12
- feat(spider_agent): add dual-model routing and long-term experience memory
- fix(spider): fix broken glob feature compilation and logic errors
- chore: bump versions to 2.44.10
- feat(spider_agent): improve skill triggers and board reading for web challenges
- feat(spider_agent): improve skill strategies for real browser interaction
- feat: sync skill content with improved efficiency directives
- chore: bump versions to 2.44.9
- feat(spider_agent): add Claude-optimized automation features
- chore: bump versions to 2.44.8
- feat(spider_agent): add concurrent page spawning with OpenPage action
- fix(spider_agent): update system_prompt_compiled test for locked prompt
- chore: bump versions to 2.44.7
- feat(spider_agent): integrate llm_models_spider for auto-updated vision detection
- feat(spider_agent): add inline vision model detection
- chore: bump versions to 2.44.6
- feat(spider): enable HTTP extraction without agent_chrome feature
- feat(agent): enhance CAPTCHA handling and lock system prompt
- feat(agent): improve web automation for CAPTCHAs and visual challenges
- docs(readme): add spider_agent section and link
- chore(examples): add remote_multimodal_multi example
- feat(agent): consolidate automation into spider_agent with seamless feature integration
- chore(example): add extraction example multi
- chore(website): fix scrape subscription hang
- chore(examples): add remote_multimodal
- chore: bump all crates to v2.44.3 for version parity
- fix(spider): keep spider_agent optional, add agent_webdriver feature
- feat(spider): make spider_agent a required dependency
- feat: granular usage tracking in AutomationUsage
- chore(spider): bump to v2.43.23, update spider_agent to 0.6
- feat(spider_agent): add usage limits, custom tools, and granular tracking
- feat(automation): add api_calls tracking and HTTP extraction support
- chore(spider): bump to v2.43.21, update spider_agent to 0.5
- feat(spider_agent): add complete WebAutomation parity and PromptUrlGate
- docs: expand v2 changelog with comprehensive feature documentation
- docs: update changelog for v2.43.20 and spider_agent releases
- chore(spider): update spider_agent dependency to 0.4
- feat(spider_agent): add performance optimizations for automation
- feat(spider_agent): add comprehensive automation module
- chore(spider_agent): bump version to 0.2.0
- feat(spider_agent): enhance memory with URL/action/extraction history
- feat(spider_agent): add webdriver support via thirtyfour
- feat(spider_agent): add chrome browser and temp storage support
- feat(spider_agent): add usage tracking and multimodal example
- fix(spider): fix doctest and update chromey for adblock compatibility
- feat(spider_agent): add standalone multimodal agent crate
- fix(search): use reqwest::Client directly for cache feature compatibility
- chore(docs): fix rustdoc broken intra-doc links
- feat(search): add search_and_extract and research methods
- feat(search): add web search integration with multiple providers
- perf(automation): byte-size-based smart cleaning for best performance
- feat(cache): add skip browser mode + smart HTML cleaning
- fix(automation): correct lol_html API usage and add intelligent screenshot detection
- feat(automation): add Phase 3 agentic features - autonomous agent, action chaining, error recovery
- feat(automation): add Phase 2 agentic features - selector cache, structured outputs, map API
- feat(automation): add simplified agentic APIs - act(), observe(), extract()
- feat(automation): add agentic memory for multi-round automation
- feat(automation): add prompt-based website configuration
- perf(website): remove unnecessary clones and allocations
-...
v2.45.24
What's New
Performance
- Cache-first fast path — skip browser/HTTP entirely when cache has data (~5-50ms vs 1-3s)
- Deferred Chrome — process multi-page crawls from cache before launching a browser
- Work-stealing (hedged requests) — parallel retry for slow crawl requests
- io_uring — StreamingWriter for high-throughput file I/O on Linux
Agent
- Per-round model pool routing — route cheap rounds to fast models, complex rounds to capable ones
- Comprehensive router tests — 30+ multi-LLM reliability tests
Fixes
- Chrome mode honors
wait_forconfig (e.g.idle_network0) before HTML extraction - Smart mode lifecycle waiting matches Chrome coverage without re-fetch
- Auto-retry www. URLs on SSL protocol errors
- Reject empty HTML from all cache paths
- Default
no_sandbox()for headless Chrome (#354)
CLI
--wait-forflag for page load strategies (#352)
Housekeeping
- Clippy warnings and formatting cleanup across workspace
- Rewritten README, CONTRIBUTING.md, CHANGELOG backfill
- Issue/PR templates, security policy, CI workflows (lint, audit, release)
v2.45.20
What's New
Relevance Gate for Remote Multimodal Crawling
Added a relevance_gate config that instructs the LLM to return a "relevant": true|false field in its JSON response. When a page is deemed irrelevant, its wildcard budget credit is refunded so the crawler discovers more relevant content.
New config fields:
relevance_gate: bool— enables the featurerelevance_prompt: Option<String>— optional custom relevance criteria
How it works:
- When enabled, the system prompt instructs the LLM to include
"relevant": true|false - If the model returns
false, a budget credit is atomically accumulated - Credits are drained in the crawl loop to restore the wildcard budget
- Default fallback is
true(assume relevant) if the model omits the field
Example:
let cfgs = RemoteMultimodalConfigs::new(api_url, model)
.with_relevance_gate(Some("Only pages about Rust programming".into()));Full Changelog
- feat(agent): add
relevance_gateandrelevance_prompttoRemoteMultimodalConfig - feat(agent): add atomic
relevance_creditscounter toRemoteMultimodalConfigs - feat(agent): add
relevant: Option<bool>toAutomationResultandAutomationResults - feat(agent): extend system prompt and extraction with relevance gate instructions
- feat(spider): add
restore_wildcard_budget()for budget refund - feat(spider): drain relevance credits in crawl loop dequeue
v2.44.13
What's New
- Spider Cloud integration (
spider_cloudfeature) — optional proxy rotation, anti-bot bypass, and intelligent fallback via spider.cloud- Modes: Proxy, Api, Unblocker, Fallback, Smart
- Smart mode auto-detects Cloudflare challenges, CAPTCHAs, and bot protection then retries via
/unblocker
- S3 skills loading (
skills_s3feature) — load agent skills from S3-compatible storage (AWS, MinIO, R2) - CLI:
--spider-cloud-keyand--spider-cloud-modeflags
Crates
spiderv2.44.13spider_agentv2.44.13spider_cliv2.44.13spider_utilsv2.44.13spider_workerv2.44.13
spider v2.43.20
Spider v2.43.20
Changes
- fix(spider): Fix doctest and update chromey for adblock compatibility
- fix(search): Use reqwest::Client directly for cache feature compatibility
- chore(spider): Update spider_agent dependency to 0.4
spider_agent Integration
The agent feature now uses spider_agent v0.4.0, which includes:
- Smart caching with size-aware LRU eviction
- High-performance chain execution with parallel step support
- Batch processing for multiple items
- Prefetch management for predictive page loading
- Smart model routing based on task complexity
Full Changelog
spider_agent v0.4.0
Spider Agent v0.4.0
Performance Optimizations
This release adds several performance optimizations for automation workflows:
Smart Caching
- SmartCache: Size-aware LRU cache with automatic cleanup
- Bounded memory usage with configurable limits
- TTL-based expiration
- Automatic cleanup on memory pressure
- Statistics tracking (hits, misses, evictions)
High-Performance Execution
-
ChainExecutor: Parallel step execution for automation chains
- Analyzes dependencies for optimal parallelization
- Response caching with TTL
- Configurable concurrency limits
- Step timeout support
-
BatchExecutor: Efficient batch processing
- Process multiple items with configurable batch sizes
- Parallel execution within batches
- Index-aware processing option
-
PrefetchManager: Predictive page loading
- Prefetch URLs in the background
- Automatic cache management
- Concurrent prefetch limits
Smart Model Routing
- ModelRouter: Intelligent model selection based on task complexity
- Task analysis for complexity scoring
- User-configurable model policies
- Cost tier constraints (Low/Medium/High)
- Latency-aware routing
Other Changes
- Added
MessageContenthelper methods:as_text(),full_text(),is_text(),has_images() - Default
ModelPolicynow allows High tier routing - Fixed compilation warnings
Full Changelog
v2.43.18 - Web Search Integration
Features
Web Search Integration
Add web search capabilities to Spider's RemoteMultimodalEngine with support for multiple search providers.
Supported Providers
- Serper (
search_serper) - Google SERP API - Brave (
search_brave) - Privacy-focused search - Bing (
search_bing) - Microsoft Bing Web Search - Tavily (
search_tavily) - AI-optimized search
New Methods
search()- Search the web and return structured resultssearch_and_extract()- Search + fetch pages + LLM extractionresearch()- Search + extract + synthesize findings into summary
Setup
Cargo.toml
[dependencies]
spider = { version = "2.43.18", features = ["search_serper"] }Configuration
use spider::configuration::{SearchConfig, SearchProviderType};
use spider::features::automation::RemoteMultimodalEngine;
let mut engine = RemoteMultimodalEngine::new(api_url, model, None);
engine.with_search_config(Some(
SearchConfig::new(SearchProviderType::Serper, "your-api-key")
// Optional: custom API endpoint
.with_api_url("https://custom.api.com/search")
));
// Simple search
let results = engine.search("rust web crawler", None, None).await?;
// Search + extract
let data = engine.search_and_extract(
"best rust frameworks",
"Extract name and description",
None,
None,
).await?;
// Research with synthesis
use spider::features::automation::ResearchOptions;
let research = engine.research(
"How do async runtimes work?",
ResearchOptions::new().with_max_pages(5).with_synthesis(),
None,
).await?;
println!("Summary: {}", research.summary.unwrap());Custom API Endpoints
All providers support custom API URLs for self-hosted or alternative endpoints:
SearchConfig::new(SearchProviderType::Brave, "api-key")
.with_api_url("https://my-brave-proxy.example.com/search")Full Changelog
v2.43.13 - Advanced Agentic Automation
🤖 Advanced Agentic Automation Features
This release adds comprehensive agentic automation capabilities to spider, making it a powerful tool for autonomous web interactions.
Phase 1: Simplified Agentic APIs
act(page, instruction)- Execute single actions with natural languageobserve(page)- Analyze page state and get structured observationsextract_page(page, prompt, schema)- Extract structured data from pagesAutomationMemory- In-memory state management for multi-round automationrun_with_memory()- Stateful automation with persistent context
Phase 2: Self-Healing & Discovery
SelectorCache- Self-healing selector cache with LRU evictionact_cached(page, instruction, cache)- Actions with automatic selector cachingStructuredOutputConfig- Native JSON schema enforcement for reliable outputsextract_structured(page, prompt, config)- Schema-validated data extractionmap(page, prompt)- AI-powered URL discovery and categorizationMapResult/DiscoveredUrl- Relevance-scored URL discovery
Phase 3: Autonomous Agent Execution
execute(page, config)- Full autonomous goal-oriented executionagent(page, goal)- Simple goal execution with defaultsagent_extract(page, goal, prompt)- Goal execution with data extractionchain(page, steps)- Sequential action composition with conditionsAgentConfig- Comprehensive agent configuration (max_steps, timeout, recovery, etc.)RecoveryStrategy- Error handling strategies (Retry, Alternative, Skip, Abort)ChainStep/ChainCondition- Conditional action executionAgentEvent- Real-time progress tracking eventsAgentResult/ChainResult- Detailed execution results with history
Example Usage
// Autonomous agent
let config = AgentConfig::new("Find and add the cheapest laptop to cart")
.with_max_steps(30)
.with_success_url("/cart")
.with_extraction("Extract cart total");
let result = engine.execute(&page, config).await?;
// Action chaining
let steps = vec![
ChainStep::new("click Login"),
ChainStep::new("type email").when(ChainCondition::ElementExists("#email")),
ChainStep::new("click Submit").then_extract("Extract any errors"),
];
let result = engine.chain(&page, steps).await?;
// Self-healing cache
let mut cache = SelectorCache::new();
engine.act_cached(&page, "click submit", &mut cache).await?;Full Changelog
- feat(automation): add Phase 3 agentic features - autonomous agent, action chaining, error recovery
- feat(automation): add Phase 2 agentic features - selector cache, structured outputs, map API
- feat(automation): add simplified agentic APIs - act(), observe(), extract()
- feat(automation): add agentic memory for multi-round automation
v2.43.3
Bug Fix
- fix(automation): Improve
best_effort_parse_json_objectparsing to handle LLM responses with reasoning text before JSON code blocks- Find ```json blocks anywhere in response (not just at boundaries)
- Support JSON arrays in addition to objects
- Better fallback parsing for various LLM response formats
Full Changelog: v2.43.2...v2.43.3