Releases · spider-rs/spider

02 Mar 15:28

j-mendez

v2.45.28

948b72e

v2.45.28 Latest

Latest

Agent Hardening

Cap LLM-controlled durations (Wait, ClickHold, SetViewport, OpenPage)
Add js_escape() for safe JS string interpolation in action handlers
Wrap Navigate and screenshot calls with timeouts
Use PageWaitStrategy::Load for WaitForNavigation instead of fixed sleep
Replace eval_with_timeout for Fill/Type/Clear actions with error propagation
Improve semaphore and logging diagnostics on error paths

Assets 2

02 Mar 14:57

github-actions

v2.45.27

e88e2b9

v2.45.27

Changes since vv1.7.5

feat(agent): eval timeouts, spawn concurrency limiter, PageWaitStrategy, panic hardening
feat(agent): action feedback, partial result preservation, and JS error capture
style: fix rustfmt formatting in smart_vs_chrome tests
chore(release): 2.45.25
fix: use final redirect URL as base for relative link resolution (#364)
chore: ignore transitive audit advisories with no direct fix
chore: ignore RUSTSEC-2025-0134 (rustls-pemfile via warp)
chore: ignore RUSTSEC-2026-0002 (lru 0.13.0 via wreq)
docs: backfill CHANGELOG v2.45.1-v2.45.22 and add .editorconfig
chore: remove cargo dependabot, keep github-actions only
chore: fix clippy warnings and formatting across workspace
chore(deps): bump flexbuffers 2->25, async-openai 0.32->0.33, clap 4.5.59->4.5.60, actions
docs: rewrite README, add issue/PR templates, security policy, CI improvements
docs: add Contributor Covenant v2.1 Code of Conduct
docs: rewrite CONTRIBUTING.md and add zero-config quick start to README
fix(cache): proper HTTP staleness for Chrome-cached pages
fix(cache): Period policy bypasses HTTP is_stale for Chrome-rendered pages
chore: bump all workspace crates to v2.45.22
fix(cache): enable cache_chrome_hybrid_mem feature for Chrome cache writes
chore: bump all crates to 2.45.21, chromey 2.38.4
chore(agent): fix model router import
docs(spider_agent): add SKILLS.md for engine skill integration reference
chore(chrome): 2.45.19 root sandbox default launch
fix: add no_sandbox() to headless BrowserConfig (#354)
test: comprehensive crawler-test.com integration suite (302 tests, 408 URLs)
feat: per-round model pool routing for cost-optimized automation + bump 2.45.18
perf: skip model scoring for pools ≤ 2 models + bump 2.45.17
test: comprehensive multi-LLM router reliability tests + bump 2.45.16
release(cli): 2.45.15 wait_for
feat(cli): add wait-for capabilities to spider_cli (#352)
fix: Chrome mode honors wait_for config for networkIdle before HTML extraction
fix: smart mode lifecycle waiting — match Chrome coverage without re-fetch
fix: auto-retry www. URLs on SSL protocol errors — strip www prefix
fix: reject empty HTML from all cache and seeded resource paths
fix: invalidate empty HTML shell responses on cache read path
feat: deferred Chrome — cache-only crawl phase before browser launch
feat: cache-first fast path — skip browser/HTTP when cache has data
fix: add Website::with_hedge() forwarding method + bump to v2.45.7
feat: add work-stealing (hedged requests) for slow crawl requests
fix: skip HTML-specific heuristics in cache empty check for non-HTML content
chore: bump all workspace crates to v2.45.4
fix: skip caching empty/near-empty HTML responses
chore: bump all workspace crates to v2.45.3
chore: bump spider to v2.45.3
fix(uring_fs): fix compile errors when io_uring feature is enabled
perf(io_uring): add StreamingWriter for streaming file I/O, bump v2.45.2
feat(io_uring): expand io_uring integration for file I/O, bump v2.45.1
refactor(agent): extract types and HTML cleaning into spider_agent_types and spider_agent_html crates
chore(deps): major version bumps, bump v2.45.0
chore: bump v2.44.42
perf(website): use take() instead of clone() for subdomain base URL
fix(website): add tests for subdomain relative URL resolution, bump v2.44.41
fix(website): use page's own URL for relative link resolution on subdomains
chore: clarify network event collection comment (#350)
feat(agent): integrate llm_models_spider v0.1.9 with smart model selection
docs: add missing feature flags, Spider Cloud example, and integration docs
refactor(agent): move all skill content to spider_skills crate
feat(agent): add L15-L48 skills, NEST/CIRCLE engine loops, haiku benchmark
fix: add should_use_chrome_ai to stub RemoteMultimodalConfigs
feat(agent): add L8-L14 skills, WAM engine loop, and Chrome AI improvements
chore(agent): add improved steps
test(website): fix rotation cases
feat(agent): optimize text-only rounds and bump workspace crates
chore: fix lint warnings and feature-combo build regressions
fix: improve agent concurrency and ensure all-feature builds
feat(cache): optimize automation caching and bump crates to 2.44.33
Add crawl_smart cache parity test and bump crates to 2.44.32
Improve cache skip-browser flows and add remote cache e2e example
feat: add spider cloud end-to-end examples and bump crates to 2.44.30
Update Cargo.lock for 2.44.29
Improve remote multimodal automation reliability and release 2.44.29
Expose optional automation reasoning in metadata and bump 2.44.28
release: bump workspace crates to 2.44.27
feat(spider_cli): add runtime http/headless mode controls and release 2.44.26
chore(client): add fallback agent declaring
docs: update dual-model example with per-endpoint URL and API key configuration
feat: dual-model routing examples and comprehensive tests
feat: extraction-only mode optimization for single-round data extraction
docs: add Spider Cloud reliability integration section to README
fix: feature flag compilation issues across multiple feature combos
test(ignore): add test_crawl_subdomains to ignore
fix: feature flag compilation issues across wreq, agent_full, and cache combos
chore: bump to v2.44.20 for release
fix: broken chrome feature — missing relevant field and cfg gate on detect_cf_turnstyle
chore: bump to v2.44.19 for release
feat: integrate spider_skills crate for skill types and web challenge definitions
feat(agent): add URL-level relevance pre-filter for crawling, bump to v2.44.18
perf: fast-path trie protocol detection, remove redundant char boundary check
perf: optimize trie, robot parser, and URL preparation hot paths, bump to v2.44.17
feat(agent): add relevance gate for remote multimodal crawling
perf: optimize core pipeline hot paths, add benchmarks, bump to v2.44.15
chore: bump versions to 2.44.14
fix(agent): use CDP mouse methods for all click actions to maintain position tracking
docs: simplify root README intro
docs: add spider_cloud to root README, improve API response handling
chore: bump versions to 2.44.13
feat(spider): add spider_cloud integration and S3 skills loading
chore: bump versions to 2.44.12
feat(spider_agent): add dual-model routing and long-term experience memory
fix(spider): fix broken glob feature compilation and logic errors
chore: bump versions to 2.44.10
feat(spider_agent): improve skill triggers and board reading for web challenges
feat(spider_agent): improve skill strategies for real browser interaction
feat: sync skill content with improved efficiency directives
chore: bump versions to 2.44.9
feat(spider_agent): add Claude-optimized automation features
chore: bump versions to 2.44.8
feat(spider_agent): add concurrent page spawning with OpenPage action
fix(spider_agent): update system_prompt_compiled test for locked prompt
chore: bump versions to 2.44.7
feat(spider_agent): integrate llm_models_spider for auto-updated vision detection
feat(spider_agent): add inline vision model detection
chore: bump versions to 2.44.6
feat(spider): enable HTTP extraction without agent_chrome feature
feat(agent): enhance CAPTCHA handling and lock system prompt
feat(agent): improve web automation for CAPTCHAs and visual challenges
docs(readme): add spider_agent section and link
chore(examples): add remote_multimodal_multi example
feat(agent): consolidate automation into spider_agent with seamless feature integration
chore(example): add extraction example multi
chore(website): fix scrape subscription hang
chore(examples): add remote_multimodal
chore: bump all crates to v2.44.3 for version parity
fix(spider): keep spider_agent optional, add agent_webdriver feature
feat(spider): make spider_agent a required dependency
feat: granular usage tracking in AutomationUsage
chore(spider): bump to v2.43.23, update spider_agent to 0.6
feat(spider_agent): add usage limits, custom tools, and granular tracking
feat(automation): add api_calls tracking and HTTP extraction support
chore(spider): bump to v2.43.21, update spider_agent to 0.5
feat(spider_agent): add complete WebAutomation parity and PromptUrlGate
docs: expand v2 changelog with comprehensive feature documentation
docs: update changelog for v2.43.20 and spider_agent releases
chore(spider): update spider_agent dependency to 0.4
feat(spider_agent): add performance optimizations for automation
feat(spider_agent): add comprehensive automation module
chore(spider_agent): bump version to 0.2.0
feat(spider_agent): enhance memory with URL/action/extraction history
feat(spider_agent): add webdriver support via thirtyfour
feat(spider_agent): add chrome browser and temp storage support
feat(spider_agent): add usage tracking and multimodal example
fix(spider): fix doctest and update chromey for adblock compatibility
feat(spider_agent): add standalone multimodal agent crate
fix(search): use reqwest::Client directly for cache feature compatibility
chore(docs): fix rustdoc broken intra-doc links
feat(search): add search_and_extract and research methods
feat(search): add web search integration with multiple providers
perf(automation): byte-size-based smart cleaning for best performance
feat(cache): add skip browser mode + smart HTML cleaning
fix(automation): correct lol_html API usage and add intelligent screenshot detection
feat(automation): add Phase 3 agentic features - autonomous agent, action chaining, error recovery
feat(automation): add Phase 2 agentic features - selector cache, structured outputs, map API
feat(automation): add simplified agentic APIs - act(), observe(), extract()
feat(automation): add agentic memory for multi-round automation
feat(automation): add prompt-based website configuration
perf(website): remove unnecessary clones and allocations
-...

Contributors

azudev4

Assets 2

21 Feb 12:54

github-actions

v2.45.24

ac0e0f1

v2.45.24

What's New

Performance

Cache-first fast path — skip browser/HTTP entirely when cache has data (~5-50ms vs 1-3s)
Deferred Chrome — process multi-page crawls from cache before launching a browser
Work-stealing (hedged requests) — parallel retry for slow crawl requests
io_uring — StreamingWriter for high-throughput file I/O on Linux

Agent

Per-round model pool routing — route cheap rounds to fast models, complex rounds to capable ones
Comprehensive router tests — 30+ multi-LLM reliability tests

Fixes

Chrome mode honors wait_for config (e.g. idle_network0) before HTML extraction
Smart mode lifecycle waiting matches Chrome coverage without re-fetch
Auto-retry www. URLs on SSL protocol errors
Reject empty HTML from all cache paths
Default no_sandbox() for headless Chrome (#354)

CLI

--wait-for flag for page load strategies (#352)

Housekeeping

Clippy warnings and formatting cleanup across workspace
Rewritten README, CONTRIBUTING.md, CHANGELOG backfill
Issue/PR templates, security policy, CI workflows (lint, audit, release)

Assets 2

05 Feb 21:04

j-mendez

v2.45.20

65a32f1

v2.45.20

What's New

Relevance Gate for Remote Multimodal Crawling

Added a relevance_gate config that instructs the LLM to return a "relevant": true|false field in its JSON response. When a page is deemed irrelevant, its wildcard budget credit is refunded so the crawler discovers more relevant content.

New config fields:

relevance_gate: bool — enables the feature
relevance_prompt: Option<String> — optional custom relevance criteria

How it works:

When enabled, the system prompt instructs the LLM to include "relevant": true|false
If the model returns false, a budget credit is atomically accumulated
Credits are drained in the crawl loop to restore the wildcard budget
Default fallback is true (assume relevant) if the model omits the field

Example:

let cfgs = RemoteMultimodalConfigs::new(api_url, model)
    .with_relevance_gate(Some("Only pages about Rust programming".into()));

Full Changelog

feat(agent): add relevance_gate and relevance_prompt to RemoteMultimodalConfig
feat(agent): add atomic relevance_credits counter to RemoteMultimodalConfigs
feat(agent): add relevant: Option<bool> to AutomationResult and AutomationResults
feat(agent): extend system prompt and extraction with relevance gate instructions
feat(spider): add restore_wildcard_budget() for budget refund
feat(spider): drain relevance credits in crawl loop dequeue

Assets 2

05 Feb 19:07

j-mendez

v2.44.13

ea0d2dc

v2.44.13

What's New

Spider Cloud integration (spider_cloud feature) — optional proxy rotation, anti-bot bypass, and intelligent fallback via spider.cloud
- Modes: Proxy, Api, Unblocker, Fallback, Smart
- Smart mode auto-detects Cloudflare challenges, CAPTCHAs, and bot protection then retries via /unblocker
S3 skills loading (skills_s3 feature) — load agent skills from S3-compatible storage (AWS, MinIO, R2)
CLI: --spider-cloud-key and --spider-cloud-mode flags

Crates

spider v2.44.13
spider_agent v2.44.13
spider_cli v2.44.13
spider_utils v2.44.13
spider_worker v2.44.13

Assets 2

03 Feb 14:37

j-mendez

v2.43.20

a7a009a

spider v2.43.20

Spider v2.43.20

Changes

fix(spider): Fix doctest and update chromey for adblock compatibility
fix(search): Use reqwest::Client directly for cache feature compatibility
chore(spider): Update spider_agent dependency to 0.4

spider_agent Integration

The agent feature now uses spider_agent v0.4.0, which includes:

Smart caching with size-aware LRU eviction
High-performance chain execution with parallel step support
Batch processing for multiple items
Prefetch management for predictive page loading
Smart model routing based on task complexity

Full Changelog

v2.43.19...v2.43.20

Assets 2

03 Feb 14:34

j-mendez

spider_agent-v0.4.0

f1f28cc

spider_agent v0.4.0

Spider Agent v0.4.0

Performance Optimizations

This release adds several performance optimizations for automation workflows:

Smart Caching

SmartCache: Size-aware LRU cache with automatic cleanup
- Bounded memory usage with configurable limits
- TTL-based expiration
- Automatic cleanup on memory pressure
- Statistics tracking (hits, misses, evictions)

High-Performance Execution

ChainExecutor: Parallel step execution for automation chains
- Analyzes dependencies for optimal parallelization
- Response caching with TTL
- Configurable concurrency limits
- Step timeout support
BatchExecutor: Efficient batch processing
- Process multiple items with configurable batch sizes
- Parallel execution within batches
- Index-aware processing option
PrefetchManager: Predictive page loading
- Prefetch URLs in the background
- Automatic cache management
- Concurrent prefetch limits

Smart Model Routing

ModelRouter: Intelligent model selection based on task complexity
- Task analysis for complexity scoring
- User-configurable model policies
- Cost tier constraints (Low/Medium/High)
- Latency-aware routing

Other Changes

Added MessageContent helper methods: as_text(), full_text(), is_text(), has_images()
Default ModelPolicy now allows High tier routing
Fixed compilation warnings

Full Changelog

spider_agent-v0.3.0...spider_agent-v0.4.0

Assets 2

02 Feb 23:25

j-mendez

v2.43.18

125268a

v2.43.18 - Web Search Integration

Features

Web Search Integration

Add web search capabilities to Spider's RemoteMultimodalEngine with support for multiple search providers.

Supported Providers

Serper (search_serper) - Google SERP API
Brave (search_brave) - Privacy-focused search
Bing (search_bing) - Microsoft Bing Web Search
Tavily (search_tavily) - AI-optimized search

New Methods

search() - Search the web and return structured results
search_and_extract() - Search + fetch pages + LLM extraction
research() - Search + extract + synthesize findings into summary

Setup

Cargo.toml

[dependencies]
spider = { version = "2.43.18", features = ["search_serper"] }

Configuration

use spider::configuration::{SearchConfig, SearchProviderType};
use spider::features::automation::RemoteMultimodalEngine;

let mut engine = RemoteMultimodalEngine::new(api_url, model, None);
engine.with_search_config(Some(
    SearchConfig::new(SearchProviderType::Serper, "your-api-key")
        // Optional: custom API endpoint
        .with_api_url("https://custom.api.com/search")
));

// Simple search
let results = engine.search("rust web crawler", None, None).await?;

// Search + extract
let data = engine.search_and_extract(
    "best rust frameworks",
    "Extract name and description",
    None,
    None,
).await?;

// Research with synthesis
use spider::features::automation::ResearchOptions;
let research = engine.research(
    "How do async runtimes work?",
    ResearchOptions::new().with_max_pages(5).with_synthesis(),
    None,
).await?;
println!("Summary: {}", research.summary.unwrap());

Custom API Endpoints

All providers support custom API URLs for self-hosted or alternative endpoints:

SearchConfig::new(SearchProviderType::Brave, "api-key")
    .with_api_url("https://my-brave-proxy.example.com/search")

Full Changelog

v2.43.17...v2.43.18

Assets 2

02 Feb 19:29

j-mendez

v2.43.13

c8a8176

v2.43.13 - Advanced Agentic Automation

🤖 Advanced Agentic Automation Features

This release adds comprehensive agentic automation capabilities to spider, making it a powerful tool for autonomous web interactions.

Phase 1: Simplified Agentic APIs

act(page, instruction) - Execute single actions with natural language
observe(page) - Analyze page state and get structured observations
extract_page(page, prompt, schema) - Extract structured data from pages
AutomationMemory - In-memory state management for multi-round automation
run_with_memory() - Stateful automation with persistent context

Phase 2: Self-Healing & Discovery

SelectorCache - Self-healing selector cache with LRU eviction
act_cached(page, instruction, cache) - Actions with automatic selector caching
StructuredOutputConfig - Native JSON schema enforcement for reliable outputs
extract_structured(page, prompt, config) - Schema-validated data extraction
map(page, prompt) - AI-powered URL discovery and categorization
MapResult / DiscoveredUrl - Relevance-scored URL discovery

Phase 3: Autonomous Agent Execution

execute(page, config) - Full autonomous goal-oriented execution
agent(page, goal) - Simple goal execution with defaults
agent_extract(page, goal, prompt) - Goal execution with data extraction
chain(page, steps) - Sequential action composition with conditions
AgentConfig - Comprehensive agent configuration (max_steps, timeout, recovery, etc.)
RecoveryStrategy - Error handling strategies (Retry, Alternative, Skip, Abort)
ChainStep / ChainCondition - Conditional action execution
AgentEvent - Real-time progress tracking events
AgentResult / ChainResult - Detailed execution results with history

Example Usage

// Autonomous agent
let config = AgentConfig::new("Find and add the cheapest laptop to cart")
    .with_max_steps(30)
    .with_success_url("/cart")
    .with_extraction("Extract cart total");

let result = engine.execute(&page, config).await?;

// Action chaining
let steps = vec![
    ChainStep::new("click Login"),
    ChainStep::new("type email").when(ChainCondition::ElementExists("#email")),
    ChainStep::new("click Submit").then_extract("Extract any errors"),
];
let result = engine.chain(&page, steps).await?;

// Self-healing cache
let mut cache = SelectorCache::new();
engine.act_cached(&page, "click submit", &mut cache).await?;

Full Changelog

feat(automation): add Phase 3 agentic features - autonomous agent, action chaining, error recovery
feat(automation): add Phase 2 agentic features - selector cache, structured outputs, map API
feat(automation): add simplified agentic APIs - act(), observe(), extract()
feat(automation): add agentic memory for multi-round automation

Assets 2

02 Feb 02:19

j-mendez

v2.43.3

44ce332

v2.43.3

Bug Fix

fix(automation): Improve best_effort_parse_json_object parsing to handle LLM responses with reasoning text before JSON code blocks
- Find ```json blocks anywhere in response (not just at boundaries)
- Support JSON arrays in addition to objects
- Better fallback parsing for various LLM response formats

Full Changelog: v2.43.2...v2.43.3

Assets 2

Releases: spider-rs/spider

v2.45.28

Agent Hardening

Uh oh!

v2.45.27

Changes since vv1.7.5

Contributors

Uh oh!

v2.45.24

What's New

Performance

Agent

Fixes

CLI

Housekeeping

Uh oh!

v2.45.20

What's New

Relevance Gate for Remote Multimodal Crawling

Full Changelog

Uh oh!

v2.44.13

What's New

Crates

Uh oh!

spider v2.43.20

Spider v2.43.20

Changes

spider_agent Integration

Full Changelog

Uh oh!

spider_agent v0.4.0

Spider Agent v0.4.0

Performance Optimizations

Smart Caching

High-Performance Execution

Smart Model Routing

Other Changes

Full Changelog

Uh oh!

v2.43.18 - Web Search Integration

Features

Web Search Integration

Supported Providers

New Methods

Setup

Cargo.toml

Configuration

Custom API Endpoints

Full Changelog

Uh oh!

v2.43.13 - Advanced Agentic Automation

🤖 Advanced Agentic Automation Features

Phase 1: Simplified Agentic APIs

Phase 2: Self-Healing & Discovery

Phase 3: Autonomous Agent Execution

Example Usage

Full Changelog

Uh oh!

v2.43.3

Bug Fix

Uh oh!