High-performance web scraping toolkit for Rust β graph-based execution engine + anti-detection browser automation.
Stygian is a monorepo containing two complementary Rust crates for building robust, scalable web scraping systems:
π stygian-graph
Graph-based scraping engine treating pipelines as DAGs with pluggable service modules:
- Hexagonal architecture β domain core isolated from infrastructure
- Extreme concurrency β Tokio for I/O, Rayon for CPU-bound tasks
- AI extraction β Claude, GPT, Gemini, GitHub Copilot, Ollama support
- Multi-modal β images, PDFs, videos via LLM vision APIs
- Distributed execution β Redis/Valkey-backed work queues
- Circuit breaker β graceful degradation when services fail
- Idempotency β safe retries with deduplication keys
π stygian-browser
Anti-detection browser automation library for bypassing modern bot protection:
- Browser pooling β warm pool, sub-100ms acquisition
- CDP-based β Chrome DevTools Protocol via chromiumoxide
- Stealth features β navigator spoofing, canvas noise, WebGL randomization
- Human behavior β BΓ©zier mouse paths, realistic typing
- Cloudflare/DataDome/PerimeterX β bypass detection layers
use stygian_graph::{PipelineBuilder, adapters::HttpAdapter};
use serde_json::json;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let pipeline = PipelineBuilder::new()
.node("fetch", HttpAdapter::new())
.node("parse", MyParserAdapter)
.edge("fetch", "parse")
.build()?;
let results = pipeline
.execute(json!({"url": "https://example.com"}))
.await?;
println!("Results: {:?}", results);
Ok(())
}use stygian_browser::{BrowserConfig, BrowserPool};
use std::time::Duration;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let pool = BrowserPool::new(BrowserConfig::default()).await?;
let handle = pool.acquire().await?;
let mut page = handle.browser().new_page().await?;
page.navigate(
"https://example.com",
WaitUntil::Selector("body".to_string()),
Duration::from_secs(30),
).await?;
let html = page.content().await?;
println!("Page loaded: {} bytes", html.len());
handle.release().await;
Ok(())
}Add to your Cargo.toml:
[dependencies]
stygian-graph = "0.2"
stygian-browser = "0.2" # optional, for JavaScript rendering
tokio = { version = "1", features = ["full"] }Domain Layer (business logic)
β
Ports (trait definitions)
β
Adapters (HTTP, browser, AI providers, storage)
- Zero I/O dependencies in domain layer
- Dependency inversion β adapters depend on ports, not vice versa
- Extreme testability β mock any external system
- Self-contained modules with clear interfaces
- Pool management with resource limits
- Graceful degradation on browser unavailability
stygian/
βββ crates/
β βββ stygian-graph/ # Scraping engine
β βββ stygian-browser/ # Browser automation
βββ examples/ # Example pipelines
βββ docs/ # Architecture docs
βββ assets/ # Diagrams, images
# Install Rust 1.94.0+
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Build workspace
cargo build --workspace
# Run tests
cargo test --workspace
# Run clippy
cargo clippy --workspace -- -D warnings# Unit tests
cargo test --lib
# Integration tests
cargo test --test '*'
# All tests (browser integration tests require Chrome)
cargo test --all-features
# Measure coverage (requires cargo-tarpaulin)
cargo tarpaulin --workspace --all-features --ignore-tests --out Lcovstygian-graph achieves strong unit coverage across domain, ports, and adapter layers.
stygian-browser coverage is structurally bounded by the Chrome CDP requirement β all tests
that spin up a real browser are marked #[ignore = "requires Chrome"]; pure-logic tests are
fully covered.
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'feat: add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Use Conventional Commits:
feat:β new featurefix:β bug fixrefactor:β code restructuringtest:β test additions/changesdocs:β documentation updates
Licensed under the GNU Affero General Public License v3.0 (AGPL-3.0-only).
This means any modifications or derivative works must also be released under the AGPL-3.0, including when the software is used to provide a network service.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be licensed under the AGPL-3.0-only, without any additional terms or conditions.
Built with:
- chromiumoxide β CDP client
- petgraph β graph algorithms
- tokio β async runtime
- reqwest β HTTP client
Status: Active development | Version 0.2.0 | Rust 2024 edition | Linux + macOS
For detailed documentation, see the project docs site.
