greysquirr3l/stygian

stygian

High-performance web scraping toolkit for Rust: graph-based execution engine + anti-detection browser automation.


What is stygian?

Stygian is a monorepo containing two complementary Rust crates for building robust, scalable web scraping systems:

📊 stygian-graph

Graph-based scraping engine treating pipelines as DAGs with pluggable service modules:

  • Hexagonal architecture: domain core isolated from infrastructure
  • Extreme concurrency: Tokio for I/O, Rayon for CPU-bound tasks
  • AI extraction: Claude, GPT, Gemini, GitHub Copilot, Ollama support
  • Multi-modal: images, PDFs, videos via LLM vision APIs
  • Distributed execution: Redis/Valkey-backed work queues
  • Circuit breaker: graceful degradation when services fail
  • Idempotency: safe retries with deduplication keys
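
To make the "pipelines as DAGs" idea concrete, here is a minimal, standard-library-only sketch of how an execution order can be derived from named nodes and edges using Kahn's algorithm. The function name `execution_order` is hypothetical and this is not stygian-graph's actual scheduler, which this README does not show; it only illustrates why a cycle makes a pipeline impossible to schedule.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Returns node names ordered so that every edge's source precedes its target.
/// Returns `None` if an edge names an unknown node or the edges form a cycle.
/// (Hypothetical helper for illustration, not part of stygian-graph.)
pub fn execution_order<'a>(
    nodes: &[&'a str],
    edges: &[(&'a str, &'a str)],
) -> Option<Vec<&'a str>> {
    // Count incoming edges per node and record each node's successors.
    let mut indegree: BTreeMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut successors: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(from, to) in edges {
        if !indegree.contains_key(from) {
            return None;
        }
        *indegree.get_mut(to)? += 1;
        successors.entry(from).or_default().push(to);
    }

    // Nodes with no prerequisites can run first.
    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(n, _)| *n)
        .collect();

    let mut order = Vec::with_capacity(nodes.len());
    while let Some(node) = ready.pop_front() {
        order.push(node);
        // Finishing a node unblocks successors whose prerequisites are all done.
        for &next in successors.get(node).into_iter().flatten() {
            let d = indegree.get_mut(next)?;
            *d -= 1;
            if *d == 0 {
                ready.push_back(next);
            }
        }
    }

    // If some node was never scheduled, the graph contains a cycle.
    (order.len() == nodes.len()).then_some(order)
}
```

Nodes that become ready at the same time have no dependency on each other, which is what would allow an engine to run them concurrently.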

stygian-browser

Anti-detection browser automation library for bypassing modern bot protection:

  • Browser pooling: warm pool, sub-100ms acquisition
  • CDP-based: Chrome DevTools Protocol via chromiumoxide
  • Stealth features: navigator spoofing, canvas noise, WebGL randomization
  • Human behavior: Bézier mouse paths, realistic typing
  • Cloudflare/DataDome/PerimeterX: bypass detection layers
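
As an illustration of what a "Bézier mouse path" means geometrically, the sketch below samples points along a cubic Bézier curve between a start and an end position. The `Point` type and the `cubic_bezier` and `mouse_path` functions are hypothetical names, not stygian-browser's API; a real implementation would presumably also randomize the control points and the timing between moves.

```rust
/// A 2D point in page coordinates (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// Evaluates a cubic Bézier curve at parameter `t` in [0, 1].
/// `p0` and `p3` are the endpoints; `p1` and `p2` are the control points.
pub fn cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: f64) -> Point {
    let u = 1.0 - t;
    // Bernstein basis weights; they always sum to 1.
    let (b0, b1, b2, b3) = (u * u * u, 3.0 * u * u * t, 3.0 * u * t * t, t * t * t);
    Point {
        x: b0 * p0.x + b1 * p1.x + b2 * p2.x + b3 * p3.x,
        y: b0 * p0.y + b1 * p1.y + b2 * p2.y + b3 * p3.y,
    }
}

/// Samples `steps + 1` points along the curve, including both endpoints.
/// With `steps == 0` only the start point is returned.
pub fn mouse_path(start: Point, c1: Point, c2: Point, end: Point, steps: usize) -> Vec<Point> {
    let denom = steps.max(1) as f64;
    (0..=steps)
        .map(|i| cubic_bezier(start, c1, c2, end, i as f64 / denom))
        .collect()
}
```

Moving the pointer through the sampled points produces a curved trajectory instead of the straight line a naive script would draw.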

Quick Start

Graph Scraping Pipeline

use stygian_graph::{PipelineBuilder, adapters::HttpAdapter};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pipeline = PipelineBuilder::new()
        .node("fetch", HttpAdapter::new())
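        // `MyParserAdapter` is a placeholder for your own adapter type; define it before building the pipeline.
        .node("parse", MyParserAdapter)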
        .edge("fetch", "parse")
        .build()?;

    let results = pipeline
        .execute(json!({"url": "https://example.com"}))
        .await?;
    
    println!("Results: {:?}", results);
    Ok(())
}

Browser Automation

// `WaitUntil` is used below; it is assumed here to be exported from the crate root.
use stygian_browser::{BrowserConfig, BrowserPool, WaitUntil};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pool = BrowserPool::new(BrowserConfig::default()).await?;
    let handle = pool.acquire().await?;
    
    let mut page = handle.browser().new_page().await?;
    page.navigate(
        "https://example.com",
        WaitUntil::Selector("body".to_string()),
        Duration::from_secs(30),
    ).await?;
    
    let html = page.content().await?;
    println!("Page loaded: {} bytes", html.len());
    
    handle.release().await;
    Ok(())
}

Installation

Add to your Cargo.toml:

[dependencies]
stygian-graph = "0.2"
stygian-browser = "0.2"  # optional, for JavaScript rendering
tokio = { version = "1", features = ["full"] }

Architecture

stygian-graph: Hexagonal (Ports & Adapters)

Domain Layer (business logic)
    ↑
Ports (trait definitions)
    ↑
Adapters (HTTP, browser, AI providers, storage)

  • Zero I/O dependencies in domain layer
  • Dependency inversion: adapters depend on ports, not vice versa
  • Extreme testability: mock any external system
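
The layering above can be sketched in a few lines of standard-library Rust. The names `FetchPort`, `page_title`, and `InMemoryFetcher` are hypothetical and are not stygian-graph's real traits, which this README does not list. The example is synchronous for brevity, whereas a real scraping port would most likely be async.

```rust
use std::collections::HashMap;

/// Port: the only thing the domain layer knows about fetching.
/// (Hypothetical trait for illustration.)
pub trait FetchPort {
    fn fetch(&self, url: &str) -> Result<String, String>;
}

/// Domain logic: performs no I/O itself and depends only on the port trait.
pub fn page_title(fetcher: &dyn FetchPort, url: &str) -> Result<Option<String>, String> {
    let html = fetcher.fetch(url)?;
    let start = match html.find("<title>") {
        Some(i) => i + "<title>".len(),
        None => return Ok(None),
    };
    let end = match html[start..].find("</title>") {
        Some(i) => start + i,
        None => return Ok(None),
    };
    Ok(Some(html[start..end].trim().to_string()))
}

/// Adapter for tests: serves canned pages from memory instead of the network.
pub struct InMemoryFetcher {
    pub pages: HashMap<String, String>,
}

impl FetchPort for InMemoryFetcher {
    fn fetch(&self, url: &str) -> Result<String, String> {
        self.pages
            .get(url)
            .cloned()
            .ok_or_else(|| format!("no canned page for {url}"))
    }
}
```

Because `page_title` only sees `&dyn FetchPort`, swapping the in-memory adapter for an HTTP or browser-backed one requires no change to the domain code.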

stygian-browser: Modular

  • Self-contained modules with clear interfaces
  • Pool management with resource limits
  • Graceful degradation on browser unavailability
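
One way to picture "pool management with resource limits" is a pool that pre-creates a few instances and refuses to grow past a fixed cap. The `BoundedPool` type below is a hypothetical, synchronous sketch over a generic `T`, not stygian-browser's `BrowserPool`, whose internals this README does not describe.

```rust
use std::sync::Mutex;

struct State<T> {
    idle: Vec<T>, // instances ready to hand out
    live: usize,  // total instances created so far (idle + in use)
}

/// A pool that pre-warms some instances and never creates more than `max_size`.
/// (Hypothetical type for illustration.)
pub struct BoundedPool<T> {
    state: Mutex<State<T>>,
    max_size: usize,
    make: fn() -> T,
}

impl<T> BoundedPool<T> {
    /// Creates the pool and eagerly builds `warm` instances (capped at `max_size`).
    pub fn new(warm: usize, max_size: usize, make: fn() -> T) -> Self {
        let warm = warm.min(max_size);
        let idle = (0..warm).map(|_| make()).collect();
        Self {
            state: Mutex::new(State { idle, live: warm }),
            max_size,
            make,
        }
    }

    /// Hands out an idle instance, creates one if under the limit, else returns `None`.
    pub fn acquire(&self) -> Option<T> {
        let mut s = self.state.lock().unwrap();
        if let Some(item) = s.idle.pop() {
            return Some(item); // fast path: a warm instance is available
        }
        if s.live < self.max_size {
            s.live += 1;
            // Note: the lock is held while the instance is created.
            return Some((self.make)());
        }
        None // at capacity: the caller must wait or degrade gracefully
    }

    /// Returns an instance to the pool for reuse.
    pub fn release(&self, item: T) {
        self.state.lock().unwrap().idle.push(item);
    }

    /// Number of instances currently idle.
    pub fn idle_count(&self) -> usize {
        self.state.lock().unwrap().idle.len()
    }
}
```

Returning `None` at capacity, rather than blocking, leaves the caller free to queue the request or fall back to a non-browser path.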

Project Structure

stygian/
├── crates/
│   ├── stygian-graph/      # Scraping engine
│   └── stygian-browser/    # Browser automation
├── examples/                # Example pipelines
├── docs/                    # Architecture docs
└── assets/                  # Diagrams, images

Development

Setup

# Install Rust 1.94.0+
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build workspace
cargo build --workspace

# Run tests
cargo test --workspace

# Run clippy
cargo clippy --workspace -- -D warnings

Testing

# Unit tests
cargo test --lib

# Integration tests
cargo test --test '*'

# All tests (browser integration tests require Chrome)
cargo test --all-features

# Measure coverage (requires cargo-tarpaulin)
cargo tarpaulin --workspace --all-features --ignore-tests --out Lcov

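stygian-graph achieves strong unit coverage across domain, ports, and adapter layers. stygian-browser coverage is structurally bounded by the Chrome CDP requirement: all tests that spin up a real browser are marked #[ignore = "requires Chrome"], while pure-logic tests are fully covered. Because ignored tests are skipped by default, run them explicitly with cargo test --all-features -- --ignored on a machine with Chrome installed.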


Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Commit Convention

Use Conventional Commits:

  • feat: new feature
  • fix: bug fix
  • refactor: code restructuring
  • test: test additions/changes
  • docs: documentation updates

License

Licensed under the GNU Affero General Public License v3.0 (AGPL-3.0-only).

This means any modifications or derivative works must also be released under the AGPL-3.0, including when the software is used to provide a network service.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be licensed under the AGPL-3.0-only, without any additional terms or conditions.


Acknowledgments

Built with:


Status: Active development | Version 0.2.0 | Rust 2024 edition | Linux + macOS

For detailed documentation, see the project docs site.
