stygian — A monorepo containing two Rust crates:
- stygian-graph: High-performance, graph-based scraping engine treating pipelines as DAGs with pluggable service modules (HTTP fetchers, LLM extractors, headless browsers). Built with hexagonal architecture for extreme concurrency and extensibility.
- stygian-browser: Anti-detection browser automation library built on Chrome DevTools Protocol with stealth features to bypass modern anti-bot systems. Features browser pooling and resource management for scalability.
- Build all: `cargo build --workspace`
- Build graph: `cargo build -p stygian-graph`
- Build browser: `cargo build -p stygian-browser`
- Test all: `cargo test --workspace`
- Lint all: `cargo clippy --workspace -- -D warnings`
- Domain layer must have zero I/O dependencies
- All external interactions go through port traits
- Adapters implement port traits and live in `adapters/`
- New capabilities require a new port trait before an adapter
- Depend inward: adapters → ports ← domain
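
The port and adapter rules above can be illustrated with a minimal, std-only sketch. Every name here (`FetchPort`, `FetchError`, `InMemoryFetcher`, `page_length`) is hypothetical and not taken from the stygian crates, and the trait is synchronous only to keep the example dependency-free; a real port in this workspace would presumably use `async fn` on Tokio.

```rust
use std::collections::HashMap;
use std::fmt;

/// Error shared by every `FetchPort` implementation (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum FetchError {
    NotFound(String),
}

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FetchError::NotFound(url) => write!(f, "no content for {url}"),
        }
    }
}

impl std::error::Error for FetchError {}

/// Port: the only way inner layers reach the outside world.
pub trait FetchPort {
    fn fetch(&self, url: &str) -> Result<String, FetchError>;
}

/// Inner-layer logic: no I/O of its own, depends only on the port trait.
/// Generic over the port, so no `Box<dyn ...>` is needed.
pub fn page_length<F: FetchPort>(fetcher: &F, url: &str) -> Result<usize, FetchError> {
    let body = fetcher.fetch(url)?;
    Ok(body.trim().len())
}

/// Adapter: would live under `adapters/`. This one is an in-memory
/// stand-in, useful for tests; an HTTP adapter would implement the same trait.
pub struct InMemoryFetcher {
    pages: HashMap<String, String>,
}

impl InMemoryFetcher {
    pub fn new(pages: &[(&str, &str)]) -> Self {
        Self {
            pages: pages
                .iter()
                .map(|&(url, body)| (url.to_string(), body.to_string()))
                .collect(),
        }
    }
}

impl FetchPort for InMemoryFetcher {
    fn fetch(&self, url: &str) -> Result<String, FetchError> {
        self.pages
            .get(url)
            .cloned()
            .ok_or_else(|| FetchError::NotFound(url.to_string()))
    }
}
```

Because `page_length` only sees the trait, swapping the in-memory adapter for a network-backed one requires no change to the inner code, which is the point of the inward dependency rule.
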
- Each module is self-contained with its own models, handlers, and storage
- Modules communicate through well-defined public interfaces
- Shared code goes in common modules
- Prefer module-level encapsulation over cross-cutting layers
- Language: Rust
- Rust edition 2024, stable toolchain (1.94.0).
- All error types must use 'thiserror'; 'anyhow' is reserved for CLI entry points only.
- No .unwrap() or .expect() in library code; use exhaustive error handling.
- Async runtime: Tokio 1.49 for all I/O operations.
- Use modern language features available on Rust 1.94.0: async closures, trait upcasting, LazyCell/LazyLock, let chains.
- Use native 'async fn' in traits for plugin interfaces (stable since Rust 1.75).
- Documentation: every public trait and method must have a doc comment with an example.
- Graceful degradation: log failures, return errors, never panic in library code.
- Hexagonal Architecture (Ports & Adapters): Domain core is isolated from infrastructure concerns.
- Workspace structure: domain (business logic), ports (trait definitions), adapters (implementations), application (orchestration).
- Domain layer NEVER imports from adapters; use dependency inversion via ports.
- Apply advanced patterns: Typestate for pipeline stages, Phantom types for zero-cost safety, Interior mutability for caching.
- Concurrency: Tokio for I/O-bound, Rayon for CPU-bound, worker pools with backpressure.
- Zero-cost abstractions: avoid unnecessary Arcs/Boxed traits where generics suffice.
- Library versions: tokio 1.49, reqwest 0.13, petgraph 0.8, serde 1.0.210, rayon 1.10, scraper 0.20.
- AI provider support: Claude (Anthropic), ChatGPT (OpenAI), Gemini (Google), GitHub Copilot, Ollama (local).
- Idempotence: All operations must be safely retryable with idempotency keys.
- Security-first: Authorization checks at repository level, fail-secure by default.
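
As a rough illustration of the typestate, phantom-type, and error-handling rules above, the following std-only sketch models a pipeline that can only be run after validation. All names (`Pipeline`, `Draft`, `Validated`, `PipelineError`) are assumptions made for this example, not stygian-graph types, and the error impls are written by hand here where the project rules would derive them with thiserror.

```rust
use std::collections::HashSet;
use std::fmt;
use std::marker::PhantomData;

/// Hypothetical error type; no panics, every failure is a value.
#[derive(Debug, PartialEq)]
pub enum PipelineError {
    Empty,
    DuplicateStage(String),
}

impl fmt::Display for PipelineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PipelineError::Empty => write!(f, "pipeline has no stages"),
            PipelineError::DuplicateStage(name) => write!(f, "duplicate stage: {name}"),
        }
    }
}

impl std::error::Error for PipelineError {}

/// Zero-sized state markers; they exist only at the type level.
pub struct Draft;
pub struct Validated;

pub struct Pipeline<State> {
    stages: Vec<String>,
    _state: PhantomData<State>,
}

/// Returns the first stage name that appears more than once, if any.
fn first_duplicate(stages: &[String]) -> Option<&str> {
    let mut seen = HashSet::new();
    for name in stages {
        if !seen.insert(name.as_str()) {
            return Some(name.as_str());
        }
    }
    None
}

impl Pipeline<Draft> {
    pub fn new() -> Self {
        Pipeline { stages: Vec::new(), _state: PhantomData }
    }

    pub fn stage(mut self, name: &str) -> Self {
        self.stages.push(name.to_string());
        self
    }

    /// The only way to obtain a `Pipeline<Validated>`.
    pub fn validate(self) -> Result<Pipeline<Validated>, PipelineError> {
        if self.stages.is_empty() {
            return Err(PipelineError::Empty);
        }
        if let Some(name) = first_duplicate(&self.stages) {
            return Err(PipelineError::DuplicateStage(name.to_string()));
        }
        Ok(Pipeline { stages: self.stages, _state: PhantomData })
    }
}

impl Default for Pipeline<Draft> {
    fn default() -> Self {
        Self::new()
    }
}

impl Pipeline<Validated> {
    /// `run` does not exist on `Pipeline<Draft>`, so skipping
    /// validation is a compile error rather than a runtime check.
    pub fn run(&self) -> Vec<String> {
        self.stages.iter().map(|s| format!("ran {s}")).collect()
    }
}
```

The state marker costs nothing at runtime: `PhantomData<State>` is zero-sized, so both pipeline types have the same layout as a plain `Vec<String>`.
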
- Use chromiumoxide crate for CDP automation (async, type-safe bindings).
- Thread-safe browser pool using `Arc<RwLock<Pool>>` or `tokio::sync::RwLock`.
- Configuration via environment variables for runtime flexibility.
- Anti-detection techniques must be toggleable via stealth profiles.
- Browser instances must be reusable to avoid cold start penalties.
- Performance targets: <100ms browser acquisition from warm pool, <2s from cold start.
- Memory management: Monitor heap size, close unused tabs, prevent leaks.
- Testing: Mock CDP protocol for unit tests, real browser for integration tests.
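
The pooling rules above might look roughly like the sketch below. It assumes `std::sync::RwLock` instead of `tokio::sync::RwLock` so that it compiles without dependencies, and `Browser`, `Pool`, `BrowserPool`, and `PoolError` are hypothetical stand-ins; a real pool would hold chromiumoxide browser handles and launch Chrome on a cold start.

```rust
use std::fmt;
use std::sync::{Arc, RwLock};

/// Stand-in for a real browser handle (hypothetical).
#[derive(Debug, PartialEq)]
pub struct Browser {
    pub id: u32,
}

#[derive(Debug, PartialEq)]
pub enum PoolError {
    /// Another thread panicked while holding the lock.
    Poisoned,
}

impl fmt::Display for PoolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "browser pool lock was poisoned")
    }
}

impl std::error::Error for PoolError {}

#[derive(Default)]
pub struct Pool {
    idle: Vec<Browser>,
    next_id: u32,
}

/// Cheap to clone: every clone shares the same underlying pool.
#[derive(Clone, Default)]
pub struct BrowserPool {
    inner: Arc<RwLock<Pool>>,
}

impl BrowserPool {
    /// Reuses a warm instance when one is idle, otherwise creates a new one.
    pub fn acquire(&self) -> Result<Browser, PoolError> {
        let mut pool = self.inner.write().map_err(|_| PoolError::Poisoned)?;
        if let Some(browser) = pool.idle.pop() {
            return Ok(browser);
        }
        pool.next_id += 1;
        Ok(Browser { id: pool.next_id })
    }

    /// Returns an instance to the pool so later callers avoid a cold start.
    pub fn release(&self, browser: Browser) -> Result<(), PoolError> {
        let mut pool = self.inner.write().map_err(|_| PoolError::Poisoned)?;
        pool.idle.push(browser);
        Ok(())
    }

    /// Read-only query; takes the shared lock so readers do not block each other.
    pub fn idle_count(&self) -> Result<usize, PoolError> {
        let pool = self.inner.read().map_err(|_| PoolError::Poisoned)?;
        Ok(pool.idle.len())
    }
}
```

Lock poisoning is mapped to an error value instead of being unwrapped, in line with the no-panic rule for library code. In async code the guard would be held across `.await` points, which is the usual reason to prefer `tokio::sync::RwLock` there.
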
- Run `cargo test --workspace` before committing
- Every new public function needs at least one test
- Fix all test failures before marking a task complete
- Use conventional commits: `feat:`, `fix:`, `refactor:`, `test:`, `docs:`
- Focus commit messages on user impact, not file counts or line numbers
Generated by wiggum.