```sh
cargo build --release   # Release build
cargo check             # Fast type-check
cargo test              # Run all tests (116 unit + 77 integration)
cargo test <test_name>  # Run a specific test
cargo fmt               # Format code
```

No special env vars needed for tests — integration tests use wiremock and fixed signing keys.
This is a Rust rewrite of nearai/vllm-proxy. It proxies OpenAI-compatible API requests to a vLLM/sglang backend, adding cryptographic signing and TEE attestation.
Key flow: Request arrives -> auth check -> read body -> forward to backend -> hash request+response -> sign with ECDSA+Ed25519 -> cache signature -> return response.
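The tail of that flow (hash, then sign, then cache) can be sketched as follows. This is an illustrative sketch only: the `signed_text` helper name and the placeholder digests are hypothetical, not the actual `proxy.rs`/`signing.rs` API; the joined-string format is the one the signed-text format uses.

```rust
// Hypothetical helper sketching the final hash+sign step. The digests are
// placeholder hex strings standing in for real SHA-256 output.
fn signed_text(model_name: &str, sha256_request: &str, sha256_response: &str) -> String {
    // This exact string is what gets signed by both ECDSA and Ed25519.
    format!("{model_name}:{sha256_request}:{sha256_response}")
}

fn main() {
    let text = signed_text("my-model", "deadbeef", "cafef00d");
    assert_eq!(text, "my-model:deadbeef:cafef00d");
    println!("{text}");
}
```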
- `lib.rs` — `AppState` (shared state via `Arc`), `request_id_middleware`, public module exports
- `config.rs` — `Config::from_env()`, all env vars loaded at startup
- `signing.rs` — `EcdsaContext` (secp256k1 EIP-191), `Ed25519Context`, `SigningPair`
- `proxy.rs` — `proxy_json_request()`, `proxy_streaming_request()`, `proxy_multipart_request()`, `proxy_simple()` — the core proxy+sign+cache helpers that routes delegate to
- `cache.rs` — `ChatCache` wrapping moka with TTL, key format `{model}:chat:{id}`
- `attestation.rs` — `AttestationCache`, `generate_attestation()`, GPU evidence collection with retry/serialization, dstack TDX quotes
- `auth.rs` — `RequireAuth` axum extractor (validates Bearer token)
- `routes/` — thin handlers that parse requests and call the proxy helpers
- `dstack-sdk` is on crates.io now (was previously a git dependency). Path: `dstack_sdk::dstack_client::DstackClient`
- `generate_attestation()` takes an `AttestationParams` struct + optional `&AttestationCache`
- Attestation nonces are cryptographically bound to GPU evidence and TDX quotes — cannot cache across different nonces
- Nonce-less attestation requests can be cached (caller accepts whatever nonce we generate)
- GPU evidence collection spawns a Python subprocess calling NVIDIA's `cc_admin.collect_gpu_evidence_remote()` — serialized behind a semaphore to avoid NVML driver contention
- `reqwest::multipart::Part::mime_str()` consumes self — use `.expect()`, not `?`, in chains
- Streaming uses `tokio::spawn` + an `mpsc` channel: a background task hashes chunks and signs on stream completion
- `strip_empty_tool_calls` in `routes/chat.rs` is a vLLM bug workaround (still needed as of vLLM v0.15.1)
- Signed text format: `"{model_name}:{sha256_request}:{sha256_response}"`, signed by both algorithms
- `serde_json::to_string` matches Python's `json.dumps(separators=(",",":"))`
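The streaming pattern above can be sketched with std threads and channels standing in for `tokio::spawn` + `tokio::sync::mpsc`. The `hash_stream` helper is hypothetical, and the "digest" here is just a byte count; the real code feeds each chunk into SHA-256 and signs the final digest.

```rust
use std::sync::mpsc;
use std::thread;

// Simplified sketch of the streaming flow: a background task consumes body
// chunks as they arrive; when the sender side is dropped (stream completion),
// it finalizes. Byte count stands in for the real SHA-256 digest.
fn hash_stream(chunks: Vec<Vec<u8>>) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let hasher = thread::spawn(move || {
        let mut total = 0usize;
        for chunk in rx {
            // loops until all senders are dropped;
            // stand-in for Sha256::update(&chunk)
            total += chunk.len();
        }
        total // stand-in for finalize + sign
    });
    for chunk in chunks {
        tx.send(chunk).unwrap();
    }
    drop(tx); // closing the channel signals stream completion
    hasher.join().unwrap()
}

fn main() {
    let n = hash_stream(vec![b"data: {}\n\n".to_vec(), b"data: [DONE]\n\n".to_vec()]);
    assert_eq!(n, 24); // 10 + 14 bytes
}
```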
- `futures-util` — needed for `StreamExt` on reqwest byte streams
- `http-body-util` — needed for `BodyExt::frame()` / `BodyExt::collect()` on axum `Body`
- `k256` — secp256k1 ECDSA with the `PrehashSigner` trait from `signature::hazmat`
- `sha3` — Keccak256 for Ethereum address derivation and EIP-191
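For reference, a `Cargo.toml` fragment pulling in these four crates might look like the following. Versions and feature flags are illustrative, not copied from the actual manifest:

```toml
[dependencies]
futures-util = "0.3"                               # StreamExt for byte streams
http-body-util = "0.1"                             # BodyExt for axum bodies
k256 = { version = "0.13", features = ["ecdsa"] }  # secp256k1 ECDSA
sha3 = "0.10"                                      # Keccak256 for EIP-191
```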
- Run `cargo fmt` before committing
- Error types in `error.rs` produce OpenAI-compatible JSON: `{"error": {"message": ..., "type": ..., "param": null, "code": null}}`
- Internal errors hide details from clients (a generic "Internal server error" is returned)
- Named-route upstream errors are passed through verbatim (status code + body)
- Catch-all upstream errors are parsed and re-wrapped via `AppError::UpstreamParsed` (never leaks the raw body)
- `parse_upstream_error()` in `proxy.rs` handles both vLLM's flat (`{"message":"..."}`) and nested (`{"error":{"message":"..."}}`) formats
- All upstream errors are logged with `warn!`, including `error_message` and `error_type` (never body content)
- `request_id_middleware` wraps requests in a tracing span — all log lines automatically include `request_id`, `method`, and `path`
- Unit tests live in `#[cfg(test)] mod tests` within each source file
- Integration tests in `tests/integration.rs` use `tower::ServiceExt::oneshot` with wiremock
- When adding fields to `Config` or `AppState`, also update both `build_test_app_*` helpers in `tests/integration.rs`
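A concrete instance of the OpenAI-compatible error shape, with illustrative values:

```json
{
  "error": {
    "message": "Invalid request body",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}
```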
- Docker image: `nearaidev/vllm-proxy-rs` (published with digest-pinned refs in cvm-conf)
- Deployed via compose files in nearai/cvm-compose-files
- Each proxy instance needs: `MODEL_NAME`, `TOKEN`, `VLLM_BASE_URL`, `TLS_CERT_PATH`
- Optional: `CLOUD_API_URL` (enables usage reporting + `sk-` API key auth via cloud-api), `LOG_FORMAT=json` (structured JSON logs)
- `ATTESTATION_CACHE_TTL` (default 300s) — TTL for cached nonce-less attestation reports; a background refresh runs at half-TTL
- `CLOUD_API_URL` enables two features: (1) `sk-live-`/`sk-test-` API key validation via `POST /v1/check_api_key`, (2) fire-and-forget usage reporting via `POST /v1/usage`
- Usage reporting lives in `proxy.rs`: `spawn_usage_report()` sends the model name + token counts; failures are only logged as warnings
- `MODEL_NAME` must exactly match `model_name` in cloud-api's model table — cloud-api does NOT check model aliases, so a mismatch causes silent 404s on usage reporting