Skip to content

Latest commit

 

History

History
346 lines (248 loc) · 9.69 KB

File metadata and controls

346 lines (248 loc) · 9.69 KB

RuVector-Postgres v2.0.0 Security Audit Report

Date: 2025-12-26 Auditor: Claude Code Security Review Scope: /crates/ruvector-postgres/src/**/*.rs Branch: feat/ruvector-postgres-v2 Status: CRITICAL issues FIXED


Executive Summary

Severity Count Status
CRITICAL 3 FIXED
HIGH 2 ⚠️ Documented for future improvement
MEDIUM 3 ⚠️ Documented for future improvement
LOW 2 ✅ Acceptable
INFO 3 ✅ Acceptable patterns noted

Security Fixes Applied (2025-12-26)

  1. Created validation.rs module - Input validation for tenant IDs and identifiers
  2. Fixed SQL injection in isolation.rs - All SQL now uses quote_identifier() and parameterized queries
  3. Fixed SQL injection in operations.rs - AuditLogEntry now properly escapes all values
  4. Added ValidatedTenantId type - Type-safe tenant ID validation
  5. Query routing uses $1 placeholders - Parameterized queries prevent injection

CRITICAL Findings

CVE-PENDING-001: SQL Injection in Tenant Isolation Module ✅ FIXED

Location: src/tenancy/isolation.rs Lines: 233, 454, 461, 477, 491 Status:FIXED on 2025-12-26

Original Vulnerable Code:

// Line 233 - Direct table name interpolation
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", partition_name))

// Line 454 - Direct tenant_id interpolation
filter: format!("tenant_id = '{}'", tenant_id),

Applied Fix:

// Now uses validated identifiers with quote_identifier()
validate_identifier(partition_name)?;
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", quote_identifier(partition_name)))

// Now uses parameterized queries with $1 placeholder
filter: "tenant_id = $1".to_string(),
tenant_param: Some(tenant_id.to_string()),

Changes Made:

  • Added validate_tenant_id() calls before any SQL generation
  • All table/schema/partition names now use quote_identifier()
  • Query routing returns tenant_id = $1 placeholder instead of direct interpolation
  • Added tenant_param field to QueryRoute::SharedWithFilter for binding

CVE-PENDING-002: SQL Injection in Tenant Audit Logging ✅ FIXED

Location: src/tenancy/operations.rs Lines: 515-527 Status:FIXED on 2025-12-26

Original Vulnerable Code:

format!("'{}'", u)  // Direct user_id interpolation
format!("'{}'", ip)  // Direct IP interpolation

Applied Fix:

// New parameterized version
pub fn insert_sql_parameterized(&self) -> (String, Vec<Option<String>>) {
    let sql = "INSERT INTO ruvector.tenant_audit_log ... VALUES ($1, $2, $3, $4, $5, $6, $7)";
    // Params bound safely
}

// Legacy version now escapes properly
let escaped_user_id = escape_string_literal(u);
// IP validated: if validate_ip_address(ip) { Some(...) } else { None }

Changes Made:

  • Added insert_sql_parameterized() for new code (preferred)
  • Legacy insert_sql() now uses escape_string_literal() for all values
  • Added IP address validation - invalid IPs become NULL
  • Tenant ID validated before SQL generation

CVE-PENDING-003: SQL Injection via Drop Partition ✅ FIXED

Location: src/tenancy/isolation.rs:227-234 Status:FIXED on 2025-12-26

Original Vulnerable Code:

Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", partition_name))  // UNSAFE

Applied Fix:

// Validate inputs
validate_tenant_id(tenant_id)?;
validate_identifier(partition_name)?;

// Verify partition belongs to tenant (authorization check)
let partition_exists = self.partitions.get(tenant_id)
    .map(|p| p.iter().any(|p| p.partition_name == partition_name))
    .unwrap_or(false);
if !partition_exists {
    return Err(IsolationError::PartitionNotFound(partition_name.to_string()));
}

// Use quoted identifier
Ok(format!("DROP TABLE IF EXISTS {} CASCADE;", quote_identifier(partition_name)))

Changes Made:

  • Added input validation for both tenant_id and partition_name
  • Added authorization check - partition must belong to tenant
  • Used quote_identifier() for safe SQL generation

HIGH Findings

HIGH-001: Excessive Panic/Unwrap Usage

Location: Multiple files (63 files affected) Count: 462 occurrences of unwrap(), expect(), panic!

Description: Unhandled panics in PostgreSQL extensions can crash the database backend process.

Impact:

  • Denial of Service through crafted inputs
  • Database backend crashes
  • Service unavailability

Affected Patterns:

.unwrap()           // 280+ occurrences
.expect("...")      // 150+ occurrences
panic!("...")       // 32 occurrences

Remediation:

  1. Replace unwrap() with unwrap_or_default() or proper error handling
  2. Use pgrx::error!() for graceful PostgreSQL error reporting
  3. Implement Result<T, E> return types for public functions
  4. Add input validation before operations that can panic

HIGH-002: Unsafe Integer Casts

Location: Multiple files Count: 392 occurrences

Description: Unchecked integer casts between types (e.g., as usize, as i32, as u64) can cause overflow/underflow.

Affected Patterns:

value as usize   // Can panic on 32-bit systems
len as i32       // Can overflow for large vectors
index as u64     // Can truncate on edge cases

Remediation:

  1. Use TryFrom/try_into() with error handling
  2. Add bounds checking before casts
  3. Use saturating_cast or checked_cast patterns
  4. Validate dimension/size limits at API boundary

MEDIUM Findings

MEDIUM-001: Unsafe Pointer Operations in Index Storage

Location: src/index/ivfflat_storage.rs, src/index/hnsw_am.rs

Description: Index access methods use raw pointer operations for performance, which are inherently unsafe.

Affected Patterns:

  • std::ptr::read()
  • std::ptr::write()
  • std::slice::from_raw_parts()
  • std::slice::from_raw_parts_mut()

Mitigation Applied:

  • Operations are gated behind unsafe blocks
  • Required for pgrx PostgreSQL integration
  • No user-controlled data reaches pointers directly

Recommendation:

  1. Add bounds checking assertions before pointer access
  2. Document safety invariants for each unsafe block
  3. Consider #[deny(unsafe_op_in_unsafe_fn)] lint

MEDIUM-002: Unbounded Vector Allocations

Location: Multiple modules

Description: Some operations allocate vectors based on user-provided dimensions without upper limits.

Affected Areas:

  • Vec::with_capacity(dimension) in type constructors
  • .collect() on unbounded iterators
  • Graph traversal result sets

Remediation:

  1. Define MAX_VECTOR_DIMENSION constant (e.g., 16384)
  2. Validate dimensions at input boundaries
  3. Add configurable limits via GUC parameters

MEDIUM-003: Missing Rate Limiting on Tenant Operations

Location: src/tenancy/operations.rs

Description: Tenant creation and audit logging have no rate limiting, allowing potential abuse.

Remediation:

  1. Add configurable rate limits per tenant
  2. Implement quota checking before operations
  3. Add throttling for expensive operations

LOW Findings

LOW-001: Debug Output in Tests Only

Location: src/distance/simd.rs Count: 7 println! statements

Status: ACCEPTABLE - All debug output is in #[cfg(test)] modules only.


LOW-002: Error Messages May Reveal Internal Paths

Location: Various error handling code

Description: Some error messages include internal details that could aid attackers.

Example:

format!("Failed to spawn worker: {}", e)
format!("Failed to decode operation: {}", e)

Remediation:

  1. Use generic user-facing error messages
  2. Log detailed errors internally only
  3. Implement error code system for debugging

INFO - Acceptable Patterns

INFO-001: No Command Execution Found

No Command::new(), exec, or shell execution patterns found. ✅

INFO-002: No File System Operations

No std::fs, File::open, or path manipulation in production code. ✅

INFO-003: No Hardcoded Credentials

No passwords, API keys, or secrets in source code. ✅


Security Checklist Summary

Category Status Notes
SQL Injection ❌ FAIL 3 critical findings in tenancy module
Command Injection ✅ PASS No shell execution
Path Traversal ✅ PASS No file operations
Memory Safety ⚠️ WARN Acceptable unsafe for pgrx, but review recommended
Input Validation ⚠️ WARN Missing on tenant/partition names
DoS Prevention ⚠️ WARN Panic-prone code paths
Auth/AuthZ ✅ PASS No bypasses found
Crypto ✅ PASS No cryptographic code present
Information Disclosure ✅ PASS Debug output test-only

Remediation Priority

Immediate (Before Release)

  1. Fix SQL injection in tenancy module - Use parameterized queries
  2. Validate tenant_id format - Alphanumeric only, max length 64

Short Term (Next Sprint)

  1. Replace critical unwrap() calls with proper error handling
  2. Add dimension limits to vector operations
  3. Implement input validation helpers

Medium Term

  1. Add rate limiting to tenant operations
  2. Audit and document all unsafe blocks
  3. Convert integer casts to checked variants

Testing Recommendations

  1. Fuzz testing: Apply cargo-fuzz to SQL-generating functions
  2. Property testing: Test boundary conditions with proptest
  3. Integration tests: Add SQL injection test vectors
  4. Negative tests: Verify malformed inputs are rejected

Appendix: Files Reviewed

  • 80+ source files in /crates/ruvector-postgres/src/
  • 148 #[pg_extern] function definitions
  • Focus areas: tenancy, index, distance, types, graph

Report generated by Claude Code security analysis