Skip to content

Latest commit

 

History

History
583 lines (436 loc) · 12.1 KB

File metadata and controls

583 lines (436 loc) · 12.1 KB

Contributing to Ruvector

Thank you for your interest in contributing to Ruvector! This document provides guidelines and instructions for contributing.

Table of Contents

  1. Code of Conduct
  2. Getting Started
  3. Development Setup
  4. Code Style
  5. Testing
  6. Pull Request Process
  7. Commit Guidelines
  8. Documentation
  9. Performance
  10. Community

Code of Conduct

Our Pledge

We pledge to make participation in our project a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.

Our Standards

Positive behavior includes:

  • Using welcoming and inclusive language
  • Being respectful of differing viewpoints
  • Gracefully accepting constructive criticism
  • Focusing on what is best for the community
  • Showing empathy towards other community members

Unacceptable behavior includes:

  • Trolling, insulting/derogatory comments, and personal attacks
  • Public or private harassment
  • Publishing others' private information without permission
  • Other conduct which could reasonably be considered inappropriate

Getting Started

Prerequisites

  • Rust 1.77+: Install from rustup.rs
  • Node.js 16+: For Node.js bindings testing
  • Git: For version control
  • cargo-nextest (optional but recommended): cargo install cargo-nextest

Fork and Clone

  1. Fork the repository on GitHub
  2. Clone your fork:
    git clone https://github.com/YOUR_USERNAME/ruvector.git
    cd ruvector
  3. Add upstream remote:
    git remote add upstream https://github.com/ruvnet/ruvector.git

Development Setup

Build the Project

# Build all crates
cargo build

# Build with optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Build specific crate
cargo build -p ruvector-core

Run Tests

# Run all tests
cargo test

# Run tests with nextest (parallel, faster)
cargo nextest run

# Run specific test
cargo test test_hnsw_search

# Run with logging
RUST_LOG=debug cargo test

# Run benchmarks
cargo bench

Check Code

# Format code
cargo fmt

# Check formatting without changes
cargo fmt -- --check

# Run clippy lints
cargo clippy --all-targets --all-features -- -D warnings

# Check all crates
cargo check --all-features

Code Style

Rust Style Guide

We follow the Rust Style Guide with these additions:

Naming Conventions

// Structs: PascalCase
struct VectorDatabase { }

// Functions: snake_case
fn insert_vector() { }

// Constants: SCREAMING_SNAKE_CASE
const MAX_DIMENSIONS: usize = 65536;

// Type parameters: Single uppercase letter or PascalCase
fn generic<T>() { }
fn generic<TMetric: DistanceMetric>() { }

Documentation

All public items must have doc comments:

/// A high-performance vector database.
///
/// # Examples
///
/// ```
/// use ruvector_core::VectorDB;
///
/// let db = VectorDB::new(DbOptions::default())?;
/// ```
pub struct VectorDB { }

/// Insert a vector into the database.
///
/// # Arguments
///
/// * `entry` - The vector entry to insert
///
/// # Returns
///
/// The ID of the inserted vector
///
/// # Errors
///
/// Returns `RuvectorError` if insertion fails
pub fn insert(&self, entry: VectorEntry) -> Result<VectorId> {
    // ...
}

Error Handling

  • Use Result<T, RuvectorError> for fallible operations
  • Use thiserror for error types
  • Provide context with error messages
use thiserror::Error;

#[derive(Error, Debug)]
pub enum RuvectorError {
    #[error("Vector dimension mismatch: expected {expected}, got {got}")]
    DimensionMismatch { expected: usize, got: usize },

    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}

Performance

  • Use #[inline] for hot path functions
  • Profile before optimizing
  • Document performance characteristics
/// Distance calculation (hot path, inlined)
#[inline]
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    // SIMD-optimized implementation
}

TypeScript/JavaScript Style

For Node.js bindings:

// Use TypeScript for type safety
interface VectorEntry {
    id?: string;
    vector: Float32Array;
    metadata?: Record<string, any>;
}

// Async/await for async operations
async function search(query: Float32Array): Promise<SearchResult[]> {
    return await db.search({ vector: query, k: 10 });
}

// Use const/let, never var
const db = new VectorDB(options);
let results = await db.search(query);

Testing

Test Structure

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_insert() {
        // Arrange
        let db = VectorDB::new(DbOptions::default()).unwrap();
        let entry = VectorEntry {
            id: None,
            vector: vec![0.1; 128],
            metadata: None,
        };

        // Act
        let id = db.insert(entry).unwrap();

        // Assert
        assert!(!id.is_empty());
    }

    #[test]
    fn test_error_handling() {
        let db = VectorDB::new(DbOptions::default()).unwrap();
        let wrong_dims = vec![0.1; 64]; // Wrong dimensions

        let result = db.insert(VectorEntry {
            id: None,
            vector: wrong_dims,
            metadata: None,
        });

        assert!(result.is_err());
    }
}

Property-Based Testing

Use proptest for property-based tests:

use proptest::prelude::*;

proptest! {
    #[test]
    fn test_distance_symmetry(
        a in prop::collection::vec(any::<f32>(), 128),
        b in prop::collection::vec(any::<f32>(), 128)
    ) {
        let d1 = euclidean_distance(&a, &b);
        let d2 = euclidean_distance(&b, &a);
        assert!((d1 - d2).abs() < 1e-5);
    }
}

Benchmarking

Use criterion for benchmarks:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_search(c: &mut Criterion) {
    let db = setup_db();
    let query = vec![0.1; 128];

    c.bench_function("search 1M vectors", |b| {
        b.iter(|| {
            db.search(black_box(&SearchQuery {
                vector: query.clone(),
                k: 10,
                filter: None,
                include_vectors: false,
            }))
        })
    });
}

criterion_group!(benches, benchmark_search);
criterion_main!(benches);

Test Coverage

Aim for:

  • Unit tests: 80%+ coverage
  • Integration tests: All major features
  • Property tests: Core algorithms
  • Benchmarks: Performance-critical paths

Pull Request Process

Before Submitting

  1. Create an issue first for major changes
  2. Fork and branch: Create a feature branch
    git checkout -b feature/my-new-feature
  3. Write tests: Ensure new code has tests
  4. Run checks:
    cargo fmt
    cargo clippy --all-targets --all-features -- -D warnings
    cargo test
    cargo bench
  5. Update documentation: Update relevant docs
  6. Add changelog entry: Update CHANGELOG.md

PR Template

## Description

Brief description of changes

## Motivation

Why is this change needed?

## Changes

- Change 1
- Change 2

## Testing

How was this tested?

## Performance Impact

Any performance implications?

## Checklist

- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] Changelog updated
- [ ] Code formatted (`cargo fmt`)
- [ ] Lints passing (`cargo clippy`)
- [ ] All tests passing (`cargo test`)

Review Process

  1. Automated checks: CI must pass
  2. Code review: At least one maintainer approval
  3. Discussion: Address reviewer feedback
  4. Merge: Squash and merge or rebase

Commit Guidelines

Commit Message Format

<type>(<scope>): <subject>

<body>

<footer>

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Code style changes (formatting)
  • refactor: Code refactoring
  • perf: Performance improvements
  • test: Test additions/changes
  • chore: Build process or auxiliary tool changes

Examples:

feat(hnsw): add parallel index construction

Implement parallel HNSW construction using rayon for faster
index building on multi-core systems.

- Split graph construction across threads
- Use atomic operations for thread-safe updates
- Achieve 4x speedup on 8-core system

Closes #123
fix(quantization): correct product quantization distance calculation

The distance calculation was not using precomputed lookup tables,
causing incorrect results.

Fixes #456

Commit Hygiene

  • One logical change per commit
  • Write clear, descriptive messages
  • Reference issues/PRs when applicable
  • Keep commits focused and atomic

Documentation

Code Documentation

  • Public APIs: Comprehensive rustdoc comments
  • Examples: Include usage examples in doc comments
  • Safety: Document unsafe code thoroughly
  • Panics: Document panic conditions

User Documentation

Update relevant docs:

  • README.md: Overview and quick start
  • guides/: User guides and tutorials
  • api/: API reference documentation
  • CHANGELOG.md: User-facing changes

Documentation Style

/// A vector database with HNSW indexing.
///
/// `VectorDB` provides fast approximate nearest neighbor search using
/// Hierarchical Navigable Small World (HNSW) graphs. It supports:
///
/// - Sub-millisecond query latency
/// - 95%+ recall with proper tuning
/// - Memory-mapped storage for large datasets
/// - Multiple distance metrics (Euclidean, Cosine, etc.)
///
/// # Examples
///
/// ```
/// use ruvector_core::{VectorDB, VectorEntry, DbOptions};
///
/// let mut options = DbOptions::default();
/// options.dimensions = 128;
///
/// let db = VectorDB::new(options)?;
///
/// let entry = VectorEntry {
///     id: None,
///     vector: vec![0.1; 128],
///     metadata: None,
/// };
///
/// let id = db.insert(entry)?;
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
///
/// # Performance
///
/// - Search: O(log n) with HNSW
/// - Insert: O(log n) amortized
/// - Memory: ~640 bytes per vector (M=32)
pub struct VectorDB { }

Performance

Performance Guidelines

  1. Profile first: Use cargo flamegraph or perf
  2. Measure impact: Benchmark before/after
  3. Document trade-offs: Explain performance vs. other concerns
  4. Use SIMD: Leverage SIMD intrinsics for hot paths
  5. Avoid allocations: Reuse buffers in hot loops

Benchmarking Changes

# Benchmark baseline
git checkout main
cargo bench -- --save-baseline main

# Benchmark your changes
git checkout feature-branch
cargo bench -- --baseline main

Performance Checklist

  • Profiled hot paths
  • Benchmarked changes
  • No performance regressions
  • Documented performance characteristics
  • Considered memory usage

Community

Getting Help

  • GitHub Issues: Bug reports and feature requests
  • Discussions: Questions and general discussion
  • Pull Requests: Code contributions

Reporting Bugs

Use the bug report template:

**Describe the bug**
Clear description of the bug

**To Reproduce**
1. Step 1
2. Step 2
3. See error

**Expected behavior**
What you expected to happen

**Environment**
- OS: [e.g., Ubuntu 22.04]
- Rust version: [e.g., 1.77.0]
- Ruvector version: [e.g., 0.1.0]

**Additional context**
Any other relevant information

Feature Requests

Use the feature request template:

**Is your feature request related to a problem?**
Clear description of the problem

**Describe the solution you'd like**
What you want to happen

**Describe alternatives you've considered**
Other solutions you've thought about

**Additional context**
Any other relevant information

License

By contributing to Ruvector, you agree that your contributions will be licensed under the MIT License.

Questions?

Feel free to open an issue or discussion if you have questions about contributing!


Thank you for contributing to Ruvector! 🚀