Skip to content

Conversation

@cpegeric
Copy link
Contributor

@cpegeric cpegeric commented Dec 10, 2025

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #23254

What this PR does / why we need it:

improve IVFFLAT assign center in ProductL2 by usearch exact search and GPU

Current status:

  1. Need to wait for usearch to merge the bug fix with their exact search golang api
  1. Need to wait for cuvs bug fix and merge fix with their golang API

@cpegeric cpegeric requested a review from fengttt as a code owner December 10, 2025 12:42
@cpegeric cpegeric changed the title FEATURE: improve IVFFLAT assign center in ProductL2 by usearch exact search and GPU IGNOREME FEATURE: improve IVFFLAT assign center in ProductL2 by usearch exact search and GPU Dec 10, 2025
@cpegeric cpegeric marked this pull request as draft December 10, 2025 12:42
@qodo-code-review qodo-code-review bot changed the title IGNOREME FEATURE: improve IVFFLAT assign center in ProductL2 by usearch exact search and GPU FEATURE: improve IVFFLAT assign center in ProductL2 by usearch exact search and GPU Dec 10, 2025
@matrix-meow matrix-meow added the size/XL Denotes a PR that changes [1000, 1999] lines label Dec 10, 2025
@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Nil checks missing: Code assumes ctr.brute_force and vector dimensions are valid without explicit nil/empty
handling in probe/build paths, risking runtime panics when inputs are null or index
creation fails.

Referred Code
	switch ctr.inBat.Vecs[tblColPos].GetType().Oid {
	case types.T_array_float32:
		return probeRun[float32](ctr, ap, proc, result)
	case types.T_array_float64:
		return probeRun[float64](ctr, ap, proc, result)
	}
	return nil
}

func (ctr *container) release() {
	if ctr.brute_force != nil {
		ctr.brute_force.Destroy()
		ctr.brute_force = nil
	}
}

func probeRun[T types.RealNumbers](ctr *container, ap *Productl2, proc *process.Process, result *vm.CallResult) error {
	probeCount := ctr.inBat.RowCount()
	tblColPos := ap.OnExpr.GetF().GetArgs()[1].GetCol().GetColPos()

	ncpu := runtime.NumCPU()


 ... (clipped 47 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No auditing: New feature paths (building/searching brute force index and GPU flows) add critical
operations without any audit logging of actions, actors, or outcomes.

Referred Code
centroidColPos := productl2.OnExpr.GetF().GetArgs()[0].GetCol().GetColPos()
switch ctr.bat.Vecs[centroidColPos].GetType().Oid {
case types.T_array_float32:
	ctr.brute_force, err = getIndex[float32](productl2, proc, analyzer)
	if err != nil {
		return err
	}
case types.T_array_float64:
	ctr.brute_force, err = getIndex[float64](productl2, proc, analyzer)
	if err != nil {
		return err
	}
}

return nil

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Unsafe pointer: The use of unsafe pointers with external library calls lacks explicit input validation and
bounds checks on flattened datasets/queries, which may pose memory safety risks if
upstream inputs are malformed.

Referred Code
keys_ui64, distances_f32, err := usearch.ExactSearchUnsafe(
	util.UnsafePointer(&(idx.Dataset[0])),
	util.UnsafePointer(&(flatten[0])),
	uint(idx.Count),
	uint(len(queries)),
	idx.Dimension*idx.ElementSize,
	idx.Dimension*idx.ElementSize,
	idx.Dimension,
	idx.Metric,
	idx.Quantization,
	limit,
	rt.NThreads)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@mergify mergify bot added the kind/feature label Dec 10, 2025
@cpegeric cpegeric changed the title FEATURE: improve IVFFLAT assign center in ProductL2 by usearch exact search and GPU IGNOREME FEATURE: improve IVFFLAT assign center in ProductL2 by usearch exact search and GPU Dec 10, 2025
@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
GPU implementation is incomplete and disabled

The GPU implementation is disabled via commented-out code and a CPU fallback,
despite being a key feature of the PR. The suggestion is to either complete the
GPU implementation or split the PR to merge the CPU improvements separately.

Examples:

pkg/vectorindex/brute_force/gpu.go [44-84]
func NewBruteForceIndex[T types.RealNumbers](dataset [][]T,
	dimension uint,
	m metric.MetricType,
	elemsz uint) (cache.VectorIndexSearchIf, error) {

	return NewCpuBruteForceIndex[T](dataset, dimension, m, elemsz)
}

// cuvs library has bug.  comment out the GPU version until cuvs fix the bug
/*

 ... (clipped 31 lines)

Solution Walkthrough:

Before:

// File: pkg/vectorindex/brute_force/gpu.go

// This function is active but falls back to CPU
func NewBruteForceIndex[T types.RealNumbers](...) (cache.VectorIndexSearchIf, error) {
	return NewCpuBruteForceIndex[T](...)
}

// The actual GPU implementation is commented out
/*
func NewBruteForceIndex[T types.RealNumbers](...) (cache.VectorIndexSearchIf, error) {
	switch dset := any(dataset).(type) {
	case [][]float32:
		idx := &GpuBruteForceIndex[float32]{}
		// ... GPU index setup ...
		return idx, nil
	default:
		// fallback for other types
		return NewCpuBruteForceIndex[T](...)
	}
}
*/

After:

// File: pkg/vectorindex/brute_force/gpu.go

// The GPU implementation should be enabled and the fallback removed or made conditional.
func NewBruteForceIndex[T types.RealNumbers](...) (cache.VectorIndexSearchIf, error) {
	// Check if GPU resources are available, if not, fallback to CPU.
	// ...

	switch dset := any(dataset).(type) {
	case [][]float32:
		idx := &GpuBruteForceIndex[float32]{}
		// ... GPU index setup ...
		return idx, nil
	default:
		// For types not supported by GPU, fallback to CPU
		return NewCpuBruteForceIndex[T](...)
	}
}
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies that the advertised GPU support is incomplete and disabled, with the implementation in pkg/vectorindex/brute_force/gpu.go falling back to CPU, which contradicts one of the PR's main goals.

High
Possible issue
Handle empty build-side input gracefully

In the build function, add a check for an empty build-side batch (ctr.bat) and
transition to the End state to prevent a potential panic during the probe phase.

pkg/sql/colexec/productl2/product_l2.go [170-206]

 func (productl2 *Productl2) build(proc *process.Process, analyzer process.Analyzer) error {
 	ctr := &productl2.ctr
 	start := time.Now()
 	defer analyzer.WaitStop(start)
 	mp, err := message.ReceiveJoinMap(productl2.JoinMapTag, false, 0, proc.GetMessageBoard(), proc.Ctx)
 	if err != nil {
 		return err
 	}
 	if mp == nil {
 		return nil
 	}
 	batches := mp.GetBatches()
 	//maybe optimize this in the future
 	for i := range batches {
 		ctr.bat, err = ctr.bat.AppendWithCopy(proc.Ctx, proc.Mp(), batches[i])
 		if err != nil {
 			return err
 		}
 	}
 	mp.Free()
 
+	if ctr.bat.RowCount() == 0 {
+		ctr.state = End
+		return nil
+	}
+
 	centroidColPos := productl2.OnExpr.GetF().GetArgs()[0].GetCol().GetColPos()
 	switch ctr.bat.Vecs[centroidColPos].GetType().Oid {
 	case types.T_array_float32:
 		ctr.brute_force, err = getIndex[float32](productl2, proc, analyzer)
 		if err != nil {
 			return err
 		}
 	case types.T_array_float64:
 		ctr.brute_force, err = getIndex[float64](productl2, proc, analyzer)
 		if err != nil {
 			return err
 		}
 	}
 
 	return nil
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a potential panic when the build-side batch is empty and there are rows on the probe side. The proposed fix of checking the row count and transitioning to the End state is an effective way to prevent this crash.

Medium
Correctly handle NULL probe vectors

In probeRun, modify the logic to skip NULL probe vectors instead of joining them
with a default centroid, ensuring correct SQL join behavior.

pkg/sql/colexec/productl2/product_l2.go [297-313]

 	for j := 0; j < probeCount; j++ {
 
 		if ctr.inBat.Vecs[tblColPos].IsNull(uint64(j)) {
-			leastClusterIndex[j] = 0
+			continue
 		}
 		for k, rp := range ap.Result {
 			if rp.Rel == 0 {
 				if err := ctr.rbat.Vecs[k].UnionOne(ctr.inBat.Vecs[rp.Pos], int64(j), proc.Mp()); err != nil {
 					return err
 				}
 			} else {
 				if err := ctr.rbat.Vecs[k].UnionOne(ctr.bat.Vecs[rp.Pos], int64(leastClusterIndex[j]), proc.Mp()); err != nil {
 					return err
 				}
 			}
 		}
 	}
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly points out that the current handling of NULL probe vectors is incorrect, as it forces a join with the centroid at index 0. The proposed change to skip NULL vectors aligns with standard SQL join semantics and fixes this correctness bug.

Medium
Prevent panic from nil centroids

In LoadIndex, handle NULL centroids by replacing them with zero-filled vectors
before creating the brute_force index to prevent a potential panic.

pkg/vectorindex/ivfflat/search.go [91-99]

 	if ncenters == 0 {
 		return nil
 	}
 
 	if uint(ncenters) != idxcfg.Ivfflat.Lists {
 		return moerr.NewInternalErrorNoCtx("number of centroids in db != Nlist")
 	}
 
+	for i := range centroids {
+		if centroids[i] == nil {
+			centroids[i] = make([]T, idxcfg.Ivfflat.Dimensions)
+		}
+	}
+
 	bfidx, err := brute_force.NewBruteForceIndex[T](centroids, idxcfg.Ivfflat.Dimensions, metric.MetricType(idxcfg.Ivfflat.Metric), uint(elemsz))
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies a potential panic when loading centroids if some are NULL in the database, as this would result in passing nil slices to the brute-force index constructor. The proposed fix of replacing nil centroids with zero vectors is a robust solution to prevent this crash.

Medium
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/feature Review effort 3/5 size/XL Denotes a PR that changes [1000, 1999] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants