[PROPOSAL] LLM judgment generation sends vector embedding fields to LLM, wasting tokens and bandwidth #403

@martin-gaievski

Description

When the LLM Judgment API (PUT /judgments with type: LLM_JUDGMENT) generates relevance ratings, it retrieves documents via search queries and sends the document _source content to the LLM for evaluation. When contextFields is not specified by the user, the entire _source is sent — including vector embedding fields (e.g., knn_vector, dense_vector).

Vector embedding fields are arrays of hundreds or thousands of floating-point numbers (commonly 768 or 1536 dimensions) that represent semantic meaning in a format only useful for vector similarity computation. These fields:

  • Add no value for LLM relevance judgment — the LLM cannot interpret raw embedding vectors
  • Consume significant tokens — a single 768-dim vector serializes to ~3,000+ tokens
  • Waste network bandwidth — especially impactful at scale
  • May cause token limit errors — pushing documents over the configured tokenLimit

Impact at scale

For a typical hybrid search setup (text + neural) with 768-dimension embeddings:

| Scenario | Without vectors | With vectors | Waste |
|---|---|---|---|
| Per document payload | ~500 bytes | ~6,500+ bytes | +1,200% |
| 1 query × 10 docs | ~5 KB | ~65 KB | +60 KB |
| 1,000 queries × 100 docs | ~50 MB | ~650 MB | ~600 MB |
| LLM tokens per query (100 docs) | ~12,500 | ~312,500 | ~300,000 wasted |
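The per-document figures above can be sanity-checked with a back-of-envelope sketch (the class and method names here are illustrative, not part of the plugin): serialize 768 typical 4-decimal floats the way the full `_source` carries them, as a JSON array, and measure the string length.

```java
import java.util.Locale;

// Back-of-envelope estimate: bytes occupied by a 768-dimension embedding
// once serialized as a JSON array of floats inside _source.
public class VectorPayloadEstimate {

    // Build a JSON-array string of `dims` typical 4-decimal floats and return its length.
    static int estimatedBytes(int dims) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < dims; i++) {
            if (i > 0) json.append(',');
            json.append(String.format(Locale.ROOT, "%.4f", Math.sin(i) * 0.1));
        }
        return json.append(']').length();
    }

    public static void main(String[] args) {
        System.out.println("768-dim vector ~" + estimatedBytes(768) + " bytes as JSON");
    }
}
```

Each value serializes to 6–7 characters plus a comma, so 768 dimensions land in the ~6 KB range, consistent with the ~6,500+ bytes in the table.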

With expandCoverage=true (which ~doubles document count per query), the waste is amplified 2×.

Root cause

In LlmJudgmentsProcessor.getContextSource():

private String getContextSource(SearchHit hit, List<String> contextFields) {
    Map<String, Object> sourceAsMap = hit.getSourceAsMap();
    if (contextFields != null && !contextFields.isEmpty()) {
        // SAFE: only the specified fields are included
        Map<String, Object> filteredSource = new HashMap<>();
        for (String field : contextFields) {
            if (sourceAsMap.containsKey(field)) {
                filteredSource.put(field, sourceAsMap.get(field));
            }
        }
        return OBJECT_MAPPER.writeValueAsString(filteredSource);
    }
    // PROBLEM: returns the ENTIRE _source, including vector fields
    return hit.getSourceAsString();
}

When contextFields is not provided (which is common — it's an optional parameter), the full _source is serialized and sent to the LLM. For indices with neural search embeddings, this includes large float arrays like:

{
  "title": "Wireless Headphones",
  "title_embedding": [0.0234, -0.1567, 0.0891, ... /* 768 floats */],
  "description": "High quality noise cancelling..."
}

Existing mitigation

Users can work around this by specifying contextFields in their judgment request:

{
  "type": "LLM_JUDGMENT",
  "contextFields": ["title", "description", "category"],
  ...
}

However, this workaround requires users to know about the issue in advance and to manually enumerate every relevant field while leaving out the vector fields.

Proposed solution

Auto-exclude vector-like fields when contextFields is not specified.

In the getContextSource() fallback path, detect and skip fields whose values are large numeric arrays (heuristic: List<Number> with length > 32):

// When no contextFields specified, auto-exclude embedding/vector fields
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
Map<String, Object> filteredSource = new HashMap<>();
for (Map.Entry<String, Object> entry : sourceAsMap.entrySet()) {
    Object value = entry.getValue();
    if (isLikelyVectorField(value)) {
        continue; // Skip embedding fields
    }
    filteredSource.put(entry.getKey(), value);
}
return OBJECT_MAPPER.writeValueAsString(filteredSource);

Where isLikelyVectorField checks:

private boolean isLikelyVectorField(Object value) {
    if (value instanceof List) {
        List<?> list = (List<?>) value;
        // size() > 32 already implies the list is non-empty,
        // so no separate isEmpty() check is needed
        return list.size() > 32 && list.get(0) instanceof Number;
    }
    return false;
}
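Putting the heuristic and the fallback filtering together, a standalone sketch of the proposed behavior could look like the following (class name, helper names, and sample documents are illustrative; only the heuristic mirrors the proposal):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Standalone sketch of the proposed fallback filtering for LLM judgment context.
public class VectorFieldFilter {

    // Heuristic from the proposal: a list of more than 32 numbers is
    // assumed to be an embedding vector.
    static boolean isLikelyVectorField(Object value) {
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            return list.size() > 32 && list.get(0) instanceof Number;
        }
        return false;
    }

    // Fallback path: copy every _source field except likely vectors.
    static Map<String, Object> filterSource(Map<String, Object> sourceAsMap) {
        Map<String, Object> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : sourceAsMap.entrySet()) {
            if (!isLikelyVectorField(entry.getValue())) {
                filtered.put(entry.getKey(), entry.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        List<Double> embedding = new ArrayList<>();
        for (int i = 0; i < 768; i++) embedding.add(i * 0.001);

        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "Wireless Headphones");
        source.put("title_embedding", embedding);
        source.put("description", "High quality noise cancelling...");

        // title_embedding is dropped; the text fields survive
        System.out.println(filterSource(source).keySet());
    }
}
```

Note that a short list of numbers (e.g. a list of prices or ratings) stays below the 32-element threshold and is kept, which is the point of the size cutoff.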

Additional improvements:

  • Log a warning when large array fields are detected and auto-excluded, for user visibility
  • Consider querying the index mapping to identify knn_vector typed fields for a more precise exclusion

Alternative approaches

  1. _source excludes in search request — modify search request builder to add _source: { excludes: ["*_embedding", "*_vector"] }. Relies on field naming conventions.

  2. Query index mapping — before searching, query the index mapping to discover all knn_vector/dense_vector typed fields and exclude them from _source. Most precise but adds an extra API call per index.

  3. Introduce excludeFields parameter — add a new optional parameter to the judgment API that lets users specify fields to exclude. Complementary to contextFields.
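For alternative 2, the mapping-based detection could be sketched as follows, assuming the JSON body of `GET <index>/_mapping` has already been deserialized into nested `Map`s (the class name, traversal, and sample mapping are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: walk a parsed index mapping and collect the paths of all
// knn_vector / dense_vector typed fields, suitable for _source excludes.
public class VectorFieldMappingScan {

    @SuppressWarnings("unchecked")
    static void collectVectorFields(Map<String, Object> properties, String prefix, List<String> out) {
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            Map<String, Object> fieldDef = (Map<String, Object>) entry.getValue();
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object type = fieldDef.get("type");
            if ("knn_vector".equals(type) || "dense_vector".equals(type)) {
                out.add(path);
            } else if (fieldDef.get("properties") instanceof Map) {
                // recurse into object fields
                collectVectorFields((Map<String, Object>) fieldDef.get("properties"), path, out);
            }
        }
    }

    public static void main(String[] args) {
        // Minimal mapping fragment: a text field, a top-level knn_vector,
        // and a knn_vector nested inside an object field.
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("title", Map.of("type", "text"));
        props.put("title_embedding", Map.of("type", "knn_vector", "dimension", 768));
        props.put("nested", Map.of("properties",
                Map.of("inner_vec", Map.of("type", "knn_vector", "dimension", 768))));

        List<String> vectorFields = new ArrayList<>();
        collectVectorFields(props, "", vectorFields);
        System.out.println(vectorFields);
    }
}
```

The collected paths would then feed the search request's `_source.excludes`, trading one extra mapping call per index for exact (rather than heuristic) exclusion.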

Metadata

Labels: enhancement (New feature or request)
Status: 🆕 New