KNN filters on metadata of sibling nested documents #128803

@tihom88

Description

Currently, KNN filters have a very limited scope. #106994 and #113949 enhance KNN filters.

We have another use case for KNN filters. Each document contains two sibling nested fields, "metadata" and "vectors", and we want to prefilter documents based on the entries in "metadata" before running the KNN query on "vectors".
Here is an example:

MAPPING:

# Create index with the mapping
PUT /my-test-index
{
  "mappings": {
    "properties": {
      "metadata": {
        "type": "nested",
        "properties": {
          "key": { "type": "keyword" },
          "value": { "type": "keyword" }
        }
      },
      "vectors": {
        "type": "nested",
        "properties": {
          "id": { "type": "keyword" },
          "vector": { 
            "type": "dense_vector", 
            "dims": 3,
            "index": true,
            "similarity": "cosine"
          }
        }
      },
      "title": { "type": "text" },
      "description": { "type": "text" }
    }
  }
}

# Example documents

# Document 1
POST /my-test-index/_doc/1
{
  "title": "Machine Learning Basics",
  "description": "Introduction to machine learning concepts",
  "metadata": [
    { "key": "category", "value": "education" },
    { "key": "level", "value": "beginner" }
  ],
  "vectors": [
    { "id": "concept_vec", "vector": [0.5, 0.2, 0.9] }
  ]
}

# Document 2
POST /my-test-index/_doc/2
{
  "title": "Advanced Neural Networks",
  "description": "Deep dive into neural network architectures",
  "metadata": [
    { "key": "category", "value": "education" },
    { "key": "level", "value": "advanced" }
  ],
  "vectors": [
    { "id": "concept_vec", "vector": [0.8, 0.1, 0.6] }
  ]
}

# Document 3
POST /my-test-index/_doc/3
{
  "title": "Data Visualization Techniques",
  "description": "Methods for effective data visualization",
  "metadata": [
    { "key": "category", "value": "visualization" },
    { "key": "level", "value": "intermediate" }
  ],
  "vectors": [
    { "id": "concept_vec", "vector": [0.3, 0.7, 0.2] }
  ]
}

# Document 4
POST /my-test-index/_doc/4
{
  "title": "Natural Language Processing",
  "description": "Text analysis and language models",
  "metadata": [
    { "key": "category", "value": "nlp" },
    { "key": "level", "value": "intermediate" }
  ],
  "vectors": [
    { "id": "concept_vec", "vector": [0.9, 0.4, 0.3] }
  ]
}

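We want to filter the KNN results on "vectors.vector" using conditions on the sibling nested "metadata" documents, for example, restricting the search to documents that have a "metadata" entry with key "category" and value "education".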
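To make the expected semantics concrete, here is a small brute-force Python model of the behaviour requested here. It does not call Elasticsearch; it only ranks the four example documents above. Assumptions: the query vector [1.0, 0.0, 0.5] and k = 2 are made up for illustration, a parent document is scored by its best-matching nested vector, and ranking uses raw cosine similarity rather than Elasticsearch's own score scaling. The helper names (`cosine`, `metadata_matches`, `doc_score`, `knn`) are hypothetical.

```python
import math

# The four example documents from this issue, reduced to the nested fields that matter.
DOCS = {
    "1": {"metadata": [{"key": "category", "value": "education"}, {"key": "level", "value": "beginner"}],
          "vectors": [{"id": "concept_vec", "vector": [0.5, 0.2, 0.9]}]},
    "2": {"metadata": [{"key": "category", "value": "education"}, {"key": "level", "value": "advanced"}],
          "vectors": [{"id": "concept_vec", "vector": [0.8, 0.1, 0.6]}]},
    "3": {"metadata": [{"key": "category", "value": "visualization"}, {"key": "level", "value": "intermediate"}],
          "vectors": [{"id": "concept_vec", "vector": [0.3, 0.7, 0.2]}]},
    "4": {"metadata": [{"key": "category", "value": "nlp"}, {"key": "level", "value": "intermediate"}],
          "vectors": [{"id": "concept_vec", "vector": [0.9, 0.4, 0.3]}]},
}


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def metadata_matches(doc, key, value):
    """True if a single nested metadata entry has both this key and this value."""
    return any(m["key"] == key and m["value"] == value for m in doc["metadata"])


def doc_score(doc, query_vector):
    """Assumption: a parent document is scored by its best-matching nested vector."""
    return max(cosine(v["vector"], query_vector) for v in doc["vectors"])


def knn(docs, query_vector, k, key=None, value=None, prefilter=True):
    """Return the ids of the top-k parent documents.

    prefilter=True applies the sibling-metadata filter before ranking (the requested behaviour);
    prefilter=False ranks first and filters the top-k afterwards (post-filtering).
    """
    candidates = list(docs.items())
    if key is not None and prefilter:
        candidates = [(i, d) for i, d in candidates if metadata_matches(d, key, value)]
    ranked = sorted(candidates, key=lambda item: doc_score(item[1], query_vector), reverse=True)
    top = [doc_id for doc_id, _ in ranked[:k]]
    if key is not None and not prefilter:
        top = [i for i in top if metadata_matches(docs[i], key, value)]
    return top


QUERY = [1.0, 0.0, 0.5]  # hypothetical query vector
print(knn(DOCS, QUERY, k=2, key="category", value="education", prefilter=True))   # ['2', '1']
print(knn(DOCS, QUERY, k=2, key="category", value="education", prefilter=False))  # ['2']
```

With k = 2, prefiltering on category = education returns documents 2 and 1, while filtering after the KNN search returns only document 2: document 4 (category "nlp") takes the second slot before the filter is applied and is then discarded. This is why the filter on the sibling "metadata" documents needs to run before the KNN query rather than after it.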