[DOC] Add LAST_TOKEN pooling mode to text embedding model documentation #12076

@aneesh-db

What do you want to do?

  • Request a change to existing documentation
  • Add new documentation
  • Report a technical problem with the documentation
  • Other

Tell us about your request.

The pooling_mode parameter in the Register Model API documentation (_ml-commons-plugin/api/model-apis/register-model.md) currently lists supported values as mean, mean_sqrt_len, max, weightedmean, and cls.

A new lasttoken pooling mode is being added in ml-commons (opensearch-project/ml-commons#4711) to support decoder-only text embedding models (e.g., Qwen3-Embedding, GPT-style models) where the final non-padding token captures cumulative context through causal attention.
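To illustrate the behavior described above, here is a minimal pure-Python sketch of last-token pooling for a single sequence. It is not the ml-commons implementation; the function name `last_token_pool` and the sample data are hypothetical, and it assumes an attention mask in which 1 marks a real token and 0 marks padding.

```python
from typing import List, Sequence


def last_token_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Return the embedding of the last non-padding token in one sequence.

    Hypothetical helper for illustration only. attention_mask holds 1 for
    real tokens and 0 for padding, so the scan below handles both left- and
    right-padded inputs.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("token_embeddings and attention_mask must have equal length")
    # Scan from the end for the last position the mask marks as a real token.
    for index in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[index] == 1:
            return list(token_embeddings[index])
    raise ValueError("attention_mask contains no non-padding tokens")


# Right-padded sequence: three real tokens followed by two padding tokens.
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0], [0.0, 0.0]]
mask = [1, 1, 1, 0, 0]
pooled = last_token_pool(embeddings, mask)
print(pooled)
```

In a decoder-only model with causal attention, each token attends only to the tokens before it, which is why the last real token is the one position that has seen the whole input.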

The documentation should be updated to:

  1. Add lasttoken to the list of supported pooling_mode values in the model_config object table
  2. Describe that lasttoken uses the embedding of the last non-padding token and is useful for decoder-only models, where the final token captures cumulative context
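As a rough sketch of what the updated documentation could show, the snippet below builds a model_config fragment that selects the new mode. The helper `build_model_config` is hypothetical, and the `model_type`, `embedding_dimension`, and `framework_type` values are placeholders rather than settings confirmed for any particular model; the field names should be checked against the Register Model API documentation.

```python
import json

# Pooling modes listed in this issue: the five existing values plus the new one.
SUPPORTED_POOLING_MODES = (
    "mean",
    "mean_sqrt_len",
    "max",
    "weightedmean",
    "cls",
    "lasttoken",
)


def build_model_config(pooling_mode: str) -> dict:
    """Build an illustrative model_config fragment (hypothetical helper)."""
    if pooling_mode not in SUPPORTED_POOLING_MODES:
        raise ValueError(f"unsupported pooling_mode: {pooling_mode!r}")
    return {
        # Placeholder values, assumed for illustration only.
        "model_type": "qwen3",
        "embedding_dimension": 1024,
        "framework_type": "huggingface_transformers",
        "pooling_mode": pooling_mode,
    }


config = build_model_config("lasttoken")
print(json.dumps({"model_config": config}, indent=2))
```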

Version: 3.4

What other resources are available?

Labels

Backlog - DEV (Developer assigned to issue is responsible for creating PR.)
