[DOC] Add LAST_TOKEN pooling mode to text embedding model documentation

**What do you want to do?**

- [x] Request a change to existing documentation
- [ ] Add new documentation
- [ ] Report a technical problem with the documentation
- [ ] Other 

**Tell us about your request.**

The `pooling_mode` parameter in the Register Model API documentation (`_ml-commons-plugin/api/model-apis/register-model.md`) currently lists supported values as `mean`, `mean_sqrt_len`, `max`, `weightedmean`, and `cls`.

A new `lasttoken` pooling mode is being added in ml-commons ([opensearch-project/ml-commons#4711](https://github.com/opensearch-project/ml-commons/pull/4711)) to support decoder-only text embedding models (e.g., Qwen3-Embedding, GPT-style models) where the final non-padding token captures cumulative context through causal attention.

The documentation should be updated to:
1. Add `lasttoken` to the list of supported `pooling_mode` values in the `model_config` object table.
2. Describe `lasttoken` as using the embedding of the last non-padding token. This mode is useful for decoder-only models, in which the final token captures the cumulative context of the sequence through causal attention.
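To make the requested description concrete, here is a minimal sketch of last-token pooling. It assumes that the model returns per-token hidden states and an attention mask (1 = real token, 0 = padding) as nested Python lists. The function name `last_token_pool` is hypothetical and is not taken from the ml-commons PR. The sketch searches backward for the last position whose mask value is 1, so it works for both right-padded and left-padded batches.

```python
from typing import List, Sequence


def last_token_pool(
    hidden_states: Sequence[Sequence[Sequence[float]]],
    attention_mask: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Return the hidden state of the last non-padding token for each sequence.

    hidden_states:  [batch][seq_len][hidden_dim] per-token embeddings
    attention_mask: [batch][seq_len], 1 = real token, 0 = padding
    (Hypothetical helper for illustration, not the ml-commons implementation.)
    """
    pooled: List[List[float]] = []
    for states, mask in zip(hidden_states, attention_mask):
        if len(states) != len(mask):
            raise ValueError("hidden_states and attention_mask lengths differ")
        # Scan from the end so that both right- and left-padded inputs work.
        last_idx = next(
            (i for i in range(len(mask) - 1, -1, -1) if mask[i] == 1), None
        )
        if last_idx is None:
            raise ValueError("sequence contains no non-padding tokens")
        pooled.append(list(states[last_idx]))
    return pooled


hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0]],  # right-padded, 3 real tokens
    [[0.0, 0.0], [0.7, 0.8], [0.9, 1.0], [1.1, 1.2]],  # left-padded, 3 real tokens
]
mask = [[1, 1, 1, 0], [0, 1, 1, 1]]
print(last_token_pool(hidden, mask))  # [[0.5, 0.6], [1.1, 1.2]]
```

Mean or CLS pooling would mix in or favor early positions. By contrast, this pooling mode picks the only position that, under causal attention, has attended to every real token in the sequence.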

**Version:** 3.4

**What other resources are available?**
- ml-commons issue: https://github.com/opensearch-project/ml-commons/issues/4709
- ml-commons PR: https://github.com/opensearch-project/ml-commons/pull/4711

[DOC] Add LAST_TOKEN pooling mode to text embedding model documentation #12076

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[DOC] Add LAST_TOKEN pooling mode to text embedding model documentation #12076

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions