Motivation
The `/v1/embeddings` endpoint is a standard OpenAI API supported by vLLM, SGLang, and TGI. Many downstream tools (LangChain, LlamaIndex, RAG pipelines) depend on it to generate text embeddings.

Currently lmdeploy's `/v1/embeddings` is a stub that returns `Unsupported by turbomind`. The infrastructure to pass `last_hidden_state` through the pipeline already exists at the high level (`Response`, `EngineOutput`, and `GenOut` all have the field), but the PyTorch engine's internal pipeline never populates it.
Related resources

- `EmbeddingsRequest`/`EmbeddingsResponse` protocol classes
- `output_last_hidden_state` in `GenerationConfig`
- TurboMind C++ engine support for `output_last_hidden_state`
- `/v1/encode` (tokenization)
- `/pooling` (pooling API)

Additional context
I have a working implementation on branch feat/embeddings-endpoint that:
- Replaces the stub with a real endpoint that calls the engine with `max_new_tokens=0` + `output_last_hidden_state='all'`, then mean-pools the hidden states
- Threads `last_hidden_states` through the PyTorch engine pipeline (`BatchedOutputs` → `InferOutput` → `EngineOutput`), since previously only TurboMind supported hidden state extraction
- Supports both `float` and `base64` encoding formats per the OpenAI spec
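For reference, the mean-pooling step can be sketched as follows; this is an illustrative standalone snippet (not code from the branch), assuming the engine hands back the per-token hidden states as a `(seq_len, hidden_size)` float array:

```python
import numpy as np

def mean_pool(last_hidden_state: np.ndarray) -> np.ndarray:
    """Collapse per-token hidden states (seq_len, hidden_size)
    into a single embedding by averaging over the sequence axis."""
    return last_hidden_state.mean(axis=0)

# Hypothetical hidden states for a 4-token prompt with hidden size 8.
hidden = np.arange(32, dtype=np.float32).reshape(4, 8)
embedding = mean_pool(hidden)
assert embedding.shape == (8,)  # one vector per prompt
```

A real implementation would also need to mask out padding tokens before averaging when requests are batched.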
Changes: ~160 lines across 9 files (mostly plumbing existing types).
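On the encoding formats: the OpenAI spec returns the embedding either as a plain list of floats or, for `encoding_format="base64"`, as a base64 string over the little-endian float32 bytes of the vector. A minimal sketch of that choice (function name is illustrative, not from the branch):

```python
import base64
import numpy as np

def encode_embedding(vec, encoding_format: str):
    """Serialize an embedding per the OpenAI embeddings spec:
    a float list, or base64 over little-endian float32 bytes."""
    arr = np.asarray(vec, dtype="<f4")  # force little-endian float32
    if encoding_format == "base64":
        return base64.b64encode(arr.tobytes()).decode("ascii")
    return arr.tolist()

# Round-trip check: base64 decodes back to the same float32 values.
vec = [0.5, -1.25, 3.0]
b64 = encode_embedding(vec, "base64")
decoded = np.frombuffer(base64.b64decode(b64), dtype="<f4")
assert decoded.tolist() == vec
```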
Before opening a PR, I'd like to confirm:
- Is this feature direction aligned with the project (vs. focusing on the existing `/pooling` endpoint)?
- Any concerns about the PyTorch engine hidden states pipeline changes?
Happy to open a PR if the direction is approved.