In fastchat/serve/model_worker.py, the get_embeddings function computes mean-pooled embeddings incorrectly when it processes multiple inputs in one batch.
sum_embeddings is computed per sequence, but token_num sums the attention mask across the entire batch, so every sequence is divided by the batch-wide token count:
sum_embeddings = torch.sum(masked_embeddings, dim=1) # [batch_size, hidden_dim]
token_num = torch.sum(attention_mask).item() # scalar across ALL sequences
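A toy reproduction under assumed shapes (the tensor values here are illustrative, not taken from the worker): with two sequences of different lengths whose unmasked token embeddings are all 1.0, a correct mean pool would return 1.0 for both rows, but the batch-wide divisor skews each row.

```python
import torch

# Hypothetical batch: two sequences, lengths 4 and 2 (1 = real token, 0 = padding).
attention_mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]])
# Every real token's embedding is 1.0 in all hidden dims; padding is zeroed out.
masked_embeddings = torch.ones(2, 4, 8) * attention_mask.unsqueeze(-1)

sum_embeddings = torch.sum(masked_embeddings, dim=1)  # [2, 8]: rows sum to 4 and 2
token_num = torch.sum(attention_mask).item()          # 6, summed over the whole batch

buggy = sum_embeddings / token_num
# Correct per-sequence means are 1.0 and 1.0, but this yields 4/6 and 2/6.
```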
Each sequence should be divided by its own token count:
token_counts = attention_mask.sum(dim=1, keepdim=True) # [batch_size, 1]
mean_embeddings = sum_embeddings / token_counts
This silently returns wrong embeddings whenever batch_size > 1: each sequence's embedding is scaled down by the total token count of the batch rather than by its own length, so a sequence's embedding changes depending on what else happens to be in the same batch.
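The per-sequence fix can be sketched as a small standalone helper (masked_mean_pool is a hypothetical name, not the function in the worker; the clamp guard against all-padding rows is an added assumption):

```python
import torch

def masked_mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # hidden_states: [batch_size, seq_len, hidden_dim]
    # attention_mask: [batch_size, seq_len], 1 for real tokens, 0 for padding
    mask = attention_mask.unsqueeze(-1).float()         # [batch_size, seq_len, 1]
    sum_embeddings = (hidden_states * mask).sum(dim=1)  # [batch_size, hidden_dim]
    # Divide each row by its OWN token count, not the batch total.
    token_counts = attention_mask.sum(dim=1, keepdim=True).clamp(min=1)  # [batch_size, 1]
    return sum_embeddings / token_counts
```

With this helper, a batch of one sequence and a batch of many produce identical embeddings for the same input, which is the invariant the original code violates.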