Conversation

@huydt84 (Contributor) commented Jun 1, 2025

Issue: #13820

The Qwen team may be about to release rerankers based on Qwen3ForCausalLM, and the way those models perform ranking is quite similar to embedding extraction:

import torch

# model / tokenizer: the HF reranker loaded via transformers
# Token ids of the two answer tokens
token_false_id = tokenizer.convert_tokens_to_ids("no")
token_true_id = tokenizer.convert_tokens_to_ids("yes")

def compute_logits(inputs, **kwargs):
    # Logits at the last position, i.e. for the token the model would generate next
    batch_scores = model(**inputs).logits[:, -1, :]
    true_vector = batch_scores[:, token_true_id]
    false_vector = batch_scores[:, token_false_id]
    # Softmax over just the "no"/"yes" pair; P("yes") is the relevance score
    batch_scores = torch.stack([false_vector, true_vector], dim=1)
    batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
    scores = batch_scores[:, 1].exp().tolist()
    return scores

This is very different from the reranking API supported by llama.cpp, so this scenario should be handled by the /embeddings endpoint.

cc: @yuhao318

Now you can run it like this:

  • Start the llama.cpp server: llama-server -m qwen-reranker.gguf --embedding --pooling none ...
  • Adapt your HuggingFace ranking code along the following lines (sample code only):
import requests
import torch

...
# token_true_id / token_false_id: the "yes" / "no" token ids obtained from the
# tokenizer, as in the snippet above
pairs = [format_instruction(task, query, doc) for query, doc in zip(queries, documents)]

# Calculate embeddings (with this patch and --pooling none, the last per-token
# vector holds the output logits of the final position)
def get_logits_embeddings(content: list[str]):
    url = "http://localhost:8080/embeddings"
    headers = {"Content-Type": "application/json"}
    data = {"content": content}

    response = requests.post(url, headers=headers, json=data)
    json_response = response.json()
    # Keep only the last token's vector for each input
    return [json_response[i]['embedding'][-1] for i in range(len(json_response))]

# Get the scores
def compute_logits(embeddings):
    true_vector = embeddings[:, token_true_id]
    false_vector = embeddings[:, token_false_id]
    batch_scores = torch.stack([false_vector, true_vector], dim=1)
    batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
    scores = batch_scores[:, 1].exp().tolist()
    return scores

embeddings = get_logits_embeddings(pairs)
print("scores: ", compute_logits(torch.tensor(embeddings)))

@ngxson (Collaborator) left a comment

This implementation is incorrect. The way Qwen3-Reranker works is simply to take the output logits of the yes and no tokens and compare them. There is absolutely no need to patch the internal code of llama.cpp, as we already have llama_get_logits_ith for this purpose.

And we may not even need to do anything, since we already support returning raw logits via the API; client code can read the logits directly.

@ngxson (Collaborator) commented Jun 1, 2025

> And we may not even need to do anything, since we already support returning raw logits via the API; client code can read the logits directly.

llama-server is compatible with OAI logprobs
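
A minimal sketch of that client-side approach, assuming an OAI-compatible llama-server on localhost:8080 that mirrors the OpenAI logprobs schema (the message content below is a placeholder; the real reranker prompt template still matters, see further down):

import math
import requests

def score_pair(query: str, document: str) -> float:
    # Generate a single token and request the top candidates' logprobs
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": f"Query: {query}\nDocument: {document}"}],
            "max_tokens": 1,
            "logprobs": True,
            "top_logprobs": 20,  # "yes"/"no" must appear among the top candidates
        },
    ).json()
    top = resp["choices"][0]["logprobs"]["content"][0]["top_logprobs"]
    lp = {t["token"]: t["logprob"] for t in top}
    # Softmax over just the two answer tokens, as in the HF snippet above
    p_yes = math.exp(lp.get("yes", float("-inf")))
    p_no = math.exp(lp.get("no", float("-inf")))
    return p_yes / (p_yes + p_no) if (p_yes + p_no) > 0 else 0.0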

@huydt84 (Contributor, Author) commented Jun 1, 2025

> And we may not even need to do anything, since we already support returning raw logits via the API; client code can read the logits directly.

So we would use the completion API for this case?

@ngxson (Collaborator) commented Jun 1, 2025

Yes, and you also need the correct prompt.
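
For illustration, here is one possible definition of the format_instruction helper from the sample above, i.e. what "the correct prompt" could look like. Since the string is already fully templated, it would be sent to a plain completion endpoint rather than the chat one. The template strings mirror the Qwen3-Reranker model card and should be treated as an assumption:

PREFIX = (
    "<|im_start|>system\n"
    "Judge whether the Document meets the requirements based on the Query and "
    "the Instruct provided. Note that the answer can only be \"yes\" or \"no\"."
    "<|im_end|>\n"
    "<|im_start|>user\n"
)
SUFFIX = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

# Hypothetical helper; verify the exact template against the actual release
def format_instruction(task: str, query: str, doc: str) -> str:
    # The prompt ends exactly where the model emits its single yes/no token
    return f"{PREFIX}<Instruct>: {task}\n<Query>: {query}\n<Document>: {doc}{SUFFIX}"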

@huydt84 (Contributor, Author) commented Jun 1, 2025

Since this implementation is incorrect, I am closing the PR.
