About the scoring mechanism #6

Hi,

Thank you for your excellent work on LFG and for making the code publicly available.

I'm currently studying your paper and code to reproduce your results. I have a question regarding the `score_clusters` function in `language_tools.py`.

In the paper, you mention that "logprobs" are inappropriate for representing task-relevant probability. Instead, the method described uses Detic (VLM) to extract textual descriptors, and then queries GPT multiple times (n_s samples) to approximate the task-relevant probability through averaged scores.

However, the `score_clusters` implementation appears to use the logprobs from a single GPT query directly:
```
logprob = choice["logprobs"]["token_logprobs"][0]
top_logprobs = choice["logprobs"]["top_logprobs"][0]
```

Could you clarify the reasoning behind this implementation choice? Is this perhaps a simplified version for efficiency, or am I misunderstanding how the method maps to the paper's description?
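To make the question concrete, here is a minimal sketch of the two scoring strategies as I understand them. It makes no API calls and is not taken from the repository: the function names, the `candidates` labels, and the assumption that each sampled response names exactly one cluster are all my own, hypothetical simplifications.

```python
import math
from collections import Counter


def score_from_logprobs(top_logprobs, candidates):
    """Single-query scoring (hypothetical helper, not from the repo).

    `top_logprobs` maps a token to its log-probability at the first
    answer position, like choice["logprobs"]["top_logprobs"][0].
    Candidates missing from the map get probability 0; the rest are
    renormalized over the candidate set.
    """
    probs = {
        c: math.exp(top_logprobs[c]) if c in top_logprobs else 0.0
        for c in candidates
    }
    total = sum(probs.values())
    return {c: (p / total if total > 0 else 0.0) for c, p in probs.items()}


def score_from_samples(samples, candidates):
    """Sampling-based scoring (my reading of the paper, assumed).

    `samples` holds the cluster label chosen in each of the n_s
    independent responses; the score is the fraction of responses
    that chose each candidate.
    """
    if not samples:
        return {c: 0.0 for c in candidates}
    counts = Counter(samples)
    return {c: counts[c] / len(samples) for c in candidates}


if __name__ == "__main__":
    clusters = ["1", "2", "3"]
    # One query, scores read from the first answer token's logprobs.
    print(score_from_logprobs({"1": math.log(0.6), "2": math.log(0.2)}, clusters))
    # n_s = 4 sampled responses, scores averaged over samples.
    print(score_from_samples(["1", "1", "2", "1"], clusters))
```

If the paper's method corresponds to the second function, I would have expected `score_clusters` to request several completions and aggregate them rather than read `token_logprobs` from one response.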

Thank you very much for your time and assistance!

Best regards,
Huang
