If the keywords are common enough that the tokenizer represents each of them as a single token, you can compare their probabilities directly by taking the softmax of the logits predicted by the model. If some keywords span multiple tokens, the comparison becomes trickier: you can use the sum of the log probabilities, which is the mathematically principled choice, or the average log probability, which tends to work better in practice, or something in between such as the length penalty in equation 14 of this paper.
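A minimal sketch of the multi-token case, assuming a causal model that maps a `(1, seq_len)` tensor of token ids to `(1, seq_len, vocab_size)` logits and a tokenizer with an `encode` method returning a list of ids (both placeholders, not a specific API):

```python
import torch
import torch.nn.functional as F


def keyword_log_prob(model, tokenizer, prefix_ids, keyword, alpha=0.0):
    # Tokenize the keyword; `tokenizer.encode` returning a list of
    # token ids is an assumption about the interface.
    keyword_ids = tokenizer.encode(keyword)

    # Teacher-force the keyword after the prefix; `model` is assumed
    # to map (1, seq_len) ids to (1, seq_len, vocab_size) logits.
    input_ids = torch.tensor([list(prefix_ids) + keyword_ids])
    with torch.no_grad():
        logits = model(input_ids)
    log_probs = F.log_softmax(logits, dim=-1)

    # Logits at position t predict the token at position t + 1, so
    # keyword token i is scored at position len(prefix_ids) - 1 + i.
    start = len(prefix_ids)
    total = sum(
        log_probs[0, start - 1 + i, tok].item()
        for i, tok in enumerate(keyword_ids)
    )

    # alpha = 0 gives the raw sum of log probs; alpha = 1 gives the
    # average; values in between act as a simple length penalty.
    return total / len(keyword_ids) ** alpha
```

To choose among candidate keywords, score each one with the same `alpha` and take the argmax, e.g. `max(keywords, key=lambda kw: keyword_log_prob(model, tokenizer, prefix_ids, kw, alpha=0.6))` (the value 0.6 is just an illustrative choice). With `alpha = 0` this reproduces the sum, with `alpha = 1` the average, and intermediate values trade off between the two.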
