Embedding example in docs is incorrect: model.model() returns MLM logits, not protein embeddings
Description
The embedding example in the documentation does not match the actual behavior of model.model(...) and leads to both runtime errors and silently incorrect embeddings.
The docs show code similar to:
outputs = model.model(
input_ids=tokens["input_ids"],
attention_mask=tokens["attention_mask"]
)
cls_embedding = outputs[:, 0, :]
However, model.model(...) returns a MaskedLMOutput, not a tensor:
type(outputs) == lobster.model.lm_base._utils.MaskedLMOutput
So the example raises:
TypeError: tuple indices must be integers or slices, not tuple
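Why this particular error appears: if lobster's MaskedLMOutput indexes like Hugging Face transformers' ModelOutput, string keys look up a field, but any other key is forwarded to a plain tuple of the populated fields. That is an assumption, and FakeModelOutput below is a hypothetical, simplified stand-in rather than the real class. Under it, outputs[0] quietly returns the logits, while NumPy-style outputs[:, 0, :] is rejected by the tuple.

```python
from collections import OrderedDict


class FakeModelOutput(OrderedDict):
    """Hypothetical stand-in for a transformers-style ModelOutput (simplified)."""

    def to_tuple(self):
        # Only the populated (non-None) fields, in insertion order.
        return tuple(v for v in self.values() if v is not None)

    def __getitem__(self, key):
        if isinstance(key, str):
            return super().__getitem__(key)  # e.g. outputs["logits"]
        return self.to_tuple()[key]  # non-string keys index a plain tuple


outputs = FakeModelOutput(logits="<logits tensor>", hidden_states=None)
first_field = outputs[0]  # integer index works: returns the logits field

try:
    outputs[:, 0, :]  # the key is a tuple of slices, which a tuple cannot accept
    message = None
except TypeError as err:
    message = str(err)

print(first_field)  # <logits tensor>
print(message)      # tuple indices must be integers or slices, not tuple
```

Under this assumption, the fix is to read a named field (outputs.logits or outputs.hidden_states) instead of slicing the output object directly.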
Per-token embeddings are only returned when hidden states are requested explicitly:
outputs = model.model(..., output_hidden_states=True, return_dict=True)
outputs.hidden_states[-1].shape == (1, 150, 408)  # (batch, seq_len, hidden_dim)
Suggested fix
Update the example to explicitly request hidden states and pool them, e.g.:
outputs = model.model(
input_ids=tokens["input_ids"],
attention_mask=tokens["attention_mask"],
output_hidden_states=True,
return_dict=True
)
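hidden = outputs.hidden_states[-1]  # last layer, (batch, seq_len, hidden_dim)
cls_embedding = hidden[:, 0, :]  # first-token embedding, (batch, hidden_dim)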
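A minimal sketch of two common ways to pool the last hidden layer into one vector per sequence. It assumes position 0 holds a CLS/BOS-style token, as the original docs example implies, and that attention_mask is 1 for real tokens and 0 for padding. The function names cls_pool and mean_pool are illustrative and are not part of lobster.

```python
import torch


def cls_pool(hidden: torch.Tensor) -> torch.Tensor:
    """First-token embedding: (batch, seq_len, dim) -> (batch, dim).

    Assumes position 0 is a CLS/BOS-style token.
    """
    return hidden[:, 0, :]


def mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average over non-padding positions: (batch, seq_len, dim) -> (batch, dim)."""
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)                   # padding positions contribute 0
    counts = mask.sum(dim=1).clamp(min=1.0)               # guard against empty rows
    return summed / counts


# Toy "hidden states". Real ones would come from outputs.hidden_states[-1],
# e.g. with shape (1, 150, 408).
hidden = torch.randn(2, 5, 8)
attention_mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
print(cls_pool(hidden).shape)                   # torch.Size([2, 8])
print(mean_pool(hidden, attention_mask).shape)  # torch.Size([2, 8])
```

With the corrected call, mean_pool(outputs.hidden_states[-1], tokens["attention_mask"]) gives one vector per sequence. Because the attention mask also covers special tokens, those tokens are included in the average. Whether to exclude them is a design choice the docs could state explicitly.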
Reproduction
outputs = model.model(input_ids, attention_mask)
type(outputs) # MaskedLMOutput
outputs[:,0,:] # TypeError
outputs.logits[:, 0, :].shape  # (batch_size, vocab_size): MLM logits, not an embedding