
`TransformerModel` currently doesn't support loading directly from a PyTorch checkpoint, because this functionality is not supported by the HuggingFace `transformers.AutoModel` class.
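For context, `AutoModel.from_pretrained()` only accepts a model identifier from the Hugging Face Hub or a local directory previously written with `save_pretrained()`, which bundles the weights together with the model config; a bare PyTorch state dict doesn't carry enough information on its own. A minimal sketch of the two supported paths (the local directory name is just an example):

from transformers import AutoModel

# Supported: a model identifier on the Hugging Face Hub
model = AutoModel.from_pretrained('xlm-roberta-base')

# Also supported: a local directory written with save_pretrained(),
# which contains both the weights and the model config
model.save_pretrained('./my-xlm-roberta')
model = AutoModel.from_pretrained('./my-xlm-roberta')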

One alternative would be to initialize the `TransformerModel` with the same architecture as your GPT model, deserialize the weights from the PyTorch checkpoint, and manually load them into the internal HuggingFace transformer model:

# Pseudo-code
import torch

# Initialize a TransformerModel with the same architecture as the checkpoint
model = TransformerModel(name='xlm-roberta-base', ...)

# Deserialize the checkpoint and load its weights into the internal
# HuggingFace transformer, which is the model's first layer
checkpoint = torch.load(PATH_TO_CHECKPOINT)
hf_transformer = model.layers[0]
hf_transformer.load_state_dict(checkpoint['model_state_dict'])
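If the parameter names in the checkpoint don't line up exactly with the HuggingFace model (for instance because the checkpoint was saved from a wrapping module), a non-strict load makes the mismatch visible instead of raising an error. A small sketch, reusing `hf_transformer` and `checkpoint` from above:

# strict=False returns the keys that didn't match, which helps diagnose
# naming differences between the checkpoint and the model
result = hf_transformer.load_state_dict(
    checkpoint['model_state_dict'], strict=False
)
print('Missing keys:', result.missing_keys)
print('Unexpected keys:', result.unexpected_keys)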
