Implementing a custom torch component and a preprocessing layer to handle Docs #11314
-
I am trying to implement a simple token classifier with PyTorch and encapsulate it in a custom trainable component. I know that the model the component will use should take a list of Docs as input. I tried to implement a simple Thinc model that takes the transformer output stored in doc._.trf_data. My question is: if I implement the preprocessor layer as follows, how can I implement the backprop callback to store the gradient in the doc object, to be passed back to the transformer for training?

from typing import Callable, List, Tuple

import numpy as np
import spacy
import torch.nn as nn
from spacy.tokens import Doc
from thinc.api import Model, PyTorchWrapper, chain, with_array
from thinc.types import Floats2d


@spacy.registry.architectures("LinearPreprocessor.v1")
def create_linear_preprocessor():
    def preprocessor_forward(
        model: Model[List[Doc], Floats2d],
        docs: List[Doc],
        is_train: bool,
    ) -> Tuple[Floats2d, Callable]:
        Y = []
        for doc in docs:
            embed_size = doc._.trf_data.model_output.last_hidden_state.shape[-1]
            Y.append(doc._.trf_data.model_output.last_hidden_state.reshape(-1, embed_size))
        Y = model.ops.asarray2f(np.concatenate(Y, axis=0))

        def backprop(dY):
            return docs  # TODO: implement backprop

        return Y, backprop

    preprocessor_model = Model(
        "preprocessor",        # string name of the layer
        preprocessor_forward,  # forward function
    )
    return preprocessor_model


@spacy.registry.architectures("TorchFeedForward.v1")
def create_torch_feedforward(
    nO: int,             # number of labels
    width: int,          # transformer hidden size
    hidden_width: int,
    dropout: float,
) -> Model[Floats2d, Floats2d]:
    torch_model = nn.Sequential(
        nn.Linear(width, hidden_width),
        nn.Dropout(dropout),
        nn.ReLU(),
        nn.Linear(hidden_width, nO),
        nn.Softmax(dim=-1) if nO > 1 else nn.Sigmoid(),
    )  # (batch_size x sequence_length, hidden_size) -> (batch_size x sequence_length, number of labels)
    return PyTorchWrapper(torch_model)


@spacy.registry.architectures("TokenClassifierModel.v1")
def create_model(
    preprocessor: Model[List[Doc], Floats2d],
    classifier: Model[Floats2d, Floats2d],
) -> Model[List[Doc], Floats2d]:
    model = chain(preprocessor, with_array(classifier))
    return model
-
It looks like you're on the right track, but for backprop you don't put the gradient in the Docs: it's just a return value of your backprop function. It might be helpful to look at this Thinc tutorial if you haven't seen it.

If you have a function that actually takes Docs as input, there is no gradient to return, because Docs are not a model that's learned, they're just input data. The gradient you have would be relative to the tok2vec output, but if you're freezing your Transformer (for feature extraction) then you can just return an empty gradient. If you actually want to be able to update the Transformer, then you can return a gradient of the same type and shape as the input to your forward pass - the "backward" is really just the opposite of the "forward".

It may also be helpful to look at the standard models and see how they handle the tok2vec, rather than handling Docs directly, to get an idea of this. You can find them here: https://github.com/explosion/spaCy/tree/master/spacy/ml/models
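To make the frozen-transformer case above concrete, here is a minimal sketch of what the backprop callback inside the question's preprocessor_forward could look like. It assumes the Transformer is used purely for feature extraction, so nothing upstream of the preprocessor is updated and the callback simply discards dY and returns an empty list to satisfy Thinc's (output, backprop) contract:

from typing import List

from spacy.tokens import Doc
from thinc.types import Floats2d


def backprop(dY: Floats2d) -> List[Doc]:
    # dY is the gradient with respect to Y, with the same shape as Y:
    # (total number of tokens across the batch, embed_size).
    # The layer's input is a list of Docs, which is plain data rather than
    # learned parameters, so with a frozen Transformer there is nothing
    # upstream to receive it and an empty "gradient" is enough.
    return []

If the Transformer should actually be updated, the pattern used by the built-in models is to take a tok2vec sublayer as an argument (for example spacy-transformers' TransformerListener.v1 in the config) instead of reading doc._.trf_data directly, so the gradient returned by the classifier flows back through that sublayer. A rough sketch of how the question's token classifier could be wired that way follows; the "TokenClassifierModel.v2" name and signature are illustrative, not an existing architecture:

from typing import List

import spacy
from spacy.tokens import Doc
from thinc.api import Model, chain, with_array
from thinc.types import Floats2d


@spacy.registry.architectures("TokenClassifierModel.v2")
def create_listener_model(
    tok2vec: Model[List[Doc], List[Floats2d]],
    classifier: Model[Floats2d, Floats2d],
) -> Model[List[Doc], List[Floats2d]]:
    # with_array maps the array-to-array classifier over the list of arrays
    # produced by the tok2vec layer; the gradient it returns is handed to
    # tok2vec's backprop, which takes care of updating the shared Transformer
    # via its listener.
    return chain(tok2vec, with_array(classifier))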