Implementing a custom torch component and a preprocessing layer to handle Docs #11314
-
I am trying to implement a simple token classifier with PyTorch and encapsulate it in a custom trainable component. I know that the model the component will use should take a list of Docs as input. I tried to implement a simple Thinc model that takes the transformer output stored in doc._.trf_data. My question is: if I implement the preprocessor layer as follows, how can I implement the backprop callback to store the gradient in the doc object, to be passed back to the transformer for training?

from typing import Callable, List, Tuple

import numpy as np
import spacy
import torch.nn as nn
from spacy.tokens import Doc
from thinc.api import Model, PyTorchWrapper, chain, with_array
from thinc.types import Floats2d


@spacy.registry.architectures("LinearPreprocessor.v1")
def create_linear_preprocessor():
    def preprocessor_forward(
        model: Model[List[Doc], Floats2d],
        docs: List[Doc],
        is_train: bool,
    ) -> Tuple[Floats2d, Callable]:
        Y = []
        for doc in docs:
            embed_size = doc._.trf_data.model_output.last_hidden_state.shape[-1]
            Y.append(doc._.trf_data.model_output.last_hidden_state.reshape(-1, embed_size))
        Y = model.ops.asarray2f(np.concatenate(Y, axis=0))

        def backprop(dY):
            return docs  # TODO: implement backprop

        return Y, backprop

    preprocessor_model = Model(
        "preprocessor",        # string name of the layer
        preprocessor_forward,  # forward function
    )
    return preprocessor_model


@spacy.registry.architectures("TorchFeedForward.v1")
def create_torch_feedforward(
    nO: int,             # number of labels
    width: int,          # transformer hidden size
    hidden_width: int,
    dropout: float,
) -> Model[Floats2d, Floats2d]:
    torch_model = nn.Sequential(
        nn.Linear(width, hidden_width),
        nn.Dropout(dropout),
        nn.ReLU(),
        nn.Linear(hidden_width, nO),
        nn.Softmax(dim=-1) if nO > 1 else nn.Sigmoid(),
    )  # (batch_size x sequence_length, hidden_size) -> (batch_size x sequence_length, number of labels)
    return PyTorchWrapper(torch_model)


@spacy.registry.architectures("TokenClassifierModel.v1")
def create_model(
    preprocessor: Model[List[Doc], Floats2d],
    classifier: Model[Floats2d, Floats2d],
) -> Model[List[Doc], Floats2d]:
    model = chain(preprocessor, with_array(classifier))
    return model
-
It looks like you're on the right track, but for backprop you don't put the gradient in the Docs: it's just a return value of your backprop function. It might be helpful to look at this Thinc tutorial if you haven't seen it.

If you have a function that actually takes Docs as input, there is no gradient to return, because Docs are not a model that's learned, they're just input data. The gradient you have would be relative to the tok2vec output, but if you're freezing your Transformer (for feature extraction) then you can just return an empty gradient. If you actually want to be able to update the Transformer, then you can return a gradient of the same type and shape as the input to your forward pass - the "backward" is really just the opposite of the "forward".

It may also be helpful to look at the standard models and see how they handle the tok2vec, rather than handling Docs directly, to get an idea of this. You can find them here: https://github.com/explosion/spaCy/tree/master/spacy/ml/models
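To make the frozen-transformer case above concrete, here is a minimal sketch of what the backprop callback inside the question's preprocessor_forward could look like. It assumes the Transformer is used purely for feature extraction, so nothing upstream of the preprocessor is updated and the callback simply discards dY and returns an empty list to satisfy Thinc's (output, backprop) contract:

from typing import List

from spacy.tokens import Doc
from thinc.types import Floats2d


def backprop(dY: Floats2d) -> List[Doc]:
    # dY is the gradient with respect to Y, with the same shape as Y:
    # (total number of tokens across the batch, embed_size).
    # The layer's input is a list of Docs, which is plain data rather than
    # learned parameters, so with a frozen Transformer there is nothing
    # upstream to receive it and an empty "gradient" is enough.
    return []

If the Transformer should actually be updated, the pattern used by the built-in models is to take a tok2vec sublayer as an argument (for example spacy-transformers' TransformerListener.v1 in the config) instead of reading doc._.trf_data directly, so the gradient returned by the classifier flows back through that sublayer. A rough sketch of how the question's token classifier could be wired that way follows; the "TokenClassifierModel.v2" name and signature are illustrative, not an existing architecture:

from typing import List

import spacy
from spacy.tokens import Doc
from thinc.api import Model, chain, with_array
from thinc.types import Floats2d


@spacy.registry.architectures("TokenClassifierModel.v2")
def create_listener_model(
    tok2vec: Model[List[Doc], List[Floats2d]],
    classifier: Model[Floats2d, Floats2d],
) -> Model[List[Doc], List[Floats2d]]:
    # with_array maps the array-to-array classifier over the list of arrays
    # produced by the tok2vec layer; the gradient it returns is handed to
    # tok2vec's backprop, which takes care of updating the shared Transformer
    # via its listener.
    return chain(tok2vec, with_array(classifier))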