Position Embedding not needed with BERT? #11172
-
Let me start off by saying that I'm not sure if this is the right place for this question. It's more focused on NLP in general and less on spaCy specifically, but I couldn't find another active community to post it in; if there is one, please let me know and I'll take my question there. I'm trying to change this REL component (from spaCy) because it was meant more as a tutorial and not for use in production. I want to add more features to it, specifically the sequence of words between two entities. To add information about the position of a token in a sentence, you'd need to add something to, or do an operation on, its vector.
The question I have is this: I'm planning to use a BERT model, which already carries positional information in the form of positional encodings. Do I still have to add positional information as input to the REL component?
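For reference, this is roughly what I mean by adding positional information to a token's vector: a minimal numpy sketch of the classic sinusoidal encoding added on top of existing token embeddings (the shapes and the random vectors are just illustrative, not my actual features):

```python
import numpy as np

def sinusoidal_positions(seq_len, dim):
    # Classic sinusoidal positional encodings (Vaswani et al., 2017); assumes an even dim.
    positions = np.arange(seq_len)[:, None]                              # (seq_len, 1)
    div_terms = np.exp(np.arange(0, dim, 2) * (-np.log(10000.0) / dim))  # (dim / 2,)
    encodings = np.zeros((seq_len, dim))
    encodings[:, 0::2] = np.sin(positions * div_terms)
    encodings[:, 1::2] = np.cos(positions * div_terms)
    return encodings

# Stand-in for whatever embeddings the tokens already have.
token_vectors = np.random.rand(12, 64)                                   # (n_tokens, width)
tokens_with_position = token_vectors + sinusoidal_positions(12, 64)
```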
Replies: 1 comment 2 replies
-
This discussion seems like it might fit in the HuggingFace forums, depending on how you approach the problem. Keep in mind that the actual Transformers implementation in spaCy is just HuggingFace Transformers, and spacy-transformers is a wrapper around that.

If you're using a CNN tok2vec in spaCy, you can specify the token position as one of the attributes in the model config. Because the actual BERT implementation is inside the HuggingFace library, I don't think there's a similar option to control how that's handled in the spaCy config.

If you want to add context, take a look at how the EntityLinker is implemented - it uses context to disambiguate entities. When adding context, what you do is get a vector representation of the surrounding text. You don't need to explicitly represent positional indices to get a good context representation. Since BERT already has positional information internally, I'm skeptical that adding a more explicit representation would help.
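As a rough sketch of what that context feature could look like: mean-pool the tok2vec vectors of the tokens between the two entities and feed that vector to your REL component. The helper name and the example pipeline below are just illustrative; it assumes a pipeline whose tok2vec component fills `doc.tensor` (so not a transformer-only pipeline):

```python
import numpy as np
import spacy

def between_context_vector(doc, ent1, ent2):
    """Mean-pool the tok2vec vectors of the tokens between two entity spans."""
    start = min(ent1.end, ent2.end)
    end = max(ent1.start, ent2.start)
    if start >= end:  # adjacent or overlapping entities: nothing in between
        return np.zeros(doc.tensor.shape[1], dtype=doc.tensor.dtype)
    return doc.tensor[start:end].mean(axis=0)

nlp = spacy.load("en_core_web_md")  # any pipeline that populates doc.tensor
doc = nlp("Acme Corp acquired Widget Inc for an undisclosed sum.")
if len(doc.ents) >= 2:
    context = between_context_vector(doc, doc.ents[0], doc.ents[1])
    print(context.shape)
```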