Array bounds exceeded while searching for root word. This likely means the parse tree is in an invalid state. #9837

swati1411 · 2021-12-07T10:57:03Z

import spacy
import string

nlp = spacy.load("fr_core_news_sm")
punctuation = string.punctuation + '...' + '¿'+ '¡'+'。'+'！'+'？'+'…' +  '।'
doc = nlp("Quand>>Force de police insuffisante>> Ou certains cas que nous ne voulons pas prendre")
wordList=["Quand",">>","Force","de","police","insuffisante",">>","Ou","certains","cas","que","nous","ne","voulons","pas","prendre"]

print("Before:", [token.text for token in doc])
i = 0
inx = 0
while i < len(wordList):
    word = wordList[i]
    #print (doc[inx].text)
    if word == doc[inx].text.strip():
        i += 1
        inx += 1
    elif doc[inx].text.strip() in word and inx < len(doc) -1:
        #import pdb;pdb.set_trace()
        with doc.retokenize() as retokenizer:
            retokenizer.merge(doc[inx:inx+2])
    elif word in doc[inx].text and i < len(wordList)-1:
        splits = []
        headList = []
        posList = []
        depList = []
        #import pdb;pdb.set_trace()
        word2 = doc[inx].text[len(word):]
        splits = [word, word2]
        if doc[inx].head != doc[inx]:
            head = doc[inx].head
        elif inx < len(doc) -1:
            head = doc[inx+1]
        else:
            head = doc[inx -1]
        headList = [head,head]
        dep = doc[inx].dep_
        pos = doc[inx].pos_
        if word2 in punctuation:
            dep = 'punct'
            pos = 'PUNCT'
        posList = [doc[inx].pos_,pos]
        depList = [doc[inx].dep_,dep]
        attrs = {"POS": posList,
                "DEP": depList}
        with doc.retokenize() as retokenizer:
            retokenizer.split(doc[inx], splits, heads=headList, attrs=attrs) 
print("After:", [token.text for token in doc])

spaCy version: 2.3.2
Python version: 3.6.9
fr_core_news_sm: 2.3.0

At the point where the ">" after "insuffisante" is merged with the following ">", this error is raised:
  File "merge.py", line 21, in <module>
    retokenizer.merge(doc[inx:inx+2])
  File "_retokenize.pyx", line 120, in spacy.tokens._retokenize.Retokenizer.__exit__
  File "_retokenize.pyx", line 179, in spacy.tokens._retokenize._merge
  File "span.pyx", line 575, in spacy.tokens.span.Span.root.__get__
  File "span.pyx", line 757, in spacy.tokens.span._count_words_to_root
RuntimeError: [E039] Array bounds exceeded while searching for root word. This likely means the parse tree is in an invalid state.

I am struggling to understand why the merge fails here.

Answered by adrianeboyd


adrianeboyd · 2021-12-08T14:30:41Z

Something is going wrong with the heads set in the split, which then causes problems for the following merge. It looks like the split introduces a cycle in the dependency tree and also several internal edges labeled ROOT, which probably isn't what you intended.

I think you have a cycle between "Quand", "de", and "police". You can check for cycles with spacy.pipeline._parser_internals.nonproj.contains_cycle, which takes a list of the token.head.i values for a doc.
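
To make the cycle check concrete, here is a minimal stand-alone sketch of the idea. It is not spaCy's implementation: find_head_cycle is a hypothetical helper written for illustration, and it assumes the same input the answer describes, a list of token.head.i values in which the root points at itself. Note that the module path quoted above is the spaCy v3 location; in the 2.3.x line used in the question the function may live elsewhere (possibly spacy.syntax.nonproj), so check your installed version.

```python
from typing import List, Optional


def find_head_cycle(heads: List[int]) -> Optional[List[int]]:
    """Return the token indices that form a cycle, or None if there is none.

    heads[i] is the index of token i's head. A root token is its own head,
    which is how spaCy reports token.head for the root.
    Assumes every head index is within range.
    """
    for start in range(len(heads)):
        path = [start]
        current = start
        # Walk towards the root; a valid tree always ends at a self-headed token.
        while heads[current] != current:
            current = heads[current]
            if current in path:
                # We came back to a token already visited: report the loop only.
                return path[path.index(current):]
            path.append(current)
    return None


# A valid tree: token 1 is the root, tokens 0 and 2 attach to it.
print(find_head_cycle([1, 1, 1]))      # None
# An invalid tree: 0 -> 2 -> 1 -> 0 never reaches a root.
print(find_head_cycle([2, 0, 1, 3]))   # [0, 2, 1]
```

With a real Doc, the input would be built as [token.head.i for token in doc], ideally right after each retokenizer.split call so the first bad split is caught before a later merge fails.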

1 reply

swati1411 Dec 24, 2021
Author

Thanks for replying.
The split is setting the heads incorrectly. I need to set the heads in such a way that no cycle is introduced.
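
One cycle-free way to pick heads for a split can be sketched without spaCy at all. The helper below, heads_after_split, is hypothetical and only simulates the head indices: the first subtoken keeps the original token's head (so a root stays a root), and every other subtoken attaches to the first one. Nothing new ever points back up the tree, so no cycle can appear. This is one possible policy under those assumptions, not necessarily the attachment that is linguistically right for your data.

```python
from typing import List


def heads_after_split(heads: List[int], k: int, n_parts: int) -> List[int]:
    """Simulate splitting token k into n_parts subtokens without adding a cycle.

    heads[i] is the head index of token i; a root is its own head.
    Returns the head list of the longer, re-indexed token sequence.
    """
    shift = n_parts - 1

    def remap(index: int) -> int:
        # Tokens after the split point move right; token k becomes its first subtoken.
        return index + shift if index > k else index

    new_heads: List[int] = []
    for i, head in enumerate(heads):
        # Every original token (including the first subtoken) keeps its old head.
        new_heads.append(remap(head))
        if i == k:
            # The remaining subtokens attach to the first subtoken at index k.
            new_heads.extend([k] * shift)
    return new_heads


# Token 1 is the root; splitting it in two keeps a single root at index 1.
print(heads_after_split([1, 1, 1], k=1, n_parts=2))   # [1, 1, 1, 1]
```

In spaCy itself, the heads argument of retokenizer.split accepts either tokens or (token, subtoken_index) tuples, so attaching a later subtoken to the first one would be written as (doc[inx], 0). Whether that reproduces this policy exactly in 2.3.2 is an assumption worth verifying on a small example.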
