Array bounds exceeded while searching for root word. This likely means the parse tree is in an invalid state. #9837
-
```python
import spacy
import string

nlp = spacy.load("fr_core_news_sm")
punctuation = string.punctuation + '...' + '¿'+ '¡'+'。'+'!'+'?'+'…' + '।'
doc = nlp("Quand>>Force de police insuffisante>> Ou certains cas que nous ne voulons pas prendre")
wordList=["Quand",">>","Force","de","police","insuffisante",">>","Ou","certains","cas","que","nous","ne","voulons","pas","prendre"]
print("Before:", [token.text for token in doc])
i = 0
inx = 0
while i < len(wordList):
    word = wordList[i]
    #print (doc[inx].text)
    if word == doc[inx].text.strip():
        i += 1
        inx += 1
    elif doc[inx].text.strip() in word and inx < len(doc) -1:
        #import pdb;pdb.set_trace()
        with doc.retokenize() as retokenizer:
            retokenizer.merge(doc[inx:inx+2])
    elif word in doc[inx].text and i < len(wordList)-1:
        splits = []
        headList = []
        posList = []
        depList = []
        #import pdb;pdb.set_trace()
        word2 = doc[inx].text[len(word):]
        splits = [word, word2]
        if doc[inx].head != doc[inx]:
            head = doc[inx].head
        elif inx < len(doc) -1:
            head = doc[inx+1]
        else:
            head = doc[inx -1]
        headList = [head,head]
        dep = doc[inx].dep_
        pos = doc[inx].pos_
        if word2 in punctuation:
            dep = 'punct'
            pos = 'PUNCT'
        posList = [doc[inx].pos_,pos]
        depList = [doc[inx].dep_,dep]
        attrs = {"POS": posList,
                 "DEP": depList}
        with doc.retokenize() as retokenizer:
            retokenizer.split(doc[inx], splits, heads=headList, attrs=attrs)
print("After:", [token.text for token in doc])
```

spaCy version: 2.3.2

At the point where the `>` after "insuffisante" is merged with the following `>`, the error in the title is raised. I am struggling to understand why there is an issue with the merge.
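One more thing in the repro, separate from the reported error: `punctuation` is a single concatenated string, so `word2 in punctuation` is a substring test, not a check that every character of `word2` is punctuation. A piece such as `>>` is therefore not relabelled `punct`, while an unrelated run like `<=` would be. A minimal sketch of a per-character check follows; `PUNCT_CHARS` and `is_punct` are illustrative names, not spaCy APIs.

```python
import string

# Same construction as in the repro: one long string, not a set of characters.
punctuation = string.punctuation + '...' + '¿' + '¡' + '。' + '!' + '?' + '…' + '।'

# `in` on a str tests for a contiguous substring.
print(">>" in punctuation)  # False: ">>" never occurs contiguously
print("<=" in punctuation)  # True: "<=>" appears in string.punctuation

# Per-character alternative: every character must be a punctuation mark.
PUNCT_CHARS = set(punctuation)

def is_punct(text):
    # Empty strings are not treated as punctuation.
    return bool(text) and all(ch in PUNCT_CHARS for ch in text)

print(is_punct(">>"))     # True
print(is_punct("Force"))  # False
```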
-
Something is going wrong with the heads set in the split, which causes problems for the following merge. It looks like the split introduces a cycle in the dependency tree, and also several internal edges labeled `ROOT`, which probably isn't what you intended. I think you have a cycle between "Quand", "de", and "police". You can check for cycles with `spacy.pipeline._parser_internals.nonproj.contains_cycle`, which takes a list of the `token.head.i` values for a doc.
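The cycle check described above amounts to following each token's head pointer until it reaches a token that is its own head (the root); revisiting a token on the way means there is a cycle. Below is a minimal pure-Python sketch of that idea, assuming heads are given as a list like `[t.head.i for t in doc]`. `find_cycle` and `misplaced_roots` are hypothetical helpers, not spaCy functions, and returning the set of indices visited when the loop closes is a choice of this sketch. In the repro, one plausible source of both problems is the fallback in the split branch: when `doc[inx]` is its own head (the root), both pieces are attached to `doc[inx+1]` and both keep `doc[inx].dep_`, so they can become non-root tokens labelled `ROOT`, attached below a token that itself descends from the old root.

```python
def find_cycle(heads):
    """Return the token indices visited when a head walk revisits a token, else None.

    heads[i] is the index of token i's head (like token.head.i in spaCy);
    a root token is its own head, so heads[i] == i ends the walk.
    """
    for start in range(len(heads)):
        seen = {start}
        current = start
        # Follow head pointers upward until we reach a root.
        while heads[current] != current:
            current = heads[current]
            if current in seen:
                # Came back to a token already on this path: a cycle.
                return seen
            seen.add(current)
    return None


def misplaced_roots(heads, deps):
    """Indices of tokens labelled ROOT that are not their own head."""
    return [i for i, (h, d) in enumerate(zip(heads, deps)) if d == "ROOT" and h != i]


# A well-formed 4-token tree: token 1 is the root.
print(find_cycle([1, 1, 1, 1]))  # None

# Reattach the root (token 1) to its own dependent, token 2: 1 -> 2 -> 1.
bad_heads = [1, 2, 1, 1]
print(find_cycle(bad_heads))  # {0, 1, 2}: the walk from token 0 enters the 1 <-> 2 loop
print(misplaced_roots(bad_heads, ["nsubj", "ROOT", "ROOT", "punct"]))  # [1, 2]
```

Running a check like this on `[t.head.i for t in doc]` after each `split` or `merge` in the loop should show which retokenization first breaks the tree.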