using data debug: how to get which document is corrupted? #8221
I ran into an issue after running a model for 1200 samples:
Now I'm trying to figure out which document caused the havoc. How can I make it show exactly where the problem is? I used code with
I tried to check it manually:
and it looks good: no "I" without a preceding "B".
Replies: 2 comments 5 replies
What I observe now is that
runs smoothly, but
returns an error:
Any ideas how to debug from here?
I'm having trouble finding it, but we recently had a similar issue caused by having `max_length` not equal to 0 in the configs. If you're unlucky, the splitting can split entities and cause issues. We recently changed the default `max_length` to fix that, but if your config isn't brand new, that might be an issue.

Adding some output to help identify the document at issue here would be good, I'll have a look at that.
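
In the meantime, here is a minimal sketch of how one might locate the offending document by hand. It is plain Python and not part of spaCy: the helper names (`find_orphan_i_tags`, `report_bad_docs`, `split_tags`) are hypothetical, and the fixed-size `split_tags` is only a simplified stand-in to illustrate how cutting a document mid-entity can leave an "I" without a "B". It is not how the library actually splits documents.

```python
from typing import List, Tuple


def find_orphan_i_tags(tags: List[str]) -> List[int]:
    """Return positions of I- tags that do not continue an entity of the same label."""
    bad_positions = []
    prev = "O"
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            label = tag[2:]
            # An I- tag is only valid directly after B-<label> or I-<label>.
            if prev not in (f"B-{label}", f"I-{label}"):
                bad_positions.append(i)
        prev = tag
    return bad_positions


def report_bad_docs(docs_tags: List[List[str]]) -> List[Tuple[int, List[int]]]:
    """Return (document index, bad token positions) for every document with orphan I- tags."""
    report = []
    for doc_index, tags in enumerate(docs_tags):
        bad_positions = find_orphan_i_tags(tags)
        if bad_positions:
            report.append((doc_index, bad_positions))
    return report


def split_tags(tags: List[str], max_length: int) -> List[List[str]]:
    """Naive fixed-size split, used here only to illustrate the failure mode."""
    return [tags[i:i + max_length] for i in range(0, len(tags), max_length)]


# A well-formed document: the ORG entity spans tokens 1-3.
tags = ["O", "B-ORG", "I-ORG", "I-ORG", "O"]
print(find_orphan_i_tags(tags))  # no orphan I- tags in the unsplit document

# Splitting after three tokens cuts the entity in half, so the
# second chunk now starts with an I- tag that has no B- before it.
chunks = split_tags(tags, max_length=3)
print(report_bad_docs(chunks))
```

To run the same check on real data, the per-token tags can be built from `token.ent_iob_` and `token.ent_type_` for each `Doc` in the corpus, and the reported document index then points at the text to inspect.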