I’m not sure this should be considered a bug, but when duplicate sentence IDs appear in the corpus, only one of the sentences with a given ID seems to be preserved. It might be helpful to raise an exception in such cases. Otherwise, users may struggle to understand why some sentences are missing from the parsed corpus or, worse, keep working without realizing that part of the corpus is being silently ignored.
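
To illustrate the kind of check I have in mind, here is a minimal sketch. It does not reflect the library's actual internals: `index_sentences` and `DuplicateSentenceIdError` are hypothetical names, and I am assuming the parser has `(sent_id, sentence)` pairs available at some point before it builds its ID-keyed mapping.

```python
from collections import Counter


class DuplicateSentenceIdError(ValueError):
    """Hypothetical exception; not part of any existing library."""


def index_sentences(sentences):
    """Build an id -> sentence mapping, failing loudly on duplicate ids.

    `sentences` is any iterable of (sent_id, sentence) pairs (an assumed
    shape, chosen only for this illustration).
    """
    pairs = list(sentences)
    # Count how often each id occurs; Counter keeps first-seen order.
    counts = Counter(sent_id for sent_id, _ in pairs)
    duplicates = [sent_id for sent_id, n in counts.items() if n > 1]
    if duplicates:
        raise DuplicateSentenceIdError(
            "duplicate sentence ids in corpus: "
            + ", ".join(map(str, duplicates))
        )
    # Safe now: no key is overwritten when building the dict.
    return dict(pairs)
```

Without a check like this, a plain `dict(pairs)` keeps only the last sentence for each repeated ID, which would explain the silent loss described above if the corpus is stored in an ID-keyed mapping. If raising by default is too strict, a warning or an opt-in flag would probably work as well.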