Release 4.0.0 of datasets is incompatible #80

@MarTnquesada

Description

It seems that the latest datasets release (4.0.0) conflicts with span_marker.tokenizer.py. It also conflicts with the declared dependency transformers>=4.23.0,<5, because the older transformers tokenizers that this range still allows seem to be incompatible with the new datasets release. As a result, several methods, such as SpanMarkerModel.from_pretrained(...).predict, fail regardless of the input format and always show the following error:

ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).
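The error message lists the only input shapes the tokenizer accepts. The sketch below is a hedged illustration of one possible mechanism, not a confirmed diagnosis. It re-creates a simplified check of that kind, which accepts a str, a list/tuple of str, or a list/tuple of lists of str, and rejects everything else, including list-like objects that are not actual lists or tuples. `is_valid_text_input` and `LazyColumn` are hypothetical names and are not transformers or datasets APIs. `LazyColumn` only stands in for a non-list sequence that a newer datasets release might pass to the tokenizer.

```python
from collections.abc import Sequence
from typing import Any


def is_valid_text_input(text: Any) -> bool:
    """Hypothetical, simplified re-creation of a str / List[str] / List[List[str]] check."""
    if isinstance(text, str):
        return True  # single example
    if isinstance(text, (list, tuple)):
        if len(text) == 0 or isinstance(text[0], str):
            return True  # batch, or a single pretokenized example
        if isinstance(text[0], (list, tuple)):
            # batch of pretokenized examples
            return len(text[0]) == 0 or isinstance(text[0][0], str)
    return False  # anything else is rejected, even if it behaves like a list


class LazyColumn(Sequence):
    """Hypothetical list-like wrapper standing in for a non-list column object."""

    def __init__(self, rows: list) -> None:
        self._rows = rows

    def __getitem__(self, index):
        return self._rows[index]

    def __len__(self) -> int:
        return len(self._rows)


tokens = [["Amelia", "Earhart", "flew"], ["Paris", "is", "nice"]]
print(is_valid_text_input(tokens))                    # True
print(is_valid_text_input(LazyColumn(tokens)))        # False: not a list or tuple
print(is_valid_text_input(list(LazyColumn(tokens))))  # True after converting to a list
```

If this is the actual mechanism, converting such inputs with list(...) before tokenizing would avoid the check.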

For now, users can work around it by explicitly installing an older datasets version (e.g. 3.6.0), but I would suggest changing the datasets dependency to datasets>=2.14.0,<4. Does that sound alright (I can open a PR for it), or were you thinking of moving to a newer transformers version instead, @tomaarsen?
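The proposed range can also be checked at runtime. The sketch below is hypothetical, because span_marker is not known to ship such a guard. It uses only the standard library (importlib.metadata) to test whether the installed datasets version falls within [2.14.0, 4). `release_tuple` and `datasets_in_supported_range` are illustrative names. The version parsing is deliberately simplified and ignores full PEP 440 ordering, so pre-release suffixes are dropped.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Bounds taken from the range proposed in this issue: datasets>=2.14.0,<4
LOWER_INCLUSIVE = (2, 14, 0)
UPPER_EXCLUSIVE = (4, 0, 0)


def release_tuple(version_string: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.Z' into (X, Y, Z), keeping only the leading digits of each part.

    Simplification: suffixes such as 'rc1' are dropped, so '4.0.0rc1' becomes
    (4, 0, 0). This is not full PEP 440 ordering.
    """
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # treat '3.6' as '3.6.0'
    return (parts[0], parts[1], parts[2])


def datasets_in_supported_range(installed: Optional[str] = None) -> bool:
    """Return True if the datasets version lies in [2.14.0, 4).

    When `installed` is None, the version is read from the current environment.
    """
    if installed is None:
        try:
            installed = version("datasets")
        except PackageNotFoundError:
            return False  # datasets is not installed at all
    return LOWER_INCLUSIVE <= release_tuple(installed) < UPPER_EXCLUSIVE


print(datasets_in_supported_range("3.6.0"))  # True
print(datasets_in_supported_range("4.0.0"))  # False
```

In practice, the pin itself belongs in the package's dependency metadata. For exact PEP 440 semantics, packaging.specifiers.SpecifierSet is the usual tool. This stdlib-only check only mirrors the proposed bound.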
