You don't show the part where you save the file, but if you are saving pretest_data to a file as JSON, that won't work: the result is not JSONL, it's just a single JSON blob. You can tell because it starts with a [ and probably doesn't contain newlines. JSONL needs one complete JSON object per line, with no enclosing array.
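
To make the difference concrete, here is a minimal sketch using only the standard library. The sample records and the helper name looks_like_jsonl are made up for illustration; they are not from your code or from spaCy.

```python
import json

# Hypothetical sample records, standing in for your own data.
records = [{"text": "first example"}, {"text": "second example"}]

# A single JSON document: one array, so the output starts with "[".
as_json = json.dumps(records)

# JSONL: one complete JSON object per line, no enclosing array.
as_jsonl = "".join(json.dumps(record) + "\n" for record in records)


def looks_like_jsonl(content: str) -> bool:
    """Return True if every non-empty line parses as a JSON object."""
    lines = [line for line in content.splitlines() if line.strip()]
    if not lines:
        return False
    for line in lines:
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            return False
        # A line holding a whole array is a JSON blob, not a JSONL record.
        if not isinstance(parsed, dict):
            return False
    return True
```

Running looks_like_jsonl on the contents of your saved file is a quick way to check which of the two formats you actually wrote.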

You should do something like this:

import json

# Write one JSON object per line (JSONL) instead of a single array.
with open("data.jsonl", "w", encoding="utf-8") as outfile:
    for item in cleaned_data:
        text = item["text"]
        outfile.write(json.dumps({"text": text}) + "\n")
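
If the array has already been saved to disk, you could also convert that file rather than regenerate it. This is a rough sketch under assumptions: the file name pretest_data.json and the function json_array_to_jsonl are hypothetical, and it assumes every item is a dict with a "text" key, as in the snippet above.

```python
import json
import tempfile
from pathlib import Path


def json_array_to_jsonl(src, dst):
    """Read a JSON array from src and write one {"text": ...} object per line to dst."""
    with open(src, encoding="utf-8") as infile:
        items = json.load(infile)
    if not isinstance(items, list):
        raise ValueError("expected the source file to contain a JSON array")
    with open(dst, "w", encoding="utf-8") as outfile:
        for item in items:
            outfile.write(json.dumps({"text": item["text"]}) + "\n")
    return len(items)


# Small demonstration in a temporary directory (file names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "pretest_data.json"
    dst = Path(tmp) / "data.jsonl"
    src.write_text(json.dumps([{"text": "a"}, {"text": "b"}]), encoding="utf-8")
    count = json_array_to_jsonl(src, dst)
    lines = dst.read_text(encoding="utf-8").splitlines()
```

Each entry in lines is then a standalone JSON object, which is what a JSONL reader expects.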

If you want to make a spaCy file instead, see the serialization docs.

Answer selected by polm
Labels
feat / training Feature: Training utils, Example, Corpus and converters

This discussion was converted from issue #10870 on May 30, 2022 05:12.