Could not locate any .jsonl files in path 'pretest_data.spacy'. How can I make a file for pretraining in spaCy 3? #10870

@lalvaradop

Hello,

First of all, thank you very much for all your work and the new features supported by v3. I'm having trouble with the pre-training data.

This is what the spaCy documentation says about pretraining:
The raw text can be provided in spaCy’s binary .spacy format consisting of serialized Doc objects or as a JSONL (newline-delimited JSON) with a key "text" per entry. This allows the data to be read in line by line, while also allowing you to include newlines in the texts.

{"text": "Can I ask where you work now and what you do, and if you enjoy it?"}
{"text": "They may just pull out of the Seattle market completely, at least until they have autonomous vehicles."}
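For reference, a minimal sketch of writing a file in the JSONL layout quoted above, using only the Python standard library. The function names `write_jsonl` and `read_jsonl`, the file name `pretrain_data.jsonl`, and the sample texts are illustrative assumptions, not part of spaCy. The one thing the sketch relies on is that each "text" value is a plain string, since `json` cannot serialize other objects such as a `Doc`.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(texts, path):
    """Write one JSON object with a "text" key per line (hypothetical helper)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for text in texts:
            # The value must be a plain string; json.dumps escapes any
            # newline inside the text, so each record stays on one line.
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read the records back, one JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative sample texts; the second one contains a newline on purpose.
texts = [
    "Can I ask where you work now and what you do, and if you enjoy it?",
    "A text with\na newline inside.",
]

# Written to a temporary directory so the sketch has no side effects.
out_path = write_jsonl(texts, Path(tempfile.mkdtemp()) / "pretrain_data.jsonl")
records = read_jsonl(out_path)
```

Reading the file back is only a sanity check that every line parses on its own and that embedded newlines survive the round trip.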

There is something I am not understanding well.
My code:

pretest_data = []

# Creating data
for i in range(len(cleaned_data)):
    text = data[i]['text']

    doc = nlp.make_doc(text)  # create doc object from text

    pretest_data.append({'text': doc})

I have tried saving pretest_data both as a .jsonl file and as a .spacy file, but I always get the same error.

Error
python3.9/site-packages/spacy/training/corpus.py:80: UserWarning: [W090] Could not locate any .jsonl files in path 'pretest_data.spacy'.
warnings.warn(Warnings.W090.format(path=orig_path, format=file_type))
