Hello,
First of all, thank you very much for all your work and the new features supported by v3. I'm having trouble with the pre-training data.
This is what the spaCy documentation says about the data for pretraining:
The raw text can be provided in spaCy’s binary .spacy format consisting of serialized Doc objects or as a JSONL (newline-delimited JSON) with a key "text" per entry. This allows the data to be read in line by line, while also allowing you to include newlines in the texts.
{"text": "Can I ask where you work now and what you do, and if you enjoy it?"}
{"text": "They may just pull out of the Seattle market completely, at least until they have autonomous vehicles."}
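As far as I understand it, the JSONL format described above can be produced with the standard library alone. This is only a minimal sketch of my reading of the docs: the file name pretrain_sample.jsonl, the sample texts, and the helpers write_jsonl and read_jsonl are hypothetical and not part of spaCy. It assumes each "text" value must be a plain string, and it shows that json.dumps escapes embedded newlines so every record still fits on one line.

```python
import json

# Hypothetical sample texts; real data would come from your own corpus.
texts = [
    "Can I ask where you work now and what you do, and if you enjoy it?",
    "First line of a text.\nSecond line of the same text.",
]


def write_jsonl(path, texts):
    """Write one JSON object per line, each with a plain-string "text" key."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            # json.dumps escapes the embedded newline as \n,
            # so each record stays on a single line of the file.
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back line by line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


write_jsonl("pretrain_sample.jsonl", texts)
records = read_jsonl("pretrain_sample.jsonl")
```

Reading the file back line by line returns the original strings, including the text that contains a newline.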
There is something I am not understanding correctly.
My code:
pretest_data = []
# Creating data
for i in range(len(cleaned_data)):
    text = data[i]['text']
    doc = nlp.make_doc(text)  # create doc object from text
    pretest_data.append({'text': doc})
I have tried saving pretest_data both as .jsonl and as .spacy, but I always get the same error.
Error:
python3.9/site-packages/spacy/training/corpus.py:80: UserWarning: [W090] Could not locate any .jsonl files in path 'pretest_data.spacy'.
warnings.warn(Warnings.W090.format(path=orig_path, format=file_type))