Description
Processing documents with the Pipeline.Process(IEnumerable docs) method throws a "Python.Runtime.PythonException: 'generator' object is not subscriptable" exception.
Reproduce
- Initialize spaCy as usual with Spacy.Initialize(...) and Spacy.For(...)
- Call pipeline.Process(docs) with a list of more than one document
- Step through the call in a debugger: the method throws the exception described above
Bug
I'm fairly sure the bug is in these lines in Spacy.Pipeline.cs:
var s_docs = _nlp.pipe(docs.Select(d => d.Value).ToArray());
for (int i = 0; i < docs.Count; i++)
{
    SyncBack(s_docs[i], docs[i]);
}
nlp.pipe() returns a Python generator, and a generator cannot be indexed, so s_docs[i] throws the exception.
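The underlying Python behaviour can be reproduced without spaCy. The following is a minimal sketch; fake_pipe is a hypothetical stand-in for nlp.pipe() and only imitates its lazy, generator-based output:

```python
def fake_pipe(texts):
    """Hypothetical stand-in for nlp.pipe(): lazily yields one result per input."""
    for text in texts:
        yield text.upper()


results = fake_pipe(["first doc", "second doc"])

# Indexing a generator raises TypeError, which Python.NET surfaces
# on the C# side as a PythonException.
try:
    results[0]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Materializing the generator into a list makes indexing possible.
as_list = list(fake_pipe(["first doc", "second doc"]))
print(as_list[1])
```

Materializing with list() would be an alternative to iterating, at the cost of holding all results in memory at once.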
Proposed Fix
Iterating over the generator instead of indexing it fixed the issue for me:
var s_docs = _nlp.pipe(docs.Select(d => d.Value).ToArray());
var i = 0;
foreach (var s_doc in s_docs)
{
    SyncBack(s_doc, docs[i]);
    i++;
}
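The fix relies on the generator yielding results in the same order as the inputs, which nlp.pipe() does. As a rough Python-side sketch of the same lockstep iteration (fake_pipe and the dictionary-based documents are hypothetical stand-ins, and the assignment imitates what SyncBack might do):

```python
def fake_pipe(texts):
    """Hypothetical stand-in for nlp.pipe(): lazily yields results in input order."""
    for text in texts:
        yield text.upper()


# Hypothetical documents; "value" plays the role of d.Value in the C# code.
docs = [{"value": "first doc"}, {"value": "second doc"}]

# zip pairs each input document with its processed result without indexing
# the generator, mirroring the foreach loop with a manual counter.
for doc, processed in zip(docs, fake_pipe(d["value"] for d in docs)):
    doc["processed"] = processed  # stand-in for SyncBack(s_doc, docs[i])

print(docs)
```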