Pipeline.Process throws PythonException: 'generator' object is not subscriptable when using nlp.pipe() #125

@TomNeggib

Description

Processing documents with the Pipeline.Process(IEnumerable docs) method throws a "Python.Runtime.PythonException: 'generator' object is not subscriptable" exception.

Reproduce

  1. Initialize spaCy normally with Spacy.Initialize(...) and Spacy.For(...).
  2. Call pipeline.Process(docs) with a list of more than one document.
  3. While debugging, you'll see that the call throws the exception above.

Bug

I'm pretty sure I found the bug in these lines in Spacy.Pipeline.cs:

var s_docs = _nlp.pipe(docs.Select(d => d.Value).ToArray());
for (int i = 0; i < docs.Count; i++)
{
    SyncBack(s_docs[i], docs[i]);
}

nlp.pipe() returns a generator, and a generator cannot be indexed in Python, so s_docs[i] throws the exception.
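The Python-side behavior can be reproduced without spaCy. The sketch below uses a hypothetical stand-in, fake_pipe, which is not part of spaCy and only mimics the fact that nlp.pipe() yields its results lazily. Indexing its return value raises the same TypeError that Python.NET surfaces as a PythonException:

```python
def fake_pipe(texts):
    """Hypothetical stand-in for nlp.pipe(): yields one result per input, lazily."""
    for text in texts:
        yield text.upper()


results = fake_pipe(["first doc", "second doc"])

try:
    results[0]  # generators do not support indexing
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```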

Proposed Fix

Iterating over the generator instead of indexing it fixed the issue for me:

var s_docs = _nlp.pipe(docs.Select(d => d.Value).ToArray());
var i = 0;
foreach (var s_doc in s_docs)
{
    SyncBack(s_doc, docs[i]);
    i++;
}
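For comparison, here is roughly the same pairing written in plain Python. It is only a sketch: fake_pipe and sync_all are hypothetical names, the documents are plain dicts rather than the wrapper's document type, and it assumes results are yielded in the same order as the inputs, which is what the index-based fix above also relies on.

```python
def fake_pipe(texts):
    """Hypothetical stand-in for nlp.pipe(): yields one result per input, in order."""
    for text in texts:
        yield text.upper()


def sync_all(docs):
    """Pair each input document with its processed result by iteration order."""
    processed = fake_pipe([doc["value"] for doc in docs])
    # zip consumes the generator one item at a time, so no indexing is needed.
    for doc, result in zip(docs, processed):
        doc["processed"] = result
    return docs


docs = [{"value": "first doc"}, {"value": "second doc"}]
sync_all(docs)
print(docs)
```

Iterating keeps the processing lazy. Materializing the generator into a list first would also make indexing work, at the cost of holding every result in memory at once.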

Metadata

Labels: bug (Something isn't working)