Dealing with hash issue in preparing .spacy object for spancat training #11827
Hi, I am currently setting up an experiment with the SpanCat component. I was able to train an initial model on my data with a cloned repo, so the basic settings should be fine. Right now, I have an issue converting another set of IOB data into a `.spacy` object. Here is what the IOB data looks like:
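(For illustration, a fragment in this format consists of the token text followed by one tab-separated label column per nesting level; the tokens and labels below are hypothetical, GENIA-style:)

```
IL-2	B-protein	B-DNA	O	O
gene	O	I-DNA	O	O
expression	O	O	O	O
```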
Here is the code I used to convert the IOB data to the `.spacy` object (again, thank you for sharing the example projects for this component, they are extremely helpful!):

```python
from pathlib import Path
from typing import List

import typer
from spacy.tokens import Doc, DocBin, SpanGroup
from spacy.training.converters import conll_ner_to_docs
from wasabi import msg

DOC_DELIMITER = "-DOCSTART- -X- O O\n"


def parse_genia(data: str,
                span_key: str,
                num_levels: int = 4,
                doc_delimiter: str = DOC_DELIMITER) -> List[Doc]:
    """Parse the GENIA dataset into spaCy Docs.

    Our strategy here is to reuse the conll -> ner method from
    spaCy and re-apply it n times. We don't want to write our
    own CoNLL/IOB parser.

    Parameters
    ----------
    data: str
        The raw string input as read from the IOB file
    num_levels: int, default is 4
        Represents how many times a label has been nested. In
        GENIA, a label was nested four times at maximum.

    Returns
    -------
    List[Doc]
    """
    docs = data.split("\n\n")  # separate into sents
    iob_per_level = []
    for level in range(num_levels):
        doc_list = []
        for doc in docs:  # iterate over each chunk
            tokens = [t for t in doc.split("\n") if t]
            token_list = []
            for token in tokens:  # iterate over tokens
                annot = token.split("\t")  # list of annotations
                # First element is always the token text
                text = annot[0]
                # Subsequent columns are the annotations per level
                label = annot[level + 1]
                # "text label" as format
                _token = " ".join([text, label])
                token_list.append(_token)
            doc_list.append("\n".join(token_list))
        annotations = doc_delimiter.join(doc_list)
        iob_per_level.append(annotations)
    # We then copy all the entities from doc.ents into doc.spans
    # later on. But first, let's have "canonical" docs to copy into.
    # conll_ner_to_docs internally identifies whether sentence
    # segmentation is done.
    docs_per_level = [list(conll_ner_to_docs(iob)) for iob in iob_per_level]
    docs_with_spans: List[Doc] = []
    for docs in zip(*docs_per_level):
        spans = [ent for doc in docs for ent in doc.ents]
        doc = docs[0]
        group = SpanGroup(doc, name=span_key, spans=spans)
        doc.spans[span_key] = group
        docs_with_spans.append(doc)
    return docs_with_spans


def parse_engagement_v2(data: str,
                        span_key: str,
                        num_levels: int = 4,
                        doc_delimiter: str = DOC_DELIMITER) -> List[Doc]:
    """Parse the ENGAGEMENT dataset into spaCy Docs.

    This is a modified version of the genia_preprocess code:
    1) I included the doc delimiter to reflect natural doc boundaries.

    Our strategy here is to reuse the conll -> ner method from
    spaCy and re-apply it n times. We don't want to write our
    own CoNLL/IOB parser.

    Parameters
    ----------
    data: str
        The raw string input as read from the IOB file
    num_levels: int, default is 4
        Represents how many times a label has been nested. In
        GENIA, a label was nested four times at maximum.

    Returns
    -------
    List[Doc]
    """
    docs = data.split(doc_delimiter)  # separate into docs rather than sents
    iob_per_level = []
    for level in range(num_levels):
        doc_list = []
        for doc in docs:  # iterate over each chunk
            sent_list = []
            for sent in doc.split("\n\n"):
                tokens = [t for t in sent.split("\n") if t]
                token_list = []
                for token in tokens:  # iterate over tokens
                    annot = token.split("\t")  # list of annotations
                    # First element is always the token text
                    text = annot[0]
                    # text = text.replace("#", "_")  # tested whether "#" was doing the trick
                    # Subsequent columns are the annotations per level
                    label = annot[level + 1]
                    # "text label" as format
                    _token = " ".join([text, label])
                    token_list.append(_token)
                sent_list.append("\n".join(token_list))
            doc_list.append("\n\n".join(sent_list))
        annotations = doc_delimiter.join(doc_list)
        iob_per_level.append(annotations)
    # We then copy all the entities from doc.ents into doc.spans
    # later on. But first, let's have "canonical" docs to copy into.
    # conll_ner_to_docs internally identifies whether sentence
    # segmentation is done.
    docs_per_level = [list(conll_ner_to_docs(iob)) for iob in iob_per_level]
    docs_with_spans: List[Doc] = []
    for docs in zip(*docs_per_level):
        for d in docs:
            print(d.ents)  # debugging: inspect the entities found per level
        spans = [ent for doc in docs for ent in doc.ents]
        doc = docs[0]
        group = SpanGroup(doc, name=span_key, spans=spans)
        doc.spans[span_key] = group
        docs_with_spans.append(doc)
    return docs_with_spans


def main(input_path: Path, output_path: Path, span_key: str):
    msg.good("Processing the Engagement dataset")
    with input_path.open("r", encoding="utf-8") as f:
        data = f.read()
    docs = parse_engagement_v2(data, span_key=span_key, num_levels=3)
    # docs = parse_genia(data, span_key=span_key)
    doc_bin = DocBin(docs=docs)
    doc_bin.to_disk(output_path)
    msg.good("Done processing the Engagement dataset")


if __name__ == "__main__":
    typer.run(main)
```

When I run this script, I get:
One of my guesses is that some of the labels never occur in the first layer of the IOB data, and this may prevent spaCy from assigning a hash to those labels. So I randomly switched positions in the IOB data, but that did not work either (maybe just by sheer probability). If the absence of a label from the spaCy Vocab is the issue, is there any way to set a default tag set on the spancat layer so that we do not have to deal with this? If that is not the source of the issue, I would appreciate any guesses as to why adding additional tags to the dataset caused one. Thank you so much again for the awesome ecosystem you have been building for the NLP community. I really could not do my work without spaCy.
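(For reference on the hashing side: spaCy interns strings, including span labels, as 64-bit hashes in a vocab's StringStore, and a hash can only be resolved back to a string by a StringStore that has seen that string. A minimal sketch; the label name is hypothetical:)

```python
import spacy

nlp = spacy.blank("en")

label = "CLAIM"  # hypothetical span label
key = nlp.vocab.strings.add(label)  # interning returns the 64-bit hash

# Resolving the hash only works in a StringStore that contains the
# string; a Doc built from a *different* vocab may fail to resolve it.
assert nlp.vocab.strings[key] == label
```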
The problem is that these docs are coming back with different vocabs:

```python
docs_per_level = [list(conll_ner_to_docs(iob)) for iob in iob_per_level]
```

I haven't tested this, but I think you can fix it by providing a single model (so they all use the same model vocab) to use for all the conversions:

```python
nlp = spacy.blank("en")  # or some appropriate blank model
docs_per_level = [list(conll_ner_to_docs(iob, model=nlp)) for iob in iob_per_level]
```
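As a sanity check that the fix takes effect, one could assert that every converted Doc ends up sharing a single Vocab before the entities are copied into span groups. A minimal sketch following the suggestion above (untested; the toy IOB lines and label are hypothetical):

```python
import spacy
from spacy.training.converters import conll_ner_to_docs

# Two toy single-token IOB "levels" standing in for iob_per_level
# from the conversion script above.
iob_per_level = ["Transcripts B-CLAIM", "Transcripts O"]

nlp = spacy.blank("en")  # one shared pipeline -> one shared StringStore
docs_per_level = [list(conll_ner_to_docs(iob, model=nlp))
                  for iob in iob_per_level]

# All Docs should now share one Vocab, so span label hashes resolve
# consistently when entities are copied across levels.
first_vocab = docs_per_level[0][0].vocab
assert all(doc.vocab is first_vocab
           for docs in docs_per_level
           for doc in docs)
```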