FAQ: spaCy and Reproducibility #11169

polm · 2022-07-20T05:42:53Z

It's hard to explain exactly what leads to any individual prediction in spaCy (see #3052), but we make an effort to keep all predictions reproducible - that is, the same input should reliably give the same output. However, due to the large number of low-level APIs involved in internal math, there are limits to how consistent things can be.

In general, inference (predictions) in the same environment should always be reproducible. Usually, keeping the same model version and the same minor version of spaCy (like 3.2) is enough to ensure reproducible predictions, though in some cases different versions of the OS, processor, Python, or dependencies can also affect predictions.
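
As a rough way to test this in your own environment, the sketch below runs the same inputs several times and compares a hash of the results. It is a minimal sketch, not part of spaCy: `fingerprint`, `is_reproducible`, and the `predict` callable are hypothetical names, and `predict` is assumed to return JSON-serializable data, for example a list of `(ent.text, ent.label_)` pairs taken from `nlp(text).ents`.

```python
import hashlib
import json


def fingerprint(predictions):
    """Return a stable SHA-256 digest of JSON-serializable predictions."""
    payload = json.dumps(predictions, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_reproducible(predict, texts, runs=3):
    """Call `predict` on every text `runs` times and compare the digests.

    `predict` is a hypothetical callable wrapping your pipeline; it is
    an assumption of this sketch, not a spaCy API.
    """
    digests = {fingerprint([predict(text) for text in texts]) for _ in range(runs)}
    # One distinct digest means every run produced identical output.
    return len(digests) == 1
```

Saving the digest from one machine and comparing it with the digest from another is one way to see whether two environments agree.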

If you think you're getting inconsistent predictions, here are some things to check before assuming it's a low-level issue:

Are you using the same model version?
Are your versions of NumPy / PyTorch / etc. the same?
Is it possible a spaCy bugfix modified your results?
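
To make the version checks above easier to compare between machines, something like the following could be run in each environment. This is a sketch using only the Python standard library; `environment_report` and its default package list are assumptions chosen for illustration, so adjust the list to the dependencies you actually use.

```python
import platform
from importlib import metadata


def environment_report(packages=("spacy", "thinc", "numpy", "torch")):
    """Collect version details that can affect predictions.

    The default package names are an assumption for illustration.
    """
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed here
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The version of a loaded pipeline is stored separately, in its metadata: `nlp.meta["version"]`.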

Training is a little more complicated. Within the same environment, training on CPU should always be reproducible with the same random seed. However, training on a GPU is not reproducible due to limitations in the underlying APIs, though the reported differences should be relatively small. See the discussion "Models are not deterministic / reproducible on GPU" for more details.
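
To illustrate why a fixed seed matters, here is a toy example in plain Python. It is not spaCy code: `toy_training_run` is a hypothetical stand-in for a training loop, where the seed controls both the initial weight and the order in which examples are seen. In spaCy itself the seed is normally set through the `seed` setting in the `[system]` block of the training config, or with `spacy.util.fix_random_seed` in custom scripts.

```python
import random


def toy_training_run(seed, n_examples=8, n_epochs=3):
    """Toy stand-in for a training loop (hypothetical, not spaCy code)."""
    rng = random.Random(seed)        # all randomness flows from this seed
    weight = rng.uniform(-1.0, 1.0)  # random initialization
    examples = [i / n_examples for i in range(n_examples)]
    for _ in range(n_epochs):
        rng.shuffle(examples)        # random example order in each epoch
        for x in examples:
            weight += 0.1 * (x - weight)  # the result depends on the order
    return weight


first = toy_training_run(seed=0)
second = toy_training_run(seed=0)
print(first == second)  # True: same seed and environment, identical result
```

Changing the seed changes the initialization and the shuffle order, so runs with different seeds are generally not expected to match exactly.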

If some of this is not clear, or if you think you've found a bug related to consistency or reproducibility, please feel free to open an Issue or Discussion.
