FAQ: spaCy and Reproducibility #11169
polm started this conversation in Help: Best practices
It's hard to explain exactly what leads to any individual prediction in spaCy (see #3052), but we make an effort to keep all predictions reproducible - that is, the same input should reliably give the same output. However, because the internal math relies on a large number of low-level APIs, there are limits to how consistent things can be.
In general, inference (predictions) in the same environment should always be reproducible. Usually, keeping the same model version and the same minor version of spaCy (like 3.2) is enough to ensure reproducible predictions, though in some cases a different version of the OS, processor, Python, or dependencies can also affect predictions.
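When comparing predictions across machines, it can help to record the environment factors mentioned above next to the outputs. The following is a minimal sketch using only the Python standard library; the function name `environment_fingerprint` and the default package list are illustrative assumptions, not part of spaCy.

```python
import json
import platform
from importlib import metadata


def environment_fingerprint(packages=("spacy", "thinc", "numpy")):
    """Collect OS, processor, Python, and package versions (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "os": platform.platform(),
        # platform.processor() can be empty on some systems; fall back to the machine type.
        "processor": platform.processor() or platform.machine(),
        "python": platform.python_version(),
        "packages": versions,
    }


print(json.dumps(environment_fingerprint(), indent=2))
```

Saving this output alongside predictions from each environment makes it easier to see which factor differs when results diverge. The model version is not covered here; for a loaded pipeline it should be available in `nlp.meta["version"]`, and `python -m spacy info` prints similar environment details.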
If you think you're getting inconsistent predictions, here are some things to check before assuming it's a low-level issue:
Training is a little more complicated. Within the same environment, training on CPU should always be reproducible with the same random seed. However, training on a GPU is not reproducible due to limitations in the underlying APIs, though the reported differences should be relatively small. See "Models are not deterministic / reproducible on GPU" for more details.
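As a rough illustration of what "the same random seed" means in practice, the sketch below seeds the random number generators before a run and checks that two seeded runs draw the same values. The wrapper `seed_everything` is a hypothetical helper; it calls `spacy.util.fix_random_seed` when spaCy is installed and otherwise seeds only Python's own generator. When training with `spacy train`, the seed normally comes from the `seed` setting in the training config rather than from code like this.

```python
import random


def seed_everything(seed: int = 0) -> bool:
    """Seed Python's RNG and, if available, spaCy's RNGs (hypothetical helper).

    Returns True when spaCy's own seeding helper was applied.
    """
    random.seed(seed)
    try:
        from spacy.util import fix_random_seed
    except ImportError:
        # spaCy is not installed; only Python's RNG has been seeded.
        return False
    fix_random_seed(seed)
    return True


# Two runs with the same seed should draw identical random values.
seed_everything(0)
first = [random.random() for _ in range(3)]
seed_everything(0)
second = [random.random() for _ in range(3)]
print(first == second)
```

This only demonstrates seeding on the CPU side. As noted above, a fixed seed does not make GPU training reproducible.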
If some of this is not clear, or if you think you've found a bug related to consistency or reproducibility, please feel free to open an Issue or Discussion.