There are some hard-coded symbols that are always accessible through the StringStore even though they aren't actually saved in the StringStore:

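import spacy
# A brand-new StringStore contains no saved strings...
x1 = spacy.strings.StringStore()
assert len(x1) == 0
# ...but the hard-coded symbols are still reported as present.
assert "IS_ALPHA" in x1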

The full list is here: https://github.com/explosion/spaCy/blob/4890db63399d24f088ff6978aa157a0e4672e2eb/spacy/symbols.pxd

This is clearly confusing. Hopefully at some point it will be possible to treat all of them the same way as any other string and avoid this hard-coded enum. For now, root is one of those hard-coded symbols, which is why you see it in every StringStore.
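To make the behaviour concrete, here is a toy model in plain Python. It is not spaCy's implementation: the ToyStringStore class and the two-entry SYMBOLS tuple are hypothetical stand-ins for the real StringStore and the enum in symbols.pxd. It only sketches why a symbol can be "in" a store whose length is 0, and that membership is checked against the whole string.

```python
# Toy model only: NOT spaCy's implementation. SYMBOLS stands in for the
# hard-coded enum in spacy/symbols.pxd (two names picked for illustration).
SYMBOLS = ("IS_ALPHA", "root")


class ToyStringStore:
    """Hypothetical store: symbols are always visible but never counted."""

    def __init__(self):
        self._added = set()  # strings that were actually saved

    def add(self, string):
        self._added.add(string)

    def __len__(self):
        # Only saved strings count towards the length.
        return len(self._added)

    def __contains__(self, string):
        # Exact, whole-string lookup: no tokenization or substring matching.
        return string in SYMBOLS or string in self._added


store = ToyStringStore()
empty_len = len(store)                  # nothing has been saved yet
has_symbol = "root" in store            # visible anyway, it is a symbol
has_longer = "root.abc.com" in store    # a different string, not a symbol
store.add("root.abc.com")
has_longer_after_add = "root.abc.com" in store
```

In this sketch the symbol table is consulted before the saved strings, so the length and the membership test can disagree for symbol names, which mirrors the confusion described above.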

The StringStore doesn't tokenize or analyze root.abc.com in any way, so you should be able to see:

assert ("root…

Answer selected by ines
Labels
usage General spaCy usage

This discussion was converted from issue #5059 on December 11, 2020 00:43.