Why the move to _ some methods on token in spacy 3? #9937
-
I'm following along with a YouTube tutorial that has great content but is a little dated. One of the code snippets includes:

    def has_golang(doc):
        for t in doc:
            if t.lower_ in ["go", "golang"]:
                if t.pos_ == "NOUN":
                    return True
        return False

...which fails because t.lower, somewhat confusingly, now returns a number (the hash / vector?) instead of the lowercase variant of the token. As someone very new to spaCy and somewhat inexperienced with Python, I'm wondering what the rationale is for this. Is it to make vector comparisons work as expected? Am I right in interpreting the underscore at the end as indicating that this method is internal and should not be relied upon? Thanks for the incredible library, really enjoying it!
Replies: 1 comment 1 reply
-
The code in the video (time link), including as you have pasted it, uses lower_ with an underscore. This hasn't changed between v2 and v3.

The underscore methods are for token attributes that are strings. For efficiency purposes they are stored as hashes (integers), and the underscore versions convert the hashes back to strings. If you want to read more about this you can see the dev docs, though you don't have to know all that to use the API.
Also, the general convention in Python is that members that start with an underscore are for internal use. The spaCy string attributes end with an underscore instead, so that convention does not apply to them, and they can be used freely.
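To make the hash-versus-string distinction concrete, here is a minimal plain-Python sketch of the idea. It is not spaCy's implementation: `ToyStringStore` and `ToyToken` are hypothetical names invented for illustration, and the truncated SHA-256 hash is an arbitrary choice for the sketch. It only shows why comparing the integer attribute against a list of strings never matches, while the underscore attribute does.

```python
import hashlib


class ToyStringStore:
    """Hypothetical string table: maps each string to an integer ID and back."""

    def __init__(self):
        self._strings = {}

    def add(self, text):
        # Derive a stable 64-bit integer from the string (arbitrary hash choice).
        digest = hashlib.sha256(text.encode("utf8")).digest()
        key = int.from_bytes(digest[:8], "big")
        self._strings[key] = text
        return key

    def __getitem__(self, key):
        # Convert an integer ID back into the original string.
        return self._strings[key]


class ToyToken:
    """Hypothetical token that stores only the integer ID of its lowercase form."""

    def __init__(self, store, text):
        self._store = store
        self.lower = store.add(text.lower())  # integer ID

    @property
    def lower_(self):
        # The underscore variant looks the ID up and returns the string.
        return self._store[self.lower]


store = ToyStringStore()
token = ToyToken(store, "Golang")

print(type(token.lower).__name__)         # int
print(token.lower_)                       # golang
print(token.lower in ["go", "golang"])    # False: an int never equals a str
print(token.lower_ in ["go", "golang"])   # True
```

In spaCy itself, the corresponding lookup table is exposed as `nlp.vocab.strings`, so `nlp.vocab.strings[token.lower]` gives back the same string as `token.lower_`.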