Cython access to lemma string #11229
Hey there! I am trying to rewrite some text preprocessing pieces in Cython. As part of that, I have this snippet to filter stopwords:
What I expected was that `Lexeme.get_struct_attr(c.lex, LEMMA)` would return a string instead of an integer (in fact, it always returns 0; it seems it can't find any entry). Currently, `filter_stop` returns the StringStore hash number, but what I am trying to retrieve is the lemma for those words that aren't flagged as stop words. What would be the right way to do this in Cython?
Replies: 1 comment 4 replies
`Token/Lexeme.get_struct_attr` (and basically all the Cython methods) work with the string store hashes internally rather than strings, so it's expected to get an integer (`attr_t`, which is `uint64_t`).

The underlying problem above is that `LEMMA` is only a `Token` attribute, not a `Lexeme` attribute. The only attribute that's stored on both underneath is `NORM`, but `Token.get_struct_attr` backs off to `Lexeme.get_struct_attr` for any unknown attributes. And then `Lexeme.get_struct_attr` returns `0` for any unknown attributes.
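
To make the lookup order concrete, here is a minimal pure-Python sketch of the behaviour described above. It is a toy model, not spaCy's actual code: `ToyStringStore`, `ToyLexeme`, `ToyToken`, the attribute IDs, and the hash function are all hypothetical stand-ins, and the rule that a token-level `NORM` overrides the lexeme's is an assumption made for illustration.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical attribute IDs for this sketch (spaCy defines its own).
NORM, LEMMA, LOWER = 1, 2, 3


class ToyStringStore:
    """Maps strings to 64-bit integer IDs and back (toy stand-in)."""

    def __init__(self):
        self._by_id = {}

    def add(self, text: str) -> int:
        # Toy hash: 8-byte BLAKE2b digest. spaCy uses its own hash function.
        digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
        key = int.from_bytes(digest, "little")
        self._by_id[key] = text
        return key

    def __getitem__(self, key: int) -> str:
        return self._by_id[key]


@dataclass
class ToyLexeme:
    norm: int
    lower: int


@dataclass
class ToyToken:
    lex: ToyLexeme
    lemma: int      # stored on the token only
    norm: int = 0   # assumption: 0 means "use the lexeme's norm"


def lexeme_get_struct_attr(lex: ToyLexeme, feat_name: int) -> int:
    if feat_name == NORM:
        return lex.norm
    if feat_name == LOWER:
        return lex.lower
    return 0  # unknown attribute, which includes LEMMA


def token_get_struct_attr(token: ToyToken, feat_name: int) -> int:
    if feat_name == LEMMA:
        return token.lemma
    if feat_name == NORM:
        return token.norm or token.lex.norm
    # Anything the token does not store itself backs off to the lexeme.
    return lexeme_get_struct_attr(token.lex, feat_name)


strings = ToyStringStore()
running = strings.add("running")
tok = ToyToken(lex=ToyLexeme(norm=running, lower=running), lemma=strings.add("run"))

lemma_from_lexeme = lexeme_get_struct_attr(tok.lex, LEMMA)  # 0: lexemes have no lemma
lemma_from_token = token_get_struct_attr(tok, LEMMA)        # an integer hash
lemma_text = strings[lemma_from_token]                      # resolve the hash to text
```

Applied to the original question, this suggests asking the token struct rather than `c.lex` for `LEMMA`, then resolving the returned hash through the vocab's string store (in Python, `doc.vocab.strings[hash]`) once the text itself is needed.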