Cython access to lemma string #11229
Hey there! I am trying to rewrite some text preprocessing pieces in Cython. To that end, I have this snippet to filter stopwords:

What I expected was that `Lexeme.get_struct_attr(c.lex, LEMMA)` would return a string instead of an integer (in fact, it always returns 0, so it seems it can't find any entry). Currently, `filter_stop` returns the `StringStore` hash, but what I am trying to retrieve is the lemma of the words that aren't flagged as stop words. What would be the right way to do this in Cython?
`Token/Lexeme.get_struct_attr` (and basically all the Cython methods) work internally with string store hashes rather than strings, so getting an integer back is expected (`attr_t`, which is `uint64_t`).

The underlying problem above is that `LEMMA` is only a `Token` attribute, not a `Lexeme` attribute. The only attribute stored on both underneath is `NORM`, but `Token.get_struct_attr` backs off to `Lexeme.get_struct_attr` for any attribute it doesn't handle itself, and `Lexeme.get_struct_attr` in turn returns `0` for any attribute it doesn't know about.
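To make the lookup chain concrete, here is a small pure-Python model of it. Everything in it is a hypothetical stand-in, not spaCy code: the attribute ID values, the `StringStore`/`LexemeC`/`TokenC` classes, and the use of truncated SHA-1 instead of spaCy's MurmurHash are assumptions made for illustration. It shows why asking the lexeme for `LEMMA` yields `0`, and that the fix is to ask the *token* for the lemma hash and then resolve that hash through the string store. In real Cython code, the corresponding pieces should be `Token.get_struct_attr(c, LEMMA)` (or reading `c.lemma` from the `TokenC` struct) followed by `doc.vocab.strings[...]`, but treat this as a sketch and check it against the `.pxd` files of your spaCy version.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical attribute IDs for this toy model (NOT spaCy's real values).
# Boolean flags such as IS_STOP occupy the low ID range, like spaCy's flags.
IS_STOP = 0
NORM = 100
LEMMA = 101


class StringStore:
    """Toy store mapping strings <-> 64-bit hashes (SHA-1 stand-in for MurmurHash)."""

    def __init__(self):
        self._by_hash = {}

    def add(self, text: str) -> int:
        key = int.from_bytes(hashlib.sha1(text.encode("utf8")).digest()[:8], "little")
        self._by_hash[key] = text
        return key

    def __getitem__(self, key: int) -> str:
        return self._by_hash[key]


@dataclass
class LexemeC:
    norm: int   # hash of the normalized form
    flags: int  # bit field of boolean flags such as IS_STOP


@dataclass
class TokenC:
    lex: LexemeC
    lemma: int  # LEMMA lives on the token only
    norm: int   # NORM is stored on both token and lexeme


def lexeme_get_struct_attr(lex: LexemeC, feat: int) -> int:
    if feat < 64:  # flag attributes: 1 if the bit is set, else 0
        return (lex.flags >> feat) & 1
    if feat == NORM:
        return lex.norm
    return 0  # any unknown attribute, including LEMMA, comes back as 0


def token_get_struct_attr(tok: TokenC, feat: int) -> int:
    if feat == LEMMA:
        return tok.lemma
    if feat == NORM:
        return tok.norm
    return lexeme_get_struct_attr(tok.lex, feat)  # back off to the lexeme


def filter_stop(tokens, strings):
    """Return lemma strings of non-stop tokens (hash -> string via the store)."""
    return [
        strings[token_get_struct_attr(t, LEMMA)]
        for t in tokens
        if not token_get_struct_attr(t, IS_STOP)
    ]


# Usage: build a few tokens and filter them.
strings = StringStore()


def make(norm: str, lemma: str, stop: bool) -> TokenC:
    n = strings.add(norm)
    return TokenC(LexemeC(norm=n, flags=int(stop) << IS_STOP), strings.add(lemma), n)


tokens = [
    make("the", "the", True),
    make("cats", "cat", False),
    make("were", "be", True),
    make("running", "run", False),
]
print(filter_stop(tokens, strings))                   # ['cat', 'run']
print(lexeme_get_struct_attr(tokens[1].lex, LEMMA))   # 0
```

The design point the model mirrors is that the stop-word check can still go through the lexeme (flags are lexical), while the lemma must come from the token, because it depends on context and is not stored on the lexeme at all.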