"Short" form of token shape #6561
kinghuang started this conversation in New Features & Project Ideas
Replies: 1 comment
I see your point, but isn't this easily achievable with some of the code you've already written, and accessing the results from a custom token attribute?
The Token shape attribute shows the orthographic features of the token's string. Sequences of the same characters are truncated after length 4.
Sometimes, I am more interested in something that generalizes the shape of the text even more. For example, say I have text containing `My Great Product`.

Given the token for `Product`, I want to scan previous tokens that share the same basic shape. The current shapes for `My Great Product` are `Xx Xxxxx Xxxxx`. I can further truncate the shapes myself by collapsing each run of repeated characters to a single character. This truncates the shapes of those tokens to `Xx Xx Xx`, making it possible to correlate the shapes of the three words.

I would be interested in a Token attribute that acted like the current `shape`, but truncated sequences of the same character after length 1, instead of length 4. Something like a `short_shape`/`short_shape_` attribute.

| shape | short_shape |
| --- | --- |
| `Xxxxx` | `Xx` |
| `Xx` | `Xx` |
| `d,ddd.dd` | `d,d.d` |
| `dddd.dd` | `d.d` |
| `d.dddd` | `d.d` |
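
A minimal sketch of the truncation described above, assuming a regular-expression approach. It is an illustration, not necessarily the code used in the original post, and `short_shape` is a hypothetical helper name rather than an existing spaCy attribute. It takes a shape string (such as the value of `token.shape_`) and collapses every run of identical characters to one character.

```python
import re

# A character followed by one or more repeats of that same character.
_REPEATED_RUN = re.compile(r"(.)\1+")


def short_shape(shape: str) -> str:
    """Collapse each run of identical characters in a shape string to one character."""
    return _REPEATED_RUN.sub(r"\1", shape)


# Shapes as given in the post for "My Great Product".
shapes = ["Xx", "Xxxxx", "Xxxxx"]
print(" ".join(short_shape(s) for s in shapes))  # Xx Xx Xx

# Assumption: with spaCy installed, the helper could be exposed as a custom
# attribute, readable as token._.short_shape:
#   from spacy.tokens import Token
#   Token.set_extension("short_shape", getter=lambda tok: short_shape(tok.shape_))
```

Because the helper works on plain strings, it can be checked against the table above without loading a pipeline. Registering it as an extension getter is one way to get the custom token attribute suggested in the reply.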