-
-
Notifications
You must be signed in to change notification settings - Fork 11
Specify all presentation sequences #114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
It's a lot of them. My recentish pr only covered a small part. |
I'll be the first to bring up automation here: That means any concerns about such automation should not block this PR. |
I had to find a way to be confident in the content of this PR anyway, so I figured I could make tests so that the automation part is already done. I left a commit with the tests failing, so that you can see the output in CI. Notably, there was one mistake introduced in ae98eb0: I added the text presentation selector to For now, I query the list of presentation sequences defined by Unicode from the internet every time the tests are run. Feel free to suggest a better way. |
sym
and emoji
This PR now contains meta changes. |
How about caching it in a file Also, I'd like the web-request part of this test to be opt-in to begin with (the test can still run without it if the cache file already exists). IMO running |
Initially I thought it may be better to download the file as part of a build script for tests only (which would cache it for as long as the source code is not modified), but I'm not sure how to run something only for building tests? Also, maybe this is a bad idea for some other reason. I think your solution is probably better anyway. Maybe the file could even be part of the repo so that it doesn't need to be downloaded every time, but somehow that doesn't feel right to me. Also, that might have some licensing issues (we would probably need to include https://www.unicode.org/license.txt as well). Additionally, I want to clarify that the dependency on |
Yeah I got that, but it's still a huge dep tree that not everyone may want to have to download before running [the rest of] the tests. |
If we're gonna go with the downloading thing, then |
I switched to |
According to The Cargo Book, it is not possible to have a build dependency for tests only, so this feature trick is necessary:
|
src/lib.rs
Outdated
@@ -285,6 +287,8 @@ mod test { | |||
.collect::<HashSet<_>>(); | |||
assert!( | |||
are_all_variants_valid(EMOJI, |c| { | |||
// All text presentations are exactly 2 codepoints long as of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this say "emoji" instead of "text"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed to "emoji variation sequences" everywhere, as this is how text/emoji presentation sequences seem to be referred to by Unicode.
Would it be possible to have some shorthand that specifies text vs emoji form? |
This PR fixes #21, fixes #23, and closes #25, by adding the appropriate variation selectors to symbols that exist in both
sym
andemoji
. I verified that all the variation sequences are defined in by Unicode.12I have marked this PR as a draft because the next step is to add variation selectors to all symbols that allow it, whether present in both
sym
andemoji
or not, to prevent ambiguity.This made me realize that some emojis are poorly named, but improving this is a task for a separate PR.
Related: typst/typst#6489 (comment).
Footnotes
https://www.unicode.org/reports/tr51/#Emoji_Variation_Sequences ↩
https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-variation-sequences.txt ↩