
Implement optional modifiers #105

Draft
wants to merge 7 commits into
base: main
from

Conversation

Collaborator

@T0mstone T0mstone commented Jul 4, 2025

Closes #51.

This is mostly done, but there are some unresolved questions and known issues, and it still needs finishing touches such as documentation, which is why it's a draft for now.

This is also a breaking change since it changes the way ModifierSet works, so I should probably remember to bump the version in Cargo.toml before we merge this.

Changes to observable syntax

The way I've implemented it, the `modifier?` syntax applies directly in the ModifierSet, so it would be exposed to typst users too (unless typst does some extra sanitizing).

It's also unclear how the new resolution algorithm would interact with user-defined symbols from typst. I've kept the code in best_match_in mostly the same for now, but we could strip it down considerably if the match were guaranteed to be unique (which, of course, may not hold for user-defined symbols).

Changes to variant resolution

I have another branch where I wrote some ad-hoc code (mostly identical to what is now the no_overlap test) to brute-force check every possible modifier set for every symbol and generate a list of which ones differ between the old algorithm and the new one.

This is that list:

emoji.arrow : Some(("↙", None)) => Some(("➡", None))
emoji.bubble : Some(("💭", None)) => Some(("💬", None))
emoji.cloud.hidden : Some(("🌥", None)) => None
emoji.dancing.bunny : Some(("👯", None)) => None
emoji.dancing.women : Some(("👯", None)) => None
emoji.face.not : Some(("🫢", None)) => None
emoji.face.slight : Some(("🙁", None)) => None
emoji.face.withheld : Some(("🥹", None)) => None
emoji.faith : Some(("✝", None)) => None
emoji.faith.dot : Some(("🔯", None)) => None
emoji.finger.alt : Some(("☝", None)) => None
emoji.globe.af : Some(("🌍", None)) => None
emoji.globe.as : Some(("🌏", None)) => None
emoji.globe.au : Some(("🌏", None)) => None
emoji.globe.eu : Some(("🌍", None)) => None
emoji.handholding : Some(("👬", None)) => None
emoji.leaf.four : Some(("🍀", None)) => None
emoji.leaf.three : Some(("☘", None)) => None
emoji.monkey.not : Some(("🙉", None)) => None
emoji.moon.one : Some(("🌖", None)) => None
emoji.moon.face.three : Some(("🌜", None)) => None
emoji.moon.face.two : Some(("🌛", None)) => None
emoji.playback.once : Some(("🔂", None)) => None
emoji.playback.v : Some(("🔃", None)) => None
emoji.suit : Some(("♣", None)) => None
sym.angle.t : Some(("⦡", None)) => None
sym.angle.top : Some(("⦡", Some("`angle.spheric.top` is deprecated, use `angle.spheric.t` instead"))) => None
sym.arrow.half : Some(("↶", None)) => None
sym.arrows.stop : Some(("↹", None)) => None
sym.ballot.heavy : Some(("🗹", None)) => None
sym.colon.op : Some(("⫶", None)) => None
sym.dash.double : Some(("〰", None)) => None
sym.dash.three : Some(("⸻", None)) => None
sym.dash.two : Some(("⸺", None)) => None
sym.divides.rev : Some(("⫮", None)) => None
sym.dot.big : Some(("⨀", None)) => None
sym.emptyset.l : Some(("⦴", None)) => None
sym.emptyset.r : Some(("⦳", None)) => None
sym.eq.down : Some(("≒", None)) => None
sym.eq.up : Some(("≓", None)) => None
sym.gender.male.r : Some(("⚩", None)) => None
sym.gender.male.t : Some(("⚨", None)) => None
sym.gt.nested : Some(("⫸", None)) => None
sym.gt.slant : Some(("⩾", None)) => None
sym.integral.hook : Some(("⨗", None)) => None
sym.lt.nested : Some(("⫷", None)) => None
sym.lt.slant : Some(("⩽", None)) => None
sym.note.alt : Some(("♩", None)) => None
sym.note.beamed : Some(("♫", None)) => None
sym.note.slash : Some(("𝆔", None)) => None
sym.nothing.l : Some(("⦴", None)) => None
sym.nothing.r : Some(("⦳", None)) => None
sym.parallel.slanted : Some(("⧣", None)) => None
sym.parallel.slanted.tilde : Some(("⧤", None)) => None
sym.plus.arrow : Some(("⟴", None)) => None
sym.plus.big : Some(("⨁", None)) => None
sym.prec.curly : Some(("≼", None)) => None
sym.prec.curly.not : Some(("⋠", None)) => None
sym.prec.eq.not : Some(("⋠", None)) => None
sym.rest.measure : Some(("𝄩", None)) => None
sym.space.narrow : Some(("\u{202f}", None)) => None
sym.subset.not.sq : Some(("⋢", None)) => None
sym.succ.curly : Some(("≽", None)) => None
sym.succ.curly.not : Some(("⋡", None)) => None
sym.succ.eq.not : Some(("⋡", None)) => None
sym.suit : Some(("♣", None)) => None
sym.suit.filled : Some(("♣", None)) => None
sym.suit.stroked : Some(("♧", None)) => None
sym.supset.not.sq : Some(("⋣", None)) => None
Most of these should be net positives, but there are also some cases like emoji.globe where a variant has two optional modifiers, of which at least one needs to be present since it is not the default variant. This could in theory be encoded by duplicating the variant, but that'd be observable in the repr and also a bit hacky, so I haven't done it for now.

Other stuff

I have some other minor things I'd like to say, but I'm too tired now, so I'll add them later.

@T0mstone T0mstone added the meta and breaking labels Jul 4, 2025
Collaborator

knuesel commented Jul 4, 2025

This looks great! The list shows how useful this is to avoid ambiguous cases that would easily break in the future.

For the globe it is a bit unfortunate as it makes sense to write globe.eu for the globe that shows "Europe and I don't care what else".

Maybe a solution would be to define

globe
  .as?.au 🌏
  .as.au? 🌏
  .eu?.af 🌍
  .eu.af? 🌍

and have the validation code recognize that .as?.au and .as.au? don't clash because it's the same variant?

Collaborator Author

T0mstone commented Jul 4, 2025

Maybe a solution would be to define

globe
  .as?.au 🌏
  .as.au? 🌏
  .eu?.af 🌍
  .eu.af? 🌍

and have the validation code recognize that .as?.au and .as.au? don't clash because it's the same variant?

Yes, that's what I meant by "duplicating the variant". Your example wouldn't work though, since globe.as.au would be ambiguous, but the following would:

globe
  .as.au? 🌏
  .au 🌏
  .eu.af? 🌍
  .af 🌍

The issue with this ties into the next thing I want to discuss here (see my next comment).
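
The difference between the two layouts can be checked with a small sketch. It assumes a request matches a variant when it contains every required modifier and names nothing outside the variant's modifiers; that rule is inferred from this thread, and `Variant`, `parse`, `layout`, and `matching` are hypothetical names, not codex's API.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a variant: its spec plus the parsed modifiers.
struct Variant<'a> {
    spec: &'a str,
    required: BTreeSet<&'a str>,
    optional: BTreeSet<&'a str>,
}

/// Parses a spec such as "as.au?" (a trailing `?` marks a modifier optional).
fn parse(spec: &str) -> Variant<'_> {
    let mut required = BTreeSet::new();
    let mut optional = BTreeSet::new();
    for m in spec.split('.').filter(|m| !m.is_empty()) {
        match m.strip_suffix('?') {
            Some(name) => optional.insert(name),
            None => required.insert(m),
        };
    }
    Variant { spec, required, optional }
}

/// Builds a list of variants from their specs.
fn layout<'a>(specs: &[&'a str]) -> Vec<Variant<'a>> {
    specs.iter().copied().map(parse).collect()
}

/// Assumed rule: all required modifiers are requested, and every requested
/// modifier is either required or optional for the variant.
fn matches<'a>(v: &Variant<'a>, request: &BTreeSet<&'a str>) -> bool {
    v.required.is_subset(request)
        && request
            .iter()
            .all(|m| v.required.contains(m) || v.optional.contains(m))
}

/// Returns the specs of all variants that match a request such as "as.au".
/// More than one entry means the request is ambiguous.
fn matching<'a>(variants: &[Variant<'a>], request: &'a str) -> Vec<&'a str> {
    let request: BTreeSet<&str> =
        request.split('.').filter(|m| !m.is_empty()).collect();
    variants
        .iter()
        .filter(|v| matches(v, &request))
        .map(|v| v.spec)
        .collect()
}
```

Under that assumed rule, `matching(&layout(&["as?.au", "as.au?"]), "as.au")` returns both specs, while with `["as.au?", "au"]` each of `as`, `au`, and `as.au` resolves to exactly one variant.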

Collaborator Author

T0mstone commented Jul 4, 2025

These changes to ModifierSet require us to add some extra code to typst if we don't want them to be observable to the end users, because if we use it as-is, the modifier? syntax will be exposed. (I've already said this above.)

My point here is that I actually think exposing this is a good thing: we should give typst users access to the new resolution system too, so custom symbols aren't second-class citizens.
But that raises the question of how to verify that abbreviations are unique.

In codex, we can just run the test no_overlap, which is very inefficient and takes a few seconds on my machine, but typst has stricter performance requirements.
In particular, I see two options there:

  1. Somehow implement uniqueness checking for typst as well (maybe pull it from the test into a global function that works with any Symbol-likes). This would probably require some heavy optimization, and I'm not sure that's possible.
  2. Let users define symbols with overlapping abbreviations and retain the extra code in the loop in best_match_in to handle these ambiguities when they exist. This has a huge performance benefit, but comes at the cost of not letting package authors get the same guarantees that codex now has for forward-compatibility. (Maybe we could expose a uniqueness check explicitly as a method on typst's symbol, so package authors can incorporate it into their tests?)

Collaborator Author

T0mstone commented Jul 4, 2025

Another small thing that confused me a bit (and the reason I had to double-check that sym.supset.not.sq should be invalid) is that DejaVu Sans Mono displays ⋣ without the "or equal" part. I guess it's technically not totally wrong, since that's also a notation some set theorists use, but it really feels like an error.

Collaborator

knuesel commented Jul 14, 2025

Somehow implement uniqueness checking for typst as well (maybe pull it from the test into a global function that works with any Symbol-likes). This would probably require some heavy optimization, and I'm not sure that's possible.

I think a fast implementation in Typst should be possible. To detect overlaps it should be sufficient to populate a dictionary where the keys are all "aliases" of all variants. The initial list of keys could be generated statically when building the Typst executable. Then checking overlap for a new symbol would just be a dict insertion for each new alias (negligible runtime) which would fail if the key already exists (unless I'm missing something and the check must be more sophisticated?).
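
A minimal sketch of that dictionary idea, assuming an "alias" is the set of required modifiers plus any subset of the optional ones, written in sorted, dot-joined form. The function names `aliases` and `check_no_overlap` are hypothetical and do not come from codex or Typst.

```rust
use std::collections::HashMap;

/// Expands a spec such as "eu.af?" into every alias it answers to,
/// each alias in a canonical (sorted, dot-joined) form.
fn aliases(spec: &str) -> Vec<String> {
    let mut required = Vec::new();
    let mut optional = Vec::new();
    for m in spec.split('.').filter(|m| !m.is_empty()) {
        match m.strip_suffix('?') {
            Some(name) => optional.push(name),
            None => required.push(m),
        }
    }
    // One alias per subset of the optional modifiers (2^k of them).
    (0..(1usize << optional.len()))
        .map(|mask| {
            let mut parts = required.clone();
            for (i, name) in optional.iter().enumerate() {
                if (mask & (1 << i)) != 0 {
                    parts.push(*name);
                }
            }
            parts.sort_unstable();
            parts.join(".")
        })
        .collect()
}

/// Inserts every alias of every spec into a map and fails on the first
/// alias that is claimed twice.
fn check_no_overlap<'a>(
    specs: &[&'a str],
) -> Result<HashMap<String, &'a str>, String> {
    let mut seen = HashMap::new();
    for &spec in specs {
        for alias in aliases(spec) {
            if let Some(other) = seen.insert(alias.clone(), spec) {
                return Err(format!(
                    "`{}` matches both `{}` and `{}`",
                    alias, other, spec
                ));
            }
        }
    }
    Ok(seen)
}
```

Each insertion is cheap, but the number of keys per variant doubles with every optional modifier, so the map's size depends on how many `?` markers a definition uses.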

@T0mstone
Collaborator Author

Keeping a dict might be fine in practice, but it does have exponential space complexity in the worst case (symbol(("a?.b?.c?.d?.e?.f?.g?.h?", "🫨")) has 256 "aliases").

Collaborator

knuesel commented Jul 15, 2025

If that's a concern, what about an algorithm that dynamically generates a dict for the symbol under consideration? I.e. when the user wants to define a.b?.c?.d, we can generate the dict of "aliases" for a. This should be very fast.

@T0mstone
Collaborator Author

If that's a concern, what about an algorithm that dynamically generates a dict for the symbol under consideration? I.e. when the user wants to define a.b?.c?.d, we can generate the dict of "aliases" for a. This should be very fast.

That's already what I meant. But I suppose the memory usage isn't actually that bad for all sane applications...
And we explicitly don't need such a dict for the built-in symbols after all, since those are already checked with the no_overlap test.

Collaborator

@MDLC01 MDLC01 left a comment

First and foremost, I'm sorry for taking so much time to consider this PR. I think the concept of optional modifiers should be implemented at some point, but I did not find the time to review your implementation until now.

This is also a breaking change since it changes the way ModifierSet works, so I should probably remember to bump the version in Cargo.toml before we merge this.

The public interface has already changed several times in PRs merged since the latest release. Laurenz will take care of the version bump when making a new release.

I'm wondering whether it would be better to first implement optional modifiers, keeping all modifiers optional, and only then make a PR where we decide which modifiers become non-optional. I did not review changes to the symbol lists for now.

Comment on lines 99 to 100
/// 1. Number of modifiers in common with `self` (more is better).
/// 2. Total number of modifiers (fewer is better).
Collaborator

Suggested change
- /// 1. Number of modifiers in common with `self` (more is better).
- /// 2. Total number of modifiers (fewer is better).
+ /// 1. Number of optional modifiers in common with `self` (more is better).
+ /// 2. Total number of optional modifiers (fewer is better).

I'm wondering whether this would be easier to understand?
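
For reference, the two criteria in the doc comment amount to a lexicographic ordering. The sketch below is a simplified model that treats a candidate as a plain set of modifiers and ignores the required/optional distinction; `best_match` is a hypothetical name, not the actual `best_match_in`.

```rust
use std::cmp::Reverse;
use std::collections::BTreeSet;

/// Returns the index of the candidate that ranks best under:
/// 1. number of modifiers in common with `request` (more is better),
/// 2. total number of modifiers (fewer is better).
/// On a tie, the earliest candidate wins (an assumption of this sketch).
fn best_match<'a>(
    request: &BTreeSet<&'a str>,
    candidates: &[BTreeSet<&'a str>],
) -> Option<usize> {
    let mut best: Option<(usize, (usize, Reverse<usize>))> = None;
    for (i, candidate) in candidates.iter().enumerate() {
        // Tuples compare lexicographically; `Reverse` flips the second
        // criterion so that a smaller total ranks higher.
        let score = (
            candidate.intersection(request).count(),
            Reverse(candidate.len()),
        );
        if best.map_or(true, |(_, b)| score > b) {
            best = Some((i, score));
        }
    }
    best.map(|(i, _)| i)
}
```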

src/lib.rs Outdated
Comment on lines 267 to 273
if *last < max_index {
    *last += 1;
} else {
    next_subseq(left, max_index - 1)?;
    *last = left.last().copied().map_or(*last, |x| x + 1);
}
Some(())
Collaborator

Suggested change
- if *last < max_index {
-     *last += 1;
- } else {
-     next_subseq(left, max_index - 1)?;
-     *last = left.last().copied().map_or(*last, |x| x + 1);
- }
- Some(())
+ if *last < max_index {
+     *last += 1;
+ } else {
+     *last = next_subseq(left, max_index - 1)?;
+ }
+ Some(*last)

This seems more idiomatic to me.

Collaborator Author

I disagree. next_subseq shouldn't return anything since the "return value" is already its first argument. Adding a second return value makes the API harder to understand IMO.
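
To make the in-place style concrete, here is a self-contained sketch of what a function like next_subseq might look like. Only the body fragment quoted above appears in the diff; the signature, the use of `split_last_mut`, and the `checked_sub` guard are assumptions of this sketch.

```rust
/// Advances `indices` (strictly increasing, each at most `max_index`) to the
/// next subsequence of the same length in lexicographic order.
/// Returns `None`, leaving `indices` unchanged, once the last one is reached.
fn next_subseq(indices: &mut [usize], max_index: usize) -> Option<()> {
    // An empty sequence has no successor.
    let (last, left) = indices.split_last_mut()?;
    if *last < max_index {
        *last += 1;
    } else {
        // The last slot is exhausted: advance the prefix, whose own last
        // slot may go up to `max_index - 1`, then restart right after it.
        next_subseq(left, max_index.checked_sub(1)?)?;
        *last = left.last().copied().map_or(*last, |x| x + 1);
    }
    Some(())
}
```

Starting from `[0, 1]` with `max_index = 3`, repeated calls visit `[0, 2]`, `[0, 3]`, `[1, 2]`, `[1, 3]`, `[2, 3]` and then return `None`.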

Comment on lines +226 to +232
let mset = indices.iter().map(|i| modifs[*i]).fold(
    ModifierSet::<String>::default(),
    |mut res, m| {
        res.insert_raw(m);
        res
    },
);
Collaborator

Maybe we should have an `impl<S: Deref<Target = str>> FromIterator<S> for ModifierSet<String>` instead.

Collaborator Author

I really don't like that, see #46 (comment).
Maybe we could have an inherent method from_iter_raw that documents all of those requirements (S::default() must be empty, each item must be a single modifier). To use that like collect, we'd also need an iterator extension trait, which seems overkill, but maybe calling it directly on an iterator is already ergonomic enough?

Collaborator

Oh indeed you are right! I had forgotten I already suggested a similar thing before.

I think having from_iter_raw could be useful.
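
A rough sketch of what such an inherent method could look like. The `ModifierSet` below is a simplified stand-in that stores modifiers dot-separated in a `String`; it is an assumption for illustration and not codex's actual generic type, and the `debug_assert!` is just one possible way to surface the documented requirements.

```rust
/// Simplified stand-in for `ModifierSet<String>` (illustration only).
#[derive(Debug, Default, PartialEq, Eq)]
struct ModifierSet(String);

impl ModifierSet {
    /// Appends one modifier without validating it.
    fn insert_raw(&mut self, modifier: &str) {
        if !self.0.is_empty() {
            self.0.push('.');
        }
        self.0.push_str(modifier);
    }

    /// Builds a set from an iterator of single modifiers.
    ///
    /// Requirements the caller must uphold: the default set is empty, and
    /// every item is exactly one modifier (non-empty, containing no `.`).
    fn from_iter_raw<'a>(modifiers: impl IntoIterator<Item = &'a str>) -> Self {
        let mut set = Self::default();
        for modifier in modifiers {
            debug_assert!(!modifier.is_empty() && !modifier.contains('.'));
            set.insert_raw(modifier);
        }
        set
    }
}
```

With something like this, the fold above would shrink to `ModifierSet::from_iter_raw(indices.iter().map(|i| modifs[*i]))`, called directly on the iterator without an extension trait.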

@MDLC01 MDLC01 requested a review from laurmaedje August 3, 2025 15:27
Member

@laurmaedje laurmaedje left a comment

I didn't review the code too closely, but I'm okay with this in principle. I just noticed one thing while going through it. Otherwise, I'd defer to @MDLC01's judgement.

@@ -123,9 +158,44 @@ impl<S: Default> Default for ModifierSet<S> {
}
}

#[derive(Copy, Clone)]
pub struct Modifier<'a>(&'a str);
Member

This type appears in the public API, but is not re-exported, so it's a dead link in the docs. If it's supposed to be properly public, it would also be good to have a bit of docs on it.

Labels
breaking (This involves a breaking change), meta (Discussion about the structure of this repo)
Successfully merging this pull request may close these issues.

Improve forward compatibility with a notion of minimal modifier set
4 participants