Description
The instructions of the micro-blog exercise say:
The trick to this exercise is to use APIs designed around Unicode characters (codepoints) instead of Unicode codeunits.
I understand that we want to keep things simple, but I think this is misleading. In the Roc track, for example, the instructions led some people to split the string into codepoints, even though there is a very simple function that splits the string into graphemes instead. The tests pass either way because they only include graphemes composed of a single codepoint. Codepoint-based solutions would fail if the tests included flags, characters with multiple diacritics, complex emojis, or any other grapheme composed of multiple codepoints (i.e., extended grapheme clusters).
In short: we shouldn't encourage people to work with codepoints when they can just as easily work with graphemes.
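To illustrate the failure mode, here is a minimal Python sketch. It is not taken from any track's solution: `truncate_by_codepoints` is a hypothetical helper, and the 5-character limit is my assumption about the exercise. Python's `str` slicing works on codepoints, so the helper behaves the way the current instructions suggest.

```python
import unicodedata


def truncate_by_codepoints(text: str, limit: int = 5) -> str:
    """Hypothetical helper: keep the first `limit` codepoints.

    Python str slicing counts codepoints, not graphemes.
    """
    return text[:limit]


# Single-codepoint graphemes: truncation behaves as expected.
plain = truncate_by_codepoints("Hello, world")

# Three French flags. Each flag is ONE grapheme made of TWO codepoints
# (regional indicators F and R), so the string has 6 codepoints.
flags = "\U0001F1EB\U0001F1F7" * 3
cut_flags = truncate_by_codepoints(flags)
# The cut lands inside the third flag and leaves a lone regional indicator F.

# Five "e" + combining acute accent pairs: 5 graphemes, 10 codepoints.
accented = "e\u0301" * 5
cut_accented = truncate_by_codepoints(accented)
# Only two complete accented letters survive, followed by a bare "e";
# the first dropped codepoint is a combining mark.
dropped_is_combining = unicodedata.combining(accented[5]) != 0

print(len(flags), len(cut_flags))
print(len(accented), len(cut_accented), dropped_is_combining)
```

A grapheme-aware version would count user-perceived characters instead, so both inputs above would come back unchanged under a limit of 5. Python's standard library has no grapheme segmentation, so that part is left out of the sketch.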
I suggest at least updating the instructions to cover graphemes, and ideally also adding some tests with extended grapheme clusters. If we're going to handle Unicode, we should try to handle all possible characters. Handling graphemes might be harder in some languages, but those tracks can simply disable the extended grapheme tests.
Edit: I'm happy to submit a PR if there's agreement on this issue.