Micro-blog instructions should explain graphemes and possibly test extended graphemes #2483

The instructions of the `micro-blog` exercise say:

> The trick to this exercise is to use APIs designed around Unicode characters (codepoints) instead of Unicode codeunits.

I understand that we want to keep things simple, but I think this is misleading. For example, in the Roc track the instructions led some people to split the string into codepoints, when there is actually a very simple function that splits the string into graphemes instead. The tests pass in both cases because they only include graphemes composed of a single codepoint. A codepoint-based solution would fail if the tests included flags, characters with multiple diacritics, complex emojis, or basically any grapheme composed of multiple codepoints (i.e., extended grapheme clusters).
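To make the failure concrete, here is a minimal Python sketch. It is not taken from any track's solution, and the function name `truncate_codepoints` is purely illustrative. It assumes the exercise's 5-character limit and uses only the standard library, so it shows the problem rather than a grapheme-aware fix:

```python
def truncate_codepoints(text: str, limit: int = 5) -> str:
    """Keep the first `limit` code points (hypothetical helper).

    Python's str is indexed by code point, so slicing is exactly the
    "codepoints, not code units" approach the instructions describe.
    """
    return text[:limit]


# Three flags. Each flag is ONE grapheme made of TWO regional-indicator
# code points: US, GB, FR.
flags = "\U0001F1FA\U0001F1F8\U0001F1EC\U0001F1E7\U0001F1EB\U0001F1F7"
print(len(flags))  # 6 code points, although a reader sees 3 characters

cut = truncate_codepoints(flags)
# The slice keeps two whole flags plus the first half of the third one,
# leaving a dangling regional indicator "F" at the end.
print(len(cut))  # 5
print(cut[-1] == "\U0001F1EB")  # True

# Six copies of "e" followed by a combining acute accent (U+0301):
# six graphemes, twelve code points.
accented = "e\u0301" * 6
# The slice ends on a bare "e" whose accent has been cut off.
print(truncate_codepoints(accented) == "e\u0301e\u0301e")  # True
```

In both cases the input is short enough (3 and 6 visible characters) that a reader would expect little or no truncation, yet the codepoint-based slice splits a grapheme in half. A grapheme-aware version would need a segmentation library or a language whose strings iterate by grapheme, which is outside what this sketch covers.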

In short: we shouldn't encourage people to work with codepoints when they can just as easily work with graphemes.

I suggest at least updating the instructions to cover graphemes, and ideally also adding some tests with extended grapheme clusters. If we're going to handle Unicode, we should try to handle all possible characters. Handling graphemes might be harder in some languages, but those tracks can simply disable the extended grapheme tests.

**Edit**: I'm happy to submit a PR if there's agreement on this issue.
