Fix #1619: Implement grapheme cluster counting for character count co… by moshaid · Pull Request #1765 · nhsuk/nhsuk-frontend

moshaid · 2026-01-13T12:11:17Z

This PR implements grapheme cluster counting for the character count component to ensure client-side validation matches Python's len() behaviour for server-side consistency:

- Add graphemeCount utility function with Intl.Segmenter support and fallback
- Update character count component to use grapheme counting instead of UTF-16 code units
- Add comprehensive tests for Unicode, emoji, and complex characters
- Update component documentation with counting methodology
- Ensure client-side validation matches Python len() behaviour for server-side consistency
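
For readers unfamiliar with the approach, a grapheme counter with a fallback could look roughly like the sketch below. This is not the code from this PR: the function body, and the choice of code point counting as the fallback, are assumptions made for illustration.

```javascript
// Illustrative sketch only; the PR's actual graphemeCount may differ.
function graphemeCount(text) {
  // Intl.Segmenter splits text into user-perceived characters (grapheme clusters)
  if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
    const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' })
    return Array.from(segmenter.segment(text)).length
  }

  // Assumed fallback: count Unicode code points (string iteration is by code point)
  return Array.from(text).length
}

console.log(graphemeCount('abc')) // 3
```

Where Intl.Segmenter is missing, this fallback would still count "e" plus a combining accent as 2, so the two paths can disagree on some input.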


colinrotherham · 2026-01-16T11:59:12Z

Thanks for raising this @moshaid

Better compatibility with Python len() and our nhsuk-frontend-jinja port would be great

Knowing we only have Intl.Segmenter in Baseline 2024, what should we do about the support gap?

Server-side consistency

Because the error message set server-side might include a different count to client-side, should we:

1. Support grapheme counting with string counting fallback
2. Support grapheme counting with no fallback, textarea only
3. Support grapheme counting via config only
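
To make the options concrete, here is a minimal sketch of how a config-only opt-in could be combined with a string length fallback. The `countGraphemes` option name and the `createCounter` helper are hypothetical and are not part of nhsuk-frontend.

```javascript
// Hypothetical config shape: `countGraphemes` is an illustrative name,
// not a documented nhsuk-frontend option.
function createCounter(config = {}) {
  const hasSegmenter =
    typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function'

  // Only count graphemes when the service opts in and the browser supports it
  if (config.countGraphemes && hasSegmenter) {
    const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' })
    return (text) => Array.from(segmenter.segment(text)).length
  }

  // Default and fallback (assumed): string length in UTF-16 code units
  return (text) => text.length
}

const defaultCount = createCounter()
console.log(defaultCount('\u{1F44D}')) // 2, because the emoji is a surrogate pair
```

With this shape, a service that does not opt in keeps its current counts, and the server-side message only needs to change for services that enable the option.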

Have you tried this already on your service?

Thanks

moshaid · 2026-01-16T21:43:22Z

Appreciate you reviewing this @colinrotherham

I have added a fallback to cover the support gap.
For server-side consistency, I tried the third option, which I consider the best.

moshaid · 2026-01-27T22:03:06Z

@colinrotherham I just ran a Prettier check on the file and it is well formatted. I'm stuck on why the Prettier check is failing. Is there anything you could recommend?

Thanks

colinrotherham · 2026-01-28T08:07:58Z

Can you run npm install and try again?

I'm getting the following change on npm run lint:prettier:fix

Thanks @moshaid

--- a/packages/nhsuk-frontend/src/nhsuk/i18n.mjs
+++ b/packages/nhsuk-frontend/src/nhsuk/i18n.mjs
@@ -141,7 +141,7 @@
   hasIntlPluralRulesSupport() {
     return Boolean(
       'PluralRules' in window.Intl &&
-        Intl.PluralRules.supportedLocalesOf(this.locale).length
+      Intl.PluralRules.supportedLocalesOf(this.locale).length
     )
   }

moshaid · 2026-01-28T22:54:52Z

Thanks for the tip @colinrotherham

MatMoore · 2026-02-05T08:35:28Z

packages/nhsuk-frontend/src/nhsuk/components/character-count/README.md

+
+## How characters are counted
+
+By default, the character count component uses **code point counting**, which matches Python's `len()` function for Unicode strings. This ensures consistency between client-side (JavaScript) and server-side (Python) validation in `nhsuk-frontend-jinja`, preventing mismatched error messages.


Not everyone is using Python and Jinja, so I wonder whether this paragraph should be aimed at a broader audience than teams using Python.

nhsuk-frontend-jinja is a port of the Nunjucks components, so it's not actually the thing providing server-side validation. But wherever the template relies on the len filter, it will be using code point counting, so it's good that it's consistent.
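
The difference between the counting methods mentioned here can be seen by counting the same string three ways. The sketch below is illustrative and the `countAll` helper name is made up for this example.

```javascript
// Counts one string three ways. The helper name is illustrative.
function countAll(text) {
  const hasSegmenter =
    typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function'

  return {
    // UTF-16 code units: what String.prototype.length reports
    codeUnits: text.length,
    // Unicode code points: what Python's len() reports for a str
    codePoints: Array.from(text).length,
    // Grapheme clusters: user-perceived characters (undefined without Intl.Segmenter)
    graphemes: hasSegmenter
      ? Array.from(
          new Intl.Segmenter(undefined, { granularity: 'grapheme' }).segment(text)
        ).length
      : undefined
  }
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT
console.log(countAll('e\u0301')) // codeUnits 2, codePoints 2, graphemes 1 where Intl.Segmenter exists

// Thumbs up (U+1F44D) followed by a skin tone modifier (U+1F3FB)
console.log(countAll('\u{1F44D}\u{1F3FB}')) // codeUnits 4, codePoints 2
```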

This is a great contribution btw 👍🏻

What do you both think about a customisable count function?

Character count component counts code points, not characters alphagov/govuk-frontend#1104

Character count's character/word count functions should be customisable alphagov/govuk-frontend#1364

It might need to support promises (should a server-side count be needed)

Similar to a config option, you could pass in pre-exported count functions that we publish. Or alternatively use your own if necessary
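
A rough sketch of what a customisable count function could look like follows. Everything here is hypothetical: `countFunction`, `getCount`, `countCodePoints` and `countWords` are illustrative names, not an existing nhsuk-frontend or govuk-frontend API.

```javascript
// Hypothetical pre-exported count functions a library might publish
const countCodePoints = (text) => Array.from(text).length
const countWords = (text) => (text.match(/\S+/g) ?? []).length

// Hypothetical helper: uses the configured count function, or string length by default.
// `await` accepts plain values and promises, so a server-side count would also fit.
async function getCount(text, config = {}) {
  const countFunction = config.countFunction ?? ((value) => value.length)
  return await countFunction(text)
}

getCount('one two three', { countFunction: countWords }).then((count) => {
  console.log(count) // 3
})
```

Supporting promises would make every count asynchronous, so the component would have to handle results arriving out of order while the user keeps typing.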

I like the idea in theory. I think calling the server for the count is a bit excessive though. Just being able to specify a count function on the client side to match what the server does seems like it would be flexible enough.

Counting bytes seems like the main use case other than the options we've got now (graphemes, code points, words). The byte count would depend on the encoding the server/database uses, so if we export a function for this it probably shouldn't assume UTF-8.

I'm sure there are plenty of cases where we're constrained by some legacy backend thing, but I'm not sure what the ideal UX is in that case. With graphemes (and, to a lesser extent, the existing code point counting) the display will track what the user is entering, but it will lie about whether they are over the limit if they enter any multi-byte characters. But if they customise the count function to use byte counting, the count will jump by more than 1 when they type multi-byte characters, which might confuse users as well. 🤷🏻
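
To illustrate the jump described above, here is a small sketch of byte counting under two encodings. The function names are illustrative; which encoding applies depends on the backend.

```javascript
// Illustrative byte counters; the right one depends on the server/database encoding.
const utf8Encoder = new TextEncoder() // TextEncoder always produces UTF-8

function countUtf8Bytes(text) {
  return utf8Encoder.encode(text).length
}

function countUtf16Bytes(text) {
  // Two bytes per UTF-16 code unit, ignoring any byte order mark
  return text.length * 2
}

console.log(countUtf8Bytes('a'), countUtf16Bytes('a')) // 1 2
console.log(countUtf8Bytes('\u00e9'), countUtf16Bytes('\u00e9')) // 2 2
console.log(countUtf8Bytes('\u{1F44D}'), countUtf16Bytes('\u{1F44D}')) // 4 4
```

Typing a single emoji moves a UTF-8 byte count by 4, which is the kind of jump a user might find confusing.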

I wonder if we can partially mitigate mismatches by:

- recommending that you set quite a high limit where possible
- recommending that teams use a threshold, so that most users don't see the character count at all
- even if both client side and server side are counting characters, allowing an extra 5 or so characters server-side vs client-side, just in case there's a discrepancy in the counting functions...
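
The threshold and server-side allowance ideas could be sketched as below. The function names, the percentage-based threshold and the default tolerance of 5 are assumptions for illustration only.

```javascript
// Hypothetical client-side check: only show the count message once the user
// has entered a given percentage of the limit.
function shouldShowCount(count, maxLength, thresholdPercent) {
  return count >= (maxLength * thresholdPercent) / 100
}

// Hypothetical server-side check: accept a few extra characters in case the
// client and server counting functions disagree.
function isAcceptedByServer(count, maxLength, tolerance = 5) {
  return count <= maxLength + tolerance
}

console.log(shouldShowCount(50, 200, 75)) // false
console.log(shouldShowCount(150, 200, 75)) // true
console.log(isAcceptedByServer(203, 200)) // true
console.log(isAcceptedByServer(206, 200)) // false
```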

MatMoore · 2026-02-05T08:40:17Z

packages/nhsuk-frontend/src/nhsuk/components/character-count/README.md

+- **Characters with combining marks**: Accented characters like é, ñ, and ü are counted correctly regardless of whether they're stored as a single code point or as a base character plus combining mark
+- **Complex scripts**: Non-Latin scripts (Chinese, Japanese, Korean, Arabic, etc.) are counted accurately
+
+**Important**: Only enable grapheme counting if your server-side validation also uses grapheme counting. Otherwise, you may see different counts between client and server validation messages.


Would it be helpful to signpost how to do this in JavaScript and the various other languages we use on the backend? (In Python I think you probably need a third-party library like https://grapheme.readthedocs.io/en/latest/grapheme.html)

moshaid force-pushed the fix/1619-grapheme-counting branch from b923c57 to 27cf1c1 Compare January 15, 2026 15:28

Moshood Alabidun and others added 2 commits January 15, 2026 15:38

Fixed test errors

45afec1

Merge branch 'nhsuk:main' into fix/1619-grapheme-counting

956899e

Moshood Alabidun added 2 commits January 16, 2026 21:44

-Added fallback to support gap and server consistency

8575123

tests updated and passing

2bcca4b

prettier test checked --passed

cd27a4d

moshaid closed this Jan 28, 2026

moshaid reopened this Jan 28, 2026

MatMoore reviewed Feb 5, 2026


colinrotherham deployed to nhsuk-frontend-pr-1765 February 9, 2026 12:55 View deployment

moshaid wants to merge 6 commits into nhsuk:main from moshaid:fix/1619-grapheme-counting

MatMoore Feb 5, 2026
colinrotherham Feb 5, 2026 (edited)
MatMoore Feb 5, 2026 (edited)
frankieroberto Feb 10, 2026
MatMoore Feb 5, 2026

4 participants

