Normalize local-part to NFKC; normalize IDN to Unicode rather than ASCII #28

Merged
bnkc merged 3 commits into bnkc:main from andersk:unicode on Nov 9, 2025

@andersk (Contributor) commented Nov 6, 2025

Rust’s str and String types already guarantee their contents to be
valid UTF-8 as a safety invariant, so this doesn’t check anything.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
@andersk changed the title from "Normalize local-part to NFKC; bormalize IDN to Unicode rather than ASCII" to "Normalize local-part to NFKC; normalize IDN to Unicode rather than ASCII" on Nov 6, 2025
@andersk force-pushed the unicode branch 2 times, most recently from 469343e to 3ee0bae on November 6, 2025 02:10
Normalizing the local-part to NFKC is required by
https://www.unicode.org/reports/tr39/#Email_Security_Profiles.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
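
As a rough illustration of what NFKC normalization does to a local-part, the sketch below applies the same Unicode operation with Python's standard `unicodedata` module. The helper name `normalize_local_part` and the sample addresses are hypothetical. This is not the library's implementation, only a minimal stand-in that assumes the address contains an "@".

```python
import unicodedata


def normalize_local_part(email: str) -> str:
    """Return `email` with only its local-part NFKC-normalized.

    Hypothetical helper for illustration; not part of the library.
    """
    # Split at the last "@" so the domain is left untouched.
    local, sep, domain = email.rpartition("@")
    if not sep:
        raise ValueError("address has no '@'")
    return unicodedata.normalize("NFKC", local) + "@" + domain


# Fullwidth letters (U+FF46, U+FF4F) and the "fi" ligature (U+FB01) are
# compatibility characters, so NFKC folds them to plain ASCII letters.
print(normalize_local_part("\uff46\uff4f\uff4f@example.com"))  # foo@example.com
print(normalize_local_part("\ufb01ona@example.com"))           # fiona@example.com
```

Two visually similar addresses therefore compare equal after normalization, which is presumably the point of the profile's requirement.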
Change normalized and domain_name to use the Unicode form of the
domain.  Return the ASCII form of the domain as ascii_domain, and the
ASCII form of the email (if it exists) as ascii_email.  These changes
match the behavior of email-validator.

Fixes bnkc#24.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
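
The relationship between the Unicode and ASCII forms of a domain can be sketched with Python's built-in `idna` codec. This is an assumption-laden illustration: the stdlib codec implements IDNA 2003, which may differ from whatever the library uses internally, and the `DomainForms` class and `domain_forms` function are hypothetical. Only the field names `domain_name` and `ascii_domain` are taken from the commit message above.

```python
from dataclasses import dataclass


@dataclass
class DomainForms:
    """Hypothetical container; field names follow the commit message."""

    domain_name: str   # Unicode form of the domain
    ascii_domain: str  # ASCII (Punycode, "xn--") form of the domain


def domain_forms(domain: str) -> DomainForms:
    # Encode to the ASCII form first. The stdlib "idna" codec is IDNA 2003,
    # used here only as a stand-in for the library's own IDNA handling.
    ascii_bytes = domain.encode("idna")
    # Decode back so that either input form yields the Unicode form.
    return DomainForms(
        domain_name=ascii_bytes.decode("idna"),
        ascii_domain=ascii_bytes.decode("ascii"),
    )


forms = domain_forms("b\u00fccher.example")
print(forms.domain_name)   # bücher.example
print(forms.ascii_domain)  # xn--bcher-kva.example
```

Passing the ASCII form `xn--bcher-kva.example` produces the same pair, so callers can pick whichever form suits display or transport.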
@pytest.mark.parametrize(
    ("s", "expected_error"),
    [
        ("\u2005", "FOUR-PER-EM SPACE"),  # four-per-em space (Zs)

@andersk (Contributor, Author) commented:

(This gets NFKC-normalized to an ASCII space character, resulting in a different error message.)
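
The effect described in this comment can be checked directly with the standard `unicodedata` module. The snippet below is only a sketch of the underlying Unicode behavior. It does not reproduce the library's error messages, and the `describe` helper is hypothetical.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return the Unicode name and general category of one character."""
    return unicodedata.name(ch), unicodedata.category(ch)


before = "\u2005"
after = unicodedata.normalize("NFKC", before)

print(describe(before))  # ('FOUR-PER-EM SPACE', 'Zs')
print(describe(after))   # ('SPACE', 'Zs')
```

Once normalization runs first, a validator only ever sees U+0020, so an error that names the offending character would no longer mention FOUR-PER-EM SPACE.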

@bnkc merged commit be4809a into bnkc:main on Nov 9, 2025
16 checks passed
@andersk deleted the unicode branch on January 2, 2026 23:59

Development

Successfully merging this pull request may close these issues.

Allow normalizing IDN to Unicode rather than ASCII

2 participants