@alexander-beedie commented Jan 15, 2025

Adds parsing support for the IS [NOT] [<form>] NORMALIZED syntax, which evaluates to a bool.

Details from the PostgreSQL string function docs:
https://www.postgresql.org/docs/current/functions-string.html

Checks whether the string is in the specified Unicode normalization
form. The optional 'form' keyword specifies the form: NFC (the default),
NFD, NFKC, or NFKD. This expression can only be used when the server
encoding is UTF8. Note that checking for normalization using this
expression is often faster than normalizing possibly already
normalized strings.
  • NFC: Canonical Decomposition, followed by Canonical Composition.
  • NFD: Canonical Decomposition.
  • NFKC: Compatibility Decomposition, followed by Canonical Composition.
  • NFKD: Compatibility Decomposition.

Since the set of normalised forms is fixed (there are only these four), it seemed reasonable to return the parsed form as an Option<NormalizationForm>, where NormalizationForm is a new enum. This helps callers: they don't have to inspect or case-normalise a raw string, and can go straight into a match block on the enum.
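
To illustrate the caller-side benefit, here is a minimal, self-contained sketch. It is not the crate's actual code: the enum is declared locally with assumed variant names, and `describe` is a hypothetical helper standing in for whatever a caller does with the parsed form.

```rust
/// The four Unicode normalization forms listed above.
/// Declared locally for illustration; the variant names are an assumption.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NormalizationForm {
    NFC,
    NFD,
    NFKC,
    NFKD,
}

/// Hypothetical caller-side helper. An omitted form (`None`) is treated as
/// NFC, which the PostgreSQL docs give as the default.
fn describe(form: Option<NormalizationForm>) -> &'static str {
    // No string comparison or case folding is needed: the match is exhaustive
    // over the enum, so the compiler flags any form left unhandled.
    match form.unwrap_or(NormalizationForm::NFC) {
        NormalizationForm::NFC => "canonical decomposition, then canonical composition",
        NormalizationForm::NFD => "canonical decomposition",
        NormalizationForm::NFKC => "compatibility decomposition, then canonical composition",
        NormalizationForm::NFKD => "compatibility decomposition",
    }
}
```

Because the match is exhaustive, adding a fifth variant later would be a compile error at every such call site instead of a silently unhandled string.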

(Also: fixed a few minor typos).

Examples

Default/omitted form:

strcol IS NORMALIZED
strcol IS NOT NORMALIZED

Specific form:

strcol IS NFKC NORMALIZED
strcol IS NOT NFKD NORMALIZED
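
The grammar behind these examples can be sketched with a toy recogniser. This is an assumption-laden illustration, not how the parser in this PR is implemented: `parse_is_normalized` is a hypothetical function that works on whitespace-separated words instead of the crate's tokenizer, and it only looks at the part of the expression after the operand (everything after `strcol`).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NormalizationForm {
    NFC,
    NFD,
    NFKC,
    NFKD,
}

/// Hypothetical recogniser for the postfix `IS [NOT] [form] NORMALIZED`.
/// Returns `(negated, form)` on a match, or `None` if the input is something else.
fn parse_is_normalized(tail: &str) -> Option<(bool, Option<NormalizationForm>)> {
    // SQL keywords are case-insensitive, so compare in upper case.
    let upper = tail.to_ascii_uppercase();
    let mut words = upper.split_whitespace().peekable();

    if words.next()? != "IS" {
        return None;
    }
    // Optional NOT.
    let negated = words.next_if_eq(&"NOT").is_some();
    // Optional form keyword; only consume the word if it is one of the four.
    let form = match words.peek().copied() {
        Some("NFC") => Some(NormalizationForm::NFC),
        Some("NFD") => Some(NormalizationForm::NFD),
        Some("NFKC") => Some(NormalizationForm::NFKC),
        Some("NFKD") => Some(NormalizationForm::NFKD),
        _ => None,
    };
    if form.is_some() {
        words.next();
    }
    // NORMALIZED must follow, with nothing after it.
    if words.next()? != "NORMALIZED" || words.next().is_some() {
        return None;
    }
    Some((negated, form))
}
```

Under these assumptions, `parse_is_normalized("IS NOT NFKD NORMALIZED")` yields a negated check with the NFKD form, while an input such as `"IS NULL"` is rejected and left for other rules to handle.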

@alamb left a comment

Thank you @alexander-beedie -- I think this looks quite nice and well tested

fyi @iffyio

  /// so you can call slice methods on it and iterate over items
  /// # Examples
- /// Acessing as a slice:
+ /// Accessing as a slice:

Thank you for these cleanups

@iffyio left a comment

Just one comment regarding the API signature, otherwise this looks good to me!

@iffyio left a comment

LGTM! Thanks @alexander-beedie!

@iffyio merged commit e9498d5 into apache:main Jan 17, 2025
9 checks passed
@alexander-beedie deleted the is-normalized branch January 17, 2025 10:00
hansott added a commit to hansott/datafusion-sqlparser-rs that referenced this pull request Jan 23, 2025
…o escape-literals

* 'main' of github.com:hansott/datafusion-sqlparser-rs:
  National strings: check if dialect supports backslash escape (apache#1672)
  Add support for Create Iceberg Table statement for Snowflake parser (apache#1664)
  Add support for Snowflake account privileges (apache#1666)
  Update rat_exclude_file.txt (apache#1670)
  Update verson to 0.54.0 and update changelog (apache#1668)
  Add support for Snowflake AT/BEFORE (apache#1667)
  Add support for qualified column names in JOIN ... USING (apache#1663)
  Add support for `IS [NOT] [form] NORMALIZED` (apache#1655)
  fix parsing of `INSERT INTO ... SELECT ... RETURNING ` (apache#1661)
  Add support for Snowflake column aliases that use SQL keywords (apache#1632)
Vedin pushed a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
Vedin added a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
ayman-sigma pushed a commit to sigmacomputing/sqlparser-rs that referenced this pull request Apr 10, 2025