Stop using the '␣' symbol to denote a space character in function examples
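
For reference, the marker logic that this commit removes from `generate_example_rows` can be sketched as a standalone helper. The name `mark_outer_spaces` and its `marker` parameter are hypothetical and do not exist in the repository; the sketch only mirrors the deleted loops to show how example results were rendered before this change, assuming only plain space characters are replaced.

```python
def mark_outer_spaces(text: str, marker: str = "␣") -> str:
    """Replace leading and trailing spaces with a visible marker.

    Hypothetical helper mirroring the behavior removed by this commit;
    interior spaces are left untouched.
    """
    without_leading = text.lstrip(" ")
    nr_leading_spaces = len(text) - len(without_leading)
    core = without_leading.rstrip(" ")
    nr_trailing_spaces = len(without_leading) - len(core)
    return f"{nr_leading_spaces * marker}{core}{nr_trailing_spaces * marker}"


# Old rendering of the ltrim/rtrim/trim example input:
print(mark_outer_spaces("    test  "))  # ␣␣␣␣test␣␣
```

After this commit the docs show the literal spaces instead, so the examples can be pasted into DuckDB as written.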

c-herrewijn · c-herrewijn · commit 1094d251bce6 · 2025-05-19T15:09:27.000+02:00
diff --git a/docs/preview/sql/functions/text.md b/docs/preview/sql/functions/text.md
@@ -59,11 +59,11 @@ This section describes functions and operators for examining and manipulating [`
 | [`like_escape(string, like_specifier, escape_character)`](#like_escapestring-like_specifier-escape_character) | Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/preview/sql/functions/pattern_matching.md %})) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
 | [`lower(string)`](#lowerstring) | Converts `string` to lower case. |
 | [`lpad(string, count, character)`](#lpadstring-count-character) | Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. |
-| [`ltrim(string[, characters])`](#ltrimstring-characters) | Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`. In the example, the `␣` symbol denotes a space character. |
+| [`ltrim(string[, characters])`](#ltrimstring-characters) | Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`. |
 | [`md5(string)`](#md5string) | Returns the MD5 hash of the `string` as a `VARCHAR`. |
 | [`md5_number(string)`](#md5_numberstring) | Returns the MD5 hash of the `string` as a `HUGEINT`. |
-| [`md5_number_lower(string)`](#md5_number_lowerstring) | Returns the lower 64-bit segment of the MD5 hash of the `string` as a `BIGINT`. |
-| [`md5_number_upper(string)`](#md5_number_upperstring) | Returns the upper 64-bit segment of the MD5 hash of the `string` as a `BIGINT`. |
+| [`md5_number_lower(string)`](#md5_number_lowerstring) | Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
+| [`md5_number_upper(string)`](#md5_number_upperstring) | Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
 | [`nfc_normalize(string)`](#nfc_normalizestring) | Converts `string` to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. |
 | [`not_ilike_escape(string, like_specifier, escape_character)`](#not_ilike_escapestring-like_specifier-escape_character) | Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/preview/sql/functions/pattern_matching.md %})) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
 | [`not_like_escape(string, like_specifier, escape_character)`](#not_like_escapestring-like_specifier-escape_character) | Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/preview/sql/functions/pattern_matching.md %})) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
@@ -91,7 +91,7 @@ This section describes functions and operators for examining and manipulating [`
 | [`right(string, count)`](#rightstring-count) | Extract the right-most `count` characters. |
 | [`right_grapheme(string, count)`](#right_graphemestring-count) | Extracts the right-most `count` grapheme clusters. |
 | [`rpad(string, count, character)`](#rpadstring-count-character) | Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. |
-| [`rtrim(string[, characters])`](#rtrimstring-characters) | Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`. In the example, the `␣` symbol denotes a space character. |
+| [`rtrim(string[, characters])`](#rtrimstring-characters) | Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`. |
 | [`sha1(value)`](#sha1value) | Returns a `VARCHAR` with the SHA-1 hash of the `value`. |
 | [`sha256(value)`](#sha256value) | Returns a `VARCHAR` with the SHA-256 hash of the `value` |
 | [`split(string, separator)`](#splitstring-separator) | Splits the `string` along the `separator`. |
@@ -109,12 +109,12 @@ This section describes functions and operators for examining and manipulating [`
 | [`substring(string, start[, length])`](#substringstring-start-length) | Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`. |
 | [`substring_grapheme(string, start[, length])`](#substring_graphemestring-start-length) | Extracts substring starting from grapheme clusters `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` grapheme clusters instead. Note that a `start` value of `1` refers to the `first` character of the `string`. |
 | [`suffix(string, search_string)`](#suffixstring-search_string) | Returns `true` if `string` ends with `search_string`. |
-| [`to_base(number, raxid[, min_length])`](#to_basenumber-raxid-min_length) |  Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`. |
+| [`to_base(number, radix[, min_length])`](#to_basenumber-radix-min_length) | Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`. |
 | [`to_base64(blob)`](#to_base64blob) | Converts a `blob` to a base64 encoded string. |
 | [`to_binary(string)`](#to_binarystring) | Converts the `string` to binary representation. |
 | [`to_hex(string)`](#to_hexstring) | Converts the `string` to hexadecimal representation. |
 | [`translate(string, from, to)`](#translatestring-from-to) | Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted. |
-| [`trim(string[, characters])`](#trimstring-characters) | Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`. In the example, the `␣` symbol denotes a space character. |
+| [`trim(string[, characters])`](#trimstring-characters) | Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`. |
 | [`ucase(string)`](#ucasestring) | Converts `string` to upper case. |
 | [`unbin(value)`](#unbinvalue) | Converts a `value` from binary representation to a blob. |
 | [`unhex(value)`](#unhexvalue) | Converts a `value` from hexadecimal representation to a blob. |
@@ -216,7 +216,7 @@ This section describes functions and operators for examining and manipulating [`
 
 | **Description** | Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80. |
 | **Example** | `bar(5, 0, 20, 10)` |
-| **Result** | `██▌␣␣␣␣␣␣␣` |
+| **Result** | `██▌       ` |
 
 #### `base64(blob)`
 
@@ -497,9 +497,9 @@ This section describes functions and operators for examining and manipulating [`
 
 <div class="nostroke_table"></div>
 
-| **Description** | Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`. In the example, the `␣` symbol denotes a space character. |
-| **Example 1** | `ltrim('␣␣␣␣test␣␣')` |
-| **Result** | `test␣␣` |
+| **Description** | Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`. |
+| **Example 1** | `ltrim('    test  ')` |
+| **Result** | `test  ` |
 | **Example 2** | `ltrim('>>>>test<<', '><')` |
 | **Result** | `test<<` |
 
@@ -523,15 +523,15 @@ This section describes functions and operators for examining and manipulating [`
 
 <div class="nostroke_table"></div>
 
-| **Description** | Returns the lower 64-bit segment of the MD5 hash of the `string` as a `BIGINT`. |
+| **Description** | Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
 | **Example** | `md5_number_lower('abc')` |
 | **Result** | `8250560606382298838` |
 
 #### `md5_number_upper(string)`
 
 <div class="nostroke_table"></div>
 
-| **Description** | Returns the upper 64-bit segment of the MD5 hash of the `string` as a `BIGINT`. |
+| **Description** | Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
 | **Example** | `md5_number_upper('abc')` |
 | **Result** | `12704604231530709392` |
 
@@ -758,9 +758,9 @@ This section describes functions and operators for examining and manipulating [`
 
 <div class="nostroke_table"></div>
 
-| **Description** | Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`. In the example, the `␣` symbol denotes a space character. |
-| **Example 1** | `rtrim('␣␣␣␣test␣␣')` |
-| **Result** | `␣␣␣␣test` |
+| **Description** | Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`. |
+| **Example 1** | `rtrim('    test  ')` |
+| **Result** | `    test` |
 | **Example 2** | `rtrim('>>>>test<<', '><')` |
 | **Result** | `>>>>test` |
 
@@ -917,7 +917,9 @@ This section describes functions and operators for examining and manipulating [`
 | **Result** | `true` |
 | **Alias** | `ends_with` |
 
-#### `to_base(number, raxid[, min_length])`
+#### `to_base(number, radix[, min_length])`
+
+<div class="nostroke_table"></div>
 
 | **Description** | Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`. |
 | **Example** | `to_base(42, 16, 5)` |
@@ -962,8 +964,8 @@ This section describes functions and operators for examining and manipulating [`
 
 <div class="nostroke_table"></div>
 
-| **Description** | Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`. In the example, the `␣` symbol denotes a space character. |
-| **Example 1** | `trim('␣␣␣␣test␣␣')` |
+| **Description** | Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`. |
+| **Example 1** | `trim('    test  ')` |
 | **Result** | `test` |
 | **Example 2** | `trim('>>>>test<<', '><')` |
 | **Result** | `test` |
diff --git a/scripts/generate_sql_function_docs.py b/scripts/generate_sql_function_docs.py
@@ -383,8 +383,7 @@ def generate_example_rows(func: DocFunction):
             try:
                 if func.name in BINARY_OPERATORS:
                     example = f"({example})"
-                run_example = re.sub(r'␣', ' ', example)
-                query_result = duckdb.sql(rf"select {run_example}::VARCHAR").fetchall()
+                query_result = duckdb.sql(rf"select {example}::VARCHAR").fetchall()
                 if len(query_result) != 1:
                     example_result = 'Multiple rows: ' + ', '.join(
                         (
@@ -398,16 +397,6 @@ def generate_example_rows(func: DocFunction):
                     example_result = (
                         f"{query_result[0][0]}" if query_result[0][0] else "NULL"
                     )
-                    # replace leading and trailing spaces by '␣'
-                    nr_leading_spaces = 0
-                    nr_trailing_spaces = 0
-                    while example_result and example_result[0] == ' ':
-                        example_result = example_result[1:]
-                        nr_leading_spaces += 1
-                    while example_result and example_result[-1] == ' ':
-                        example_result = example_result[:-1]
-                        nr_trailing_spaces += 1
-                    example_result = f"{nr_leading_spaces * '␣'}{example_result}{nr_trailing_spaces * '␣'}"
             except duckdb.ParserException as e:
                 print(
                     f"Error for function '{func.name}', could not calculate example: '{example}'. Consider adding it via OVERRIDES'. {e}"