
Commit 9de192a

karuppuchamysuresh and claude authored
docs: update data_types.md to reflect current Arrow type mappings (#20072)
## Which issue does this PR close?

- Closes #18314

## Rationale for this change

The documentation in `data_types.md` was outdated and showed `Utf8` as the default mapping for character types (CHAR, VARCHAR, TEXT, STRING), but the current implementation defaults to `Utf8View`. This caused confusion for users reading the documentation as it didn't match the actual behavior.

Additionally, the "Supported Arrow Types" section at the end was redundant since `arrow_typeof` now supports all Arrow types, making the comprehensive list unnecessary.

## What changes are included in this PR?

1. **Updated Character Types table**: Changed the Arrow DataType column from `Utf8` to `Utf8View` for CHAR, VARCHAR, TEXT, and STRING types
2. **Added configuration note**: Documented the `datafusion.sql_parser.map_string_types_to_utf8view` setting that allows users to switch back to `Utf8` if needed
3. **Removed outdated section**: Deleted the "Supported Arrow Types" section (39 lines) as it's no longer necessary

## Are these changes tested?

This is a documentation-only change. The documentation accurately reflects the current behavior of DataFusion:

- The default mapping to `Utf8View` is the current implementation behavior
- The `datafusion.sql_parser.map_string_types_to_utf8view` configuration option exists and works as documented

## Are there any user-facing changes?

Yes, documentation changes only. Users will now see accurate information about:

- The correct default Arrow type mappings for character types
- How to configure the string type mapping behavior if they need the old `Utf8` behavior

---------

Co-authored-by: Claude (claude-sonnet-4.5) <[email protected]>
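The configuration note described in this commit can be tried out with a short session. The following is a minimal sketch, assuming a `datafusion-cli` session (or any client that accepts DataFusion's `SET` statement); the literal `'abc'` is only an illustrative value, and the types named in the comments are the mappings documented by this change rather than captured output.

```sql
-- Default behavior: per the updated table, VARCHAR maps to Utf8View.
SELECT arrow_typeof(CAST('abc' AS VARCHAR));

-- Switch back to the older mapping for the rest of the session.
SET datafusion.sql_parser.map_string_types_to_utf8view = false;

-- With the setting disabled, the documented mapping is Utf8.
SELECT arrow_typeof(CAST('abc' AS VARCHAR));
```

Running the same query before and after the `SET` statement is a quick way to confirm which mapping a given DataFusion version actually applies.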
1 parent bc4c245 commit 9de192a

1 file changed, 23 additions and 57 deletions

docs/source/user-guide/sql/data_types.md
@@ -69,27 +69,32 @@ select arrow_cast(now(), 'Timestamp(Second, None)') as "now()";
 
 | SQL DataType | Arrow DataType |
 | ------------ | -------------- |
-| `CHAR` | `Utf8` |
-| `VARCHAR` | `Utf8` |
-| `TEXT` | `Utf8` |
-| `STRING` | `Utf8` |
+| `CHAR` | `Utf8View` |
+| `VARCHAR` | `Utf8View` |
+| `TEXT` | `Utf8View` |
+| `STRING` | `Utf8View` |
+
+By default, string types are mapped to `Utf8View`. This can be configured using the `datafusion.sql_parser.map_string_types_to_utf8view` setting. When set to `false`, string types are mapped to `Utf8` instead.
 
 ## Numeric Types
 
-| SQL DataType | Arrow DataType |
-| ------------------------------------ | :----------------------------- |
-| `TINYINT` | `Int8` |
-| `SMALLINT` | `Int16` |
-| `INT` or `INTEGER` | `Int32` |
-| `BIGINT` | `Int64` |
-| `TINYINT UNSIGNED` | `UInt8` |
-| `SMALLINT UNSIGNED` | `UInt16` |
-| `INT UNSIGNED` or `INTEGER UNSIGNED` | `UInt32` |
-| `BIGINT UNSIGNED` | `UInt64` |
-| `FLOAT` | `Float32` |
-| `REAL` | `Float32` |
-| `DOUBLE` | `Float64` |
-| `DECIMAL(precision, scale)` | `Decimal128(precision, scale)` |
+| SQL DataType | Arrow DataType |
+| ------------------------------------------------ | :----------------------------- |
+| `TINYINT` | `Int8` |
+| `SMALLINT` | `Int16` |
+| `INT` or `INTEGER` | `Int32` |
+| `BIGINT` | `Int64` |
+| `TINYINT UNSIGNED` | `UInt8` |
+| `SMALLINT UNSIGNED` | `UInt16` |
+| `INT UNSIGNED` or `INTEGER UNSIGNED` | `UInt32` |
+| `BIGINT UNSIGNED` | `UInt64` |
+| `FLOAT` | `Float32` |
+| `REAL` | `Float32` |
+| `DOUBLE` | `Float64` |
+| `DECIMAL(precision, scale)` where precision ≤ 38 | `Decimal128(precision, scale)` |
+| `DECIMAL(precision, scale)` where precision > 38 | `Decimal256(precision, scale)` |
+
+The maximum supported precision for `DECIMAL` types is 76.
 
 ## Date/Time Types
 

@@ -131,42 +136,3 @@ You can create binary literals using a hex string literal such as
 | `ENUM` | _Not yet supported_ |
 | `SET` | _Not yet supported_ |
 | `DATETIME` | _Not yet supported_ |
-
-## Supported Arrow Types
-
-The following types are supported by the `arrow_typeof` function:
-
-| Arrow Type |
-| ----------------------------------------------------------- |
-| `Null` |
-| `Boolean` |
-| `Int8` |
-| `Int16` |
-| `Int32` |
-| `Int64` |
-| `UInt8` |
-| `UInt16` |
-| `UInt32` |
-| `UInt64` |
-| `Float16` |
-| `Float32` |
-| `Float64` |
-| `Utf8` |
-| `LargeUtf8` |
-| `Binary` |
-| `Timestamp(Second, None)` |
-| `Timestamp(Millisecond, None)` |
-| `Timestamp(Microsecond, None)` |
-| `Timestamp(Nanosecond, None)` |
-| `Time32` |
-| `Time64` |
-| `Duration(Second)` |
-| `Duration(Millisecond)` |
-| `Duration(Microsecond)` |
-| `Duration(Nanosecond)` |
-| `Interval(YearMonth)` |
-| `Interval(DayTime)` |
-| `Interval(MonthDayNano)` |
-| `FixedSizeBinary(<len>)` (e.g. `FixedSizeBinary(16)`) |
-| `Decimal128(<precision>, <scale>)` e.g. `Decimal128(3, 10)` |
-| `Decimal256(<precision>, <scale>)` e.g. `Decimal256(3, 10)` |
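The diff also documents a precision threshold for `DECIMAL`: precision up to 38 maps to `Decimal128`, and larger precision (up to the stated maximum of 76) maps to `Decimal256`. A minimal sketch for checking this in a SQL session follows; it assumes `arrow_typeof` and `CAST` behave as in current DataFusion releases, the value `1` and scale `2` are arbitrary illustrative choices, and the types in the comments restate the updated table rather than captured output.

```sql
-- Precision at the threshold: the table documents Decimal128(38, 2).
SELECT arrow_typeof(CAST(1 AS DECIMAL(38, 2)));

-- Precision just above the threshold: the table documents Decimal256(39, 2).
SELECT arrow_typeof(CAST(1 AS DECIMAL(39, 2)));
```

Probing both sides of the boundary is a simple way to confirm that a given DataFusion version matches the documented mapping.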
