Commit 1ffc766

paleolimbot and pitrou authored
GH-46599: [C++][Doc][Parquet] Update supported types documentation (#46620)
### Rationale for this change

We now support more types but the documentation suggested that some weren't supported.

### What changes are included in this PR?

Documentation was updated to reflect the status of supported types.

### Are these changes tested?

No code changes!

### Are there any user-facing changes?

No

* GitHub Issue: #46599

Lead-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
1 parent c2fb0e3 commit 1ffc766

1 file changed

+51 -35 lines changed

docs/source/cpp/parquet.rst

Lines changed: 51 additions & 35 deletions
@@ -450,38 +450,46 @@ Logical types
 Specific logical types can override the default Arrow type mapping for a given
 physical type.

-+-------------------+-----------------------------+----------------------------+---------+
-| Logical type      | Physical type               | Mapped Arrow type          | Notes   |
-+===================+=============================+============================+=========+
-| NULL              | Any                         | Null                       | \(1)    |
-+-------------------+-----------------------------+----------------------------+---------+
-| INT               | INT32                       | Int8 / UInt8 / Int16 /     |         |
-|                   |                             | UInt16 / Int32 / UInt32    |         |
-+-------------------+-----------------------------+----------------------------+---------+
-| INT               | INT64                       | Int64 / UInt64             |         |
-+-------------------+-----------------------------+----------------------------+---------+
-| DECIMAL           | INT32 / INT64 / BYTE_ARRAY  | Decimal128 / Decimal256    | \(2)    |
-|                   | / FIXED_LENGTH_BYTE_ARRAY   |                            |         |
-+-------------------+-----------------------------+----------------------------+---------+
-| DATE              | INT32                       | Date32                     | \(3)    |
-+-------------------+-----------------------------+----------------------------+---------+
-| TIME              | INT32                       | Time32 (milliseconds)      |         |
-+-------------------+-----------------------------+----------------------------+---------+
-| TIME              | INT64                       | Time64 (micro- or          |         |
-|                   |                             | nanoseconds)               |         |
-+-------------------+-----------------------------+----------------------------+---------+
-| TIMESTAMP         | INT64                       | Timestamp (milli-, micro-  |         |
-|                   |                             | or nanoseconds)            |         |
-+-------------------+-----------------------------+----------------------------+---------+
-| STRING            | BYTE_ARRAY                  | String / LargeString /     |         |
-|                   |                             | StringView                 |         |
-+-------------------+-----------------------------+----------------------------+---------+
-| LIST              | Any                         | List                       | \(4)    |
-+-------------------+-----------------------------+----------------------------+---------+
-| MAP               | Any                         | Map                        | \(5)    |
-+-------------------+-----------------------------+----------------------------+---------+
-| FLOAT16           | FIXED_LENGTH_BYTE_ARRAY     | HalfFloat                  |         |
-+-------------------+-----------------------------+----------------------------+---------+
++-------------------+-----------------------------+------------------------------+-----------+
+| Logical type      | Physical type               | Mapped Arrow type            | Notes     |
++===================+=============================+==============================+===========+
+| NULL              | Any                         | Null                         | \(1)      |
++-------------------+-----------------------------+------------------------------+-----------+
+| INT               | INT32                       | Int8 / UInt8 / Int16 /       |           |
+|                   |                             | UInt16 / Int32 / UInt32      |           |
++-------------------+-----------------------------+------------------------------+-----------+
+| INT               | INT64                       | Int64 / UInt64               |           |
++-------------------+-----------------------------+------------------------------+-----------+
+| DECIMAL           | INT32 / INT64 / BYTE_ARRAY  | Decimal128 / Decimal256      | \(2)      |
+|                   | / FIXED_LENGTH_BYTE_ARRAY   |                              |           |
++-------------------+-----------------------------+------------------------------+-----------+
+| DATE              | INT32                       | Date32                       | \(3)      |
++-------------------+-----------------------------+------------------------------+-----------+
+| TIME              | INT32                       | Time32 (milliseconds)        |           |
++-------------------+-----------------------------+------------------------------+-----------+
+| TIME              | INT64                       | Time64 (micro- or            |           |
+|                   |                             | nanoseconds)                 |           |
++-------------------+-----------------------------+------------------------------+-----------+
+| TIMESTAMP         | INT64                       | Timestamp (milli-, micro-    |           |
+|                   |                             | or nanoseconds)              |           |
++-------------------+-----------------------------+------------------------------+-----------+
+| STRING            | BYTE_ARRAY                  | String / LargeString /       |           |
+|                   |                             | StringView                   |           |
++-------------------+-----------------------------+------------------------------+-----------+
+| LIST              | Any                         | List                         | \(4)      |
++-------------------+-----------------------------+------------------------------+-----------+
+| MAP               | Any                         | Map                          | \(5)      |
++-------------------+-----------------------------+------------------------------+-----------+
+| FLOAT16           | FIXED_LENGTH_BYTE_ARRAY     | HalfFloat                    |           |
++-------------------+-----------------------------+------------------------------+-----------+
+| UUID              | FIXED_LENGTH_BYTE_ARRAY     | Extension (``arrow.uuid``)   | \(6)      |
++-------------------+-----------------------------+------------------------------+-----------+
+| JSON              | BYTE_ARRAY                  | Extension (``arrow.json``)   | \(6)      |
++-------------------+-----------------------------+------------------------------+-----------+
+| GEOMETRY          | BYTE_ARRAY                  | Extension (``geoarrow.wkb``) | \(6) \(7) |
++-------------------+-----------------------------+------------------------------+-----------+
+| GEOGRAPHY         | BYTE_ARRAY                  | Extension (``geoarrow.wkb``) | \(6) \(7) |
++-------------------+-----------------------------+------------------------------+-----------+

 * \(1) On the write side, the Parquet physical type INT32 is generated.

@@ -496,9 +504,14 @@ physical type.
   in contradiction with the
   `Parquet specification <https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps>`__.

-*Unsupported logical types:* JSON, BSON, UUID. If such a type is encountered
+* \(6) Requires that ``arrow_extensions_enabled`` in ``ArrowReaderProperties`` is ``true``.
+  When ``false``, the underlying storage type is read.
+
+* \(7) Requires that the ``geoarrow.wkb`` extension type is registered.
+
+*Unsupported logical types:* BSON. If such a type is encountered
 when reading a Parquet file, the default physical type mapping is used (for
-example, a Parquet JSON column may be read as Arrow Binary or FixedSizeBinary).
+example, a Parquet BSON column may be read as Arrow Binary or FixedSizeBinary).

 Converted types
 ~~~~~~~~~~~~~~~
@@ -513,7 +526,10 @@ Special cases

 An Arrow Extension type is written out as its storage type. It can still
 be recreated at read time using Parquet metadata (see "Roundtripping Arrow
-types" below).
+types" below). Some extension types have Parquet LogicalType equivalents
+(e.g., UUID, JSON, GEOMETRY, GEOGRAPHY). These are created automatically
+if the appropriate option is set in the ``ArrowReaderProperties`` even if
+there was no Arrow schema stored in the Parquet metadata.

 An Arrow Dictionary type is written out as its value type. It can still
 be recreated at read time using Parquet metadata (see "Roundtripping Arrow
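
Reader's note: the read-side rule stated in notes (6) and (7) of the updated documentation can be modeled with a small standalone sketch. This is a toy model, not Arrow's implementation. `ReaderOptions` and `MappedArrowType` are hypothetical names introduced here for illustration, and the fallback to the storage type when `geoarrow.wkb` is not registered is an assumption drawn from the wording of note (7).

```cpp
#include <string>

// Toy model of notes (6) and (7). ReaderOptions and MappedArrowType are
// hypothetical names, not part of the Arrow or Parquet C++ API.
struct ReaderOptions {
  bool arrow_extensions_enabled = false;  // stands in for the ArrowReaderProperties option
  bool geoarrow_wkb_registered = false;   // whether the geoarrow.wkb extension type is registered
};

// Returns the extension type name for one of the four logical types added to
// the table, or "storage" when the column would be read as its storage type.
std::string MappedArrowType(const std::string& logical_type,
                            const ReaderOptions& options) {
  // Note (6): with the option disabled, the underlying storage type is read.
  if (!options.arrow_extensions_enabled) return "storage";
  if (logical_type == "UUID") return "arrow.uuid";
  if (logical_type == "JSON") return "arrow.json";
  if (logical_type == "GEOMETRY" || logical_type == "GEOGRAPHY") {
    // Note (7): geoarrow.wkb must be registered. Falling back to the storage
    // type otherwise is an assumption of this sketch.
    return options.geoarrow_wkb_registered ? "geoarrow.wkb" : "storage";
  }
  // Logical types without an extension row in the table are out of scope here.
  return "storage";
}
```

With default options every call returns "storage". Enabling `arrow_extensions_enabled` alone is enough for the UUID and JSON rows, while the GEOMETRY and GEOGRAPHY rows additionally depend on the registration flag.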
