
Commit e92bc7b

Support type mapping in Dataframe queries (#494).
1 parent e3ac333 commit e92bc7b

31 files changed: +2164 -145 lines changed

doc/src/api_manual/async_connection.rst

Lines changed: 2 additions & 2 deletions
@@ -105,7 +105,7 @@ AsyncConnection Methods
105105

106106
.. versionchanged:: 3.4.0
107107

108-
The ``fetch_decimals`` parameter was added.
108+
The ``fetch_decimals`` and ``requested_schema`` parameters were added.
109109

110110
.. versionadded:: 3.0.0
111111

@@ -115,7 +115,7 @@ AsyncConnection Methods
115115

116116
.. versionchanged:: 3.4.0
117117

118-
The ``fetch_decimals`` parameter was added.
118+
The ``fetch_decimals`` and ``requested_schema`` parameters were added.
119119

120120
.. versionadded:: 3.0.0
121121

doc/src/api_manual/connection.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -95,7 +95,7 @@ Connection Methods
9595

9696
.. versionchanged:: 3.4.0
9797

98-
The ``fetch_decimals`` parameter was added.
98+
The ``fetch_decimals`` and ``requested_schema`` parameters were added.
9999

100100
.. versionadded:: 3.0.0
101101

@@ -107,7 +107,7 @@ Connection Methods
107107

108108
.. versionchanged:: 3.4.0
109109

110-
The ``fetch_decimals`` parameter was added.
110+
The ``fetch_decimals`` and ``requested_schema`` parameters were added.
111111

112112
.. versionadded:: 3.0.0
113113

doc/src/release_notes.rst

Lines changed: 19 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,8 @@ Thin Mode Changes
2121

2222
#) Added support for Oracle Database's :ref:`Direct Path Load
2323
<directpathloads>` functionality which is very efficient for loading large
24-
datasets into a database.
24+
datasets into a database. Data may be a list of sequences or a DataFrame
25+
object.
2526
#) Fixed bug when setting values of type ``datetime.date`` on variables (such
2627
as created by :meth:`Cursor.var()` or implicitly by
2728
:meth:`Cursor.setinputsizes()`) of types
@@ -44,16 +45,26 @@ Thick Mode Changes
4445
Common Changes
4546
++++++++++++++
4647

47-
#) Added support for all of the signed and unsigned fixed width integer types
48-
when ingesting data frames supporting the Arrow PyCapsule interface.
49-
Previously only ``int64`` was supported.
50-
#) Added support for types ``date32`` and ``date64`` when ingesting data
51-
frames supporting the Arrow PyCapsule interface as requested
52-
(`issue 535 <https://github.com/oracle/python-oracledb/issues/535>`__).
48+
#) Changes to :ref:`data frame <dataframeformat>` support:
49+
50+
- Support for data frames is no longer considered a pre-release.
51+
- Added parameter ``requested_schema`` to :meth:`Connection.fetch_df_all()`
52+
and :meth:`Connection.fetch_df_batches()` to support type mapping when
53+
querying.
54+
- Added support for all of the signed and unsigned fixed width integer
55+
types when ingesting data frames supporting the Arrow PyCapsule
56+
interface. Previously only ``int64`` was supported.
57+
- Added support for types ``date32`` and ``date64`` when ingesting data
58+
frames supporting the Arrow PyCapsule interface as requested
59+
(`issue 535 <https://github.com/oracle/python-oracledb/issues/535>`__).
60+
- Data frames with multiple chunks are now supported.
61+
- Fixed bug when fetching NCHAR and NVARCHAR2 column data.
62+
- Fixed bug when attempting to convert an integer that cannot be
63+
represented as a native C ``int`` value to an Arrow data frame.
64+
5365
#) Added a ``batch_size`` parameter to :meth:`Cursor.executemany()` and
5466
:meth:`AsyncCursor.executemany()` to let these methods operate on data in
5567
batches.
56-
#) Data frames with multiple chunks are now supported.
5768
#) Added ``fetch_lobs`` and ``fetch_decimals`` parameters where applicable to
5869
the methods used for fetching rows or data frames from the database. Note
5970
that for the creation of pipeline operations, if these parameters are not
@@ -81,12 +92,8 @@ Common Changes
8192
DocumentDisplay?id=742060.1>`__.
8293
#) Pin Cython to 3.1.x instead of 3.1.0 as requested
8394
(`issue 530 <https://github.com/oracle/python-oracledb/issues/530>`__).
84-
#) Support for :ref:`data frames <dataframeformat>` is no longer considered a
85-
pre-release.
8695
#) Fixed bug when attempting to execute an empty statement
8796
(`issue 525 <https://github.com/oracle/python-oracledb/issues/525>`__).
88-
#) Fixed bug when attempting to convert an integer that cannot be represented
89-
as a native C ``int`` value to an Arrow data frame.
9097
#) Fixed bug when attempting to append an element to a
9198
:ref:`DbObject <dbobjecttype>` which is not actually a collection.
9299
#) API documentation is now generated from the source code.

doc/src/user_guide/batch_statement.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -629,7 +629,7 @@ You can control the data transfer by changing your SELECT statement.
629629
Direct Path Loads
630630
=================
631631

632-
Direct Path Loads allows data being inserted into Oracle Database to bypass
632+
Direct Path Loads allow data being inserted into Oracle Database to bypass
633633
code layers such as the database buffer cache. Also there are no INSERT
634634
statements used. This can be very efficient for ingestion of huge amounts of
635635
data but, as a consequence of the architecture, there are restrictions on when

doc/src/user_guide/dataframes.rst

Lines changed: 101 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -110,14 +110,18 @@ Or to iterate:
110110
Data Frame Type Mapping
111111
-----------------------
112112

113+
Default Data Frame Type Mapping
114+
+++++++++++++++++++++++++++++++
115+
113116
Internally, python-oracledb's :ref:`DataFrame <oracledataframeobj>` support
114117
makes use of `Apache nanoarrow <https://arrow.apache.org/nanoarrow/>`__
115118
libraries to build data frames.
116119

117-
The following data type mapping occurs from Oracle Database types to the Arrow
118-
types used in python-oracledb DataFrame objects. Querying any other data types
119-
from Oracle Database will result in an exception. :ref:`Output type handlers
120-
<outputtypehandlers>` cannot be used to map data types.
120+
When querying, the following default data type mapping occurs from Oracle
121+
Database types to the Arrow types used in python-oracledb DataFrame
122+
objects. Querying any other data types from Oracle Database will result in an
123+
exception. :ref:`Output type handlers <outputtypehandlers>` cannot be used to
124+
map data types.
121125

122126
.. list-table-with-summary:: Mapping from Oracle Database to Arrow data types
123127
:header-rows: 1
@@ -258,6 +262,99 @@ When converting Oracle Database DATEs and TIMESTAMPs:
258262
* - 7 - 9
259263
- nanoseconds
260264

265+
Explicit Data Frame Type Mapping
266+
++++++++++++++++++++++++++++++++
267+
268+
You can explicitly set the data types and names that a :ref:`DataFrame
269+
<oracledataframeobj>` will use for query results. This provides fine-grained
270+
control over the physical data representation of the resulting Arrow arrays. It
271+
allows you to specify a representation that is more efficient for your specific
272+
use case. This can reduce memory consumption and improve processing speed.
273+
274+
The ``requested_schema`` parameter to
275+
:meth:`Connection.fetch_df_all()`, :meth:`Connection.fetch_df_batches()`,
276+
:meth:`AsyncConnection.fetch_df_all()`, or
277+
:meth:`AsyncConnection.fetch_df_batches()` should be an object implementing the
278+
`Arrow PyCapsule schema interface
279+
<https://arrow.apache.org/docs/python/generated/pyarrow.Schema.html>`__.
280+
281+
For example, the ``pyarrow.schema()`` factory function can be used to create a
282+
new schema. This takes a list of field definitions as input. Each field can be
283+
a tuple of ``(name, DataType)``:
284+
285+
.. code-block:: python
286+
287+
import pyarrow
288+
289+
# Default fetch
290+
291+
odf = connection.fetch_df_all(
292+
"select 123 c1, 'Scott' c2 from dual"
293+
)
294+
tab = pyarrow.table(odf)
295+
print("Default Output:", tab)
296+
297+
# Fetching with an explicit schema
298+
299+
schema = pyarrow.schema([
300+
("col_1", pyarrow.int16()),
301+
("C2", pyarrow.string())
302+
])
303+
304+
odf = connection.fetch_df_all(
305+
"select 456 c1, 'King' c2 from dual",
306+
requested_schema=schema
307+
)
308+
tab = pyarrow.table(odf)
309+
print("\nNew Output:", tab)
310+
311+
The schema should have an entry for each queried column.
312+
313+
Running the example shows that the number column with the explicit schema was
314+
fetched into the requested type INT16. Its name has also changed::
315+
316+
Default Output: pyarrow.Table
317+
C1: double
318+
C2: string
319+
----
320+
C1: [[123]]
321+
C2: [["Scott"]]
322+
323+
New Output: pyarrow.Table
324+
col_1: int16
325+
C2: string
326+
----
327+
col_1: [[456]]
328+
C2: [["King"]]
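The same ``requested_schema`` parameter is accepted by ``fetch_df_batches()``. The following is a minimal sketch, assuming an already open ``connection`` and a schema built as in the example above. The helper name ``fetch_typed_batches`` and its default batch size are illustrative assumptions and are not part of python-oracledb.

```python
def fetch_typed_batches(connection, sql, schema, size=1000):
    """Yield one DataFrame per batch, each fetched with the requested schema.

    Illustrative helper, not part of python-oracledb. ``schema`` is any
    object implementing the Arrow PyCapsule schema interface, for example
    the result of ``pyarrow.schema()``.
    """
    # Pass the schema through unchanged; the driver applies the requested
    # names and types to every batch it returns.
    yield from connection.fetch_df_batches(
        sql, size=size, requested_schema=schema
    )
```

Each yielded object can be passed to ``pyarrow.table()`` in the same way as the result of ``fetch_df_all()``.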
329+
330+
**Supported Explicit Type Mapping**
331+
332+
The following table shows the explicit type mappings that are supported. An
333+
error will occur if the database type or the data cannot be represented in the
334+
requested schema type.
335+
336+
.. list-table-with-summary::
337+
:header-rows: 1
338+
:class: wy-table-responsive
339+
:widths: 1 1
340+
:align: left
341+
:summary: The first column is the Oracle Database data type. The second column shows supported Arrow data types.
342+
343+
* - Oracle Database Type
344+
- Arrow Data Types
345+
* - DB_TYPE_NUMBER
346+
- INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32, UINT64, DECIMAL128(p, s), DOUBLE, FLOAT
347+
* - DB_TYPE_RAW, DB_TYPE_LONG_RAW
348+
- BINARY, FIXED SIZE BINARY, LARGE BINARY
349+
* - DB_TYPE_BOOLEAN
350+
- BOOLEAN
351+
* - DB_TYPE_DATE, DB_TYPE_TIMESTAMP, DB_TYPE_TIMESTAMP_LTZ, DB_TYPE_TIMESTAMP_TZ
352+
- DATE32, DATE64, TIMESTAMP
353+
* - DB_TYPE_BINARY_DOUBLE, DB_TYPE_BINARY_FLOAT
354+
- DOUBLE, FLOAT
355+
* - DB_TYPE_VARCHAR, DB_TYPE_CHAR, DB_TYPE_LONG, DB_TYPE_NVARCHAR, DB_TYPE_NCHAR, DB_TYPE_LONG_NVARCHAR
356+
- STRING, LARGE_STRING
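The table above can be read as a simple lookup. The sketch below transcribes it into a plain Python dictionary and adds a range check for the fixed width integer types, to illustrate why a value may not be representable in a requested type. All names here (``SUPPORTED``, ``is_supported``, ``int_range``, ``fits``) are hypothetical and the type names are plain strings; python-oracledb performs its own validation, and the exact errors it raises are not shown.

```python
# Illustrative lookup transcribed from the table above. The keys and values
# are plain strings for demonstration only (DECIMAL128 omits its p, s
# parameters); they are not python-oracledb or pyarrow objects.
_INTEGERS = {
    "INT8", "INT16", "INT32", "INT64",
    "UINT8", "UINT16", "UINT32", "UINT64",
}
_BINARIES = {"BINARY", "FIXED SIZE BINARY", "LARGE BINARY"}
_DATETIMES = {"DATE32", "DATE64", "TIMESTAMP"}
_FLOATS = {"DOUBLE", "FLOAT"}
_STRINGS = {"STRING", "LARGE_STRING"}

SUPPORTED = {
    "DB_TYPE_NUMBER": _INTEGERS | {"DECIMAL128"} | _FLOATS,
    "DB_TYPE_RAW": _BINARIES,
    "DB_TYPE_LONG_RAW": _BINARIES,
    "DB_TYPE_BOOLEAN": {"BOOLEAN"},
    "DB_TYPE_DATE": _DATETIMES,
    "DB_TYPE_TIMESTAMP": _DATETIMES,
    "DB_TYPE_TIMESTAMP_LTZ": _DATETIMES,
    "DB_TYPE_TIMESTAMP_TZ": _DATETIMES,
    "DB_TYPE_BINARY_DOUBLE": _FLOATS,
    "DB_TYPE_BINARY_FLOAT": _FLOATS,
    "DB_TYPE_VARCHAR": _STRINGS,
    "DB_TYPE_CHAR": _STRINGS,
    "DB_TYPE_LONG": _STRINGS,
    "DB_TYPE_NVARCHAR": _STRINGS,
    "DB_TYPE_NCHAR": _STRINGS,
    "DB_TYPE_LONG_NVARCHAR": _STRINGS,
}


def is_supported(db_type_name, arrow_type_name):
    """Return True if the table lists this Oracle-to-Arrow mapping."""
    return arrow_type_name in SUPPORTED.get(db_type_name, set())


def int_range(arrow_type_name):
    """Return (lowest, highest) value of a fixed width integer type."""
    if arrow_type_name not in _INTEGERS:
        raise ValueError(f"not a fixed width integer type: {arrow_type_name}")
    unsigned = arrow_type_name.startswith("UINT")
    bits = int(arrow_type_name[4:] if unsigned else arrow_type_name[3:])
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(value, arrow_type_name):
    """Return True if an integer value is representable in the given type."""
    lowest, highest = int_range(arrow_type_name)
    return lowest <= value <= highest


print(is_supported("DB_TYPE_NUMBER", "INT16"))   # True
print(is_supported("DB_TYPE_VARCHAR", "INT16"))  # False
print(fits(456, "INT16"))                        # True
print(fits(70000, "INT16"))                      # False
```

In this sketch the value 456 from the earlier example fits in INT16, whereas 70000 does not because INT16 spans -32768 to 32767.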
357+
261358
.. _convertingodf:
262359

263360
Converting python-oracledb's DataFrame to Other Data Frames

samples/dataframe_types.py

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,63 @@
1+
# -----------------------------------------------------------------------------
2+
# Copyright (c) 2025, Oracle and/or its affiliates.
3+
#
4+
# This software is dual-licensed to you under the Universal Permissive License
5+
# (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl and Apache License
6+
# 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose
7+
# either license.
8+
#
9+
# If you elect to accept the software under the Apache License, Version 2.0,
10+
# the following applies:
11+
#
12+
# Licensed under the Apache License, Version 2.0 (the "License");
13+
# you may not use this file except in compliance with the License.
14+
# You may obtain a copy of the License at
15+
#
16+
# https://www.apache.org/licenses/LICENSE-2.0
17+
#
18+
# Unless required by applicable law or agreed to in writing, software
19+
# distributed under the License is distributed on an "AS IS" BASIS,
20+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
21+
# See the License for the specific language governing permissions and
22+
# limitations under the License.
23+
# -----------------------------------------------------------------------------
24+
25+
# -----------------------------------------------------------------------------
26+
# dataframe_types.py
27+
#
28+
# Shows how to change the schema types and names of a dataframe
29+
# -----------------------------------------------------------------------------
30+
31+
import pyarrow
32+
33+
import oracledb
34+
import sample_env
35+
36+
# determine whether to use python-oracledb thin mode or thick mode
37+
if sample_env.run_in_thick_mode():
38+
oracledb.init_oracle_client(lib_dir=sample_env.get_oracle_client())
39+
40+
connection = oracledb.connect(
41+
user=sample_env.get_main_user(),
42+
password=sample_env.get_main_password(),
43+
dsn=sample_env.get_connect_string(),
44+
params=sample_env.get_connect_params(),
45+
)
46+
47+
48+
SQL = "select * from SampleQueryTab where id < 5"
49+
50+
# Default fetch with no type mapping
51+
52+
odf = connection.fetch_df_all(SQL)
53+
tab = pyarrow.table(odf)
54+
print("Default Output:", tab)
55+
56+
# Fetching with an explicit schema
57+
58+
schema = pyarrow.schema(
59+
[("COL_1", pyarrow.int16()), ("COL_2", pyarrow.string())]
60+
)
61+
odf = connection.fetch_df_all(SQL, requested_schema=schema)
62+
tab = pyarrow.table(odf)
63+
print("\nNew Output:", tab)

src/oracledb/arrow_impl.pxd

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -105,6 +105,7 @@ cdef class ArrowSchemaImpl:
105105
ArrowSchema *arrow_schema
106106
ArrowType child_arrow_type
107107
int child_element_size
108+
list child_schemas
108109

109110
cdef bint _is_sparse_vector(self) except*
110111
cdef int _set_child_arrow_type(self, ArrowType child_arrow_type) except -1
@@ -122,19 +123,24 @@ cdef class ArrowArrayImpl:
122123
ArrowArray *arrow_array
123124
ArrowSchemaImpl schema_impl
124125

126+
cdef int _extract_int(self, const void* ptr, ArrowType arrow_type,
127+
int64_t index, int64_t* value) except -1
128+
cdef int _extract_uint(self, const void* ptr, ArrowType arrow_type,
129+
int64_t index, uint64_t* value) except -1
125130
cdef int _get_is_null(self, int64_t index, bint* is_null) except -1
126131
cdef int _get_list_info(self, int64_t index, ArrowArray* arrow_array,
127132
int64_t* offset, int64_t* num_elements) except -1
128133
cdef int append_bytes(self, void* ptr, int64_t num_bytes) except -1
129134
cdef int append_decimal(self, void* ptr, int64_t num_bytes) except -1
130135
cdef int append_double(self, double value) except -1
131136
cdef int append_float(self, float value) except -1
132-
cdef int append_int64(self, int64_t value) except -1
137+
cdef int append_int(self, int64_t value) except -1
133138
cdef int append_last_value(self, ArrowArrayImpl array) except -1
134139
cdef int append_null(self) except -1
135140
cdef int append_sparse_vector(self, int64_t num_dimensions,
136141
array.array indices,
137142
array.array values) except -1
143+
cdef int append_uint(self, uint64_t value) except -1
138144
cdef int append_vector(self, array.array value) except -1
139145
cdef int finish_building(self) except -1
140146
cdef int get_bool(self, int64_t index, bint* is_null,

src/oracledb/arrow_impl.pyx

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@
3232

3333
cimport cpython
3434

35+
from libc.errno cimport EINVAL
3536
from libc.stdint cimport uintptr_t
3637
from libc.string cimport memcpy, memset, strlen, strchr
3738
from cpython cimport array

src/oracledb/base_impl.pxd

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -337,8 +337,9 @@ cdef class Buffer:
337337
cdef int read_sb4(self, int32_t *value) except -1
338338
cdef int read_sb8(self, int64_t *value) except -1
339339
cdef bytes read_null_terminated_bytes(self)
340-
cdef int read_oracle_data(self, OracleMetadata metadata,
341-
OracleData* data, bint from_dbobject) except -1
340+
cdef object read_oracle_data(self, OracleMetadata metadata,
341+
OracleData* data, bint from_dbobject,
342+
bint decode_str)
342343
cdef object read_str(self, int csfrm, const char* encoding_errors=*)
343344
cdef object read_str_with_length(self)
344345
cdef int read_ub1(self, uint8_t *value) except -1
@@ -495,6 +496,10 @@ cdef class OracleMetadata:
495496
cdef int _create_arrow_schema(self) except -1
496497
cdef int _finalize_init(self) except -1
497498
cdef int _set_arrow_schema(self, ArrowSchemaImpl schema_impl) except -1
499+
cdef int check_convert_from_arrow(self,
500+
ArrowSchemaImpl schema_impl) except -1
501+
cdef int check_convert_to_arrow(self,
502+
ArrowSchemaImpl schema_impl) except -1
498503
cdef OracleMetadata copy(self)
499504
@staticmethod
500505
cdef OracleMetadata from_arrow_schema(ArrowSchemaImpl schema_impl)
@@ -718,6 +723,7 @@ cdef class BaseCursorImpl:
718723
public bint suspend_on_success
719724
public bint fetch_lobs
720725
public bint fetch_decimals
726+
public ArrowSchemaImpl schema_impl
721727
uint32_t _buffer_rowcount
722728
uint32_t _buffer_index
723729
uint32_t _fetch_array_size

src/oracledb/base_impl.pyx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,7 @@ from libc cimport errno
3939
from libc.stdint cimport int8_t, int16_t, int32_t, int64_t
4040
from libc.stdint cimport uint8_t, uint16_t, uint32_t, uint64_t
4141
from libc.stdint cimport UINT8_MAX, UINT16_MAX, UINT32_MAX, UINT64_MAX
42-
from libc.stdlib cimport strtod, strtoll
42+
from libc.stdlib cimport strtod, strtof, strtoll, strtoull
4343
from libc.string cimport memcpy
4444
from cpython cimport array
4545
from cpython.conversion cimport PyOS_snprintf
