Commit 7f06f26

Add support for direct path load in thin mode.
1 parent 7b54dbc commit 7f06f26

45 files changed: +3465 -137 lines changed

README.md

Lines changed: 10 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -1,16 +1,16 @@
11
# python-oracledb
22

3-
Python-oracledb is an open-source [Python][python] extension module allowing
4-
Python programs to connect to [Oracle Database][oracledb]. The module conforms
5-
to the [Python Database API 2.0 specification][pep249] with a considerable
6-
number of additions and a couple of minor exclusions, see the [feature
7-
list][features]. It is maintained by Oracle.
8-
9-
Python-oracledb is used for executing SQL and PL/SQL; for calling NoSQL-style
10-
document APIs; for working with data frames; for receiving database
3+
Python-oracledb is the widely used, open-source [Python][python] extension
4+
module allowing Python programs to connect to [Oracle Database][oracledb]. The
5+
module conforms to the [Python Database API 2.0 specification][pep249] with a
6+
considerable number of additions and a couple of minor exclusions, see the
7+
[feature list][features]. It is maintained by Oracle.
8+
9+
Python-oracledb is used for executing SQL and PL/SQL; for working with data
10+
frames; for calling NoSQL-style document APIs; for receiving database
1111
notifications and messages; and for starting and stopping the database. It has
12-
features for high availability and security. It is used by many Python
13-
Frameworks, SQL Generators, ORMs, and libraries.
12+
features for fast data loading, high availability, and security. It is used by
13+
many Python frameworks, SQL generators, ORMs, and libraries.
1414

1515
Synchronous and [concurrent][concurrent] coding styles are supported. Database
1616
operations can optionally be [pipelined][pipelining].

doc/src/api_manual/async_connection.rst

Lines changed: 8 additions & 0 deletions
@@ -77,6 +77,14 @@ AsyncConnection Methods
7777

7878
.. versionadded:: 2.1.0
7979

80+
.. automethod:: AsyncConnection.direct_path_load
81+
82+
See :ref:`directpathloads`.
83+
84+
.. versionadded:: 3.4.0
85+
86+
.. dbapimethodextension::
87+
8088
.. automethod:: AsyncConnection.encode_oson
8189

8290
.. versionadded:: 2.1.0

doc/src/api_manual/connection.rst

Lines changed: 8 additions & 0 deletions
@@ -73,6 +73,14 @@ Connection Methods
7373

7474
.. dbapimethodextension::
7575

76+
.. automethod:: Connection.direct_path_load
77+
78+
See :ref:`directpathloads`.
79+
80+
.. versionadded:: 3.4.0
81+
82+
.. dbapimethodextension::
83+
7684
.. automethod:: Connection.encode_oson
7785

7886
.. versionadded:: 2.1.0

doc/src/release_notes.rst

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,9 @@ oracledb `3.4.0 <https://github.com/oracle/python-oracledb/compare/v3.3.0...v3.4
1919
Thin Mode Changes
2020
+++++++++++++++++
2121

22+
#) Added support for Oracle Database's :ref:`Direct Path Load
23+
<directpathloads>` functionality which is very efficient for loading large
24+
datasets into a database.
2225
#) Fixed bug when setting values of type ``datetime.date`` on variables (such
2326
as created by :meth:`Cursor.var()` or implicitly by
2427
:meth:`Cursor.setinputsizes()`) of types

doc/src/user_guide/appendix_a.rst

Lines changed: 4 additions & 0 deletions
@@ -245,6 +245,10 @@ For more details see :ref:`driverdiff` and :ref:`upgrading83`.
245245
- No
246246
- Yes
247247
- Yes
248+
* - Direct Path Loads (see :ref:`directpathloads`)
249+
- Yes
250+
- No
251+
- No
248252
* - Oracle Database 23ai JSON-Relational Duality Views (see :ref:`jsondualityviews`)
249253
- Yes
250254
- Yes

doc/src/user_guide/batch_statement.rst

Lines changed: 157 additions & 0 deletions
@@ -14,6 +14,11 @@ easily optimize batch insertion, and also allows "noisy" data (values not in a
1414
suitable format) to be filtered for review while other, correct, values are
1515
inserted.
1616

17+
In addition to Oracle Database "Array DML" batch loading,
18+
:ref:`directpathloads` can be used for very fast loading of large data sets if
19+
certain schema criteria can be met. Another option for frequent, small inserts
20+
is to load data using the Oracle Database :ref:`memoptimized`.
21+
1722
Related topics include :ref:`tuning` and :ref:`dataframeformat`.
1823

1924
Batch Statement Execution
@@ -618,3 +623,155 @@ B19E-449D-9968-1121AF06D793>`__ between the databases and using
618623
INSERT INTO SELECT or CREATE AS SELECT.
619624

620625
You can control the data transfer by changing your SELECT statement.
626+
627+
.. _directpathloads:
628+
629+
Direct Path Loads
630+
=================
631+
632+
Direct Path Loads allow data being inserted into Oracle Database to bypass
633+
code layers such as the database buffer cache. Also there are no INSERT
634+
statements used. This can be very efficient for ingestion of huge amounts of
635+
data but, as a consequence of the architecture, there are restrictions on when
636+
Direct Path Loads can be used. For more information see Oracle Database
637+
documentation such as on SQL*Loader `Direct Path Loads
638+
<https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=
639+
GUID-0D576DEF-7918-4DD2-A184-754D217C021F>`__ and on the Oracle Call Interface
640+
`Direct Path Load Interface
641+
<https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=
642+
GUID-596F5F9B-47A1-48DB-8702-FEED7BE038B9>`__.
643+
644+
The end-to-end insertion time when using Direct Path Loads for smaller data
645+
sets may not be faster than using :meth:`Cursor.executemany()`; however, there
646+
can still be reduced load on the database.
647+
648+
.. note::
649+
650+
Direct Path Loads are only supported in python-oracledb Thin mode.
651+
652+
Direct Path Loading is performed by the :meth:`Connection.direct_path_load()`
653+
method. For example, if you have the table::
654+
655+
create table TestDirectPathLoad (
656+
id number(9),
657+
name varchar2(20)
658+
);
659+
660+
Then you can load data into it using the code:
661+
662+
.. code-block:: python
663+
664+
SCHEMA_NAME = "HR"
665+
TABLE_NAME = "TESTDIRECTPATHLOAD"
666+
COLUMN_NAMES = ["ID", "NAME"]
667+
DATA = [
668+
(1, "A first row"),
669+
(2, "A second row"),
670+
(3, "A third row"),
671+
]
672+
673+
connection.direct_path_load(
674+
schema_name=SCHEMA_NAME,
675+
table_name=TABLE_NAME,
676+
column_names=COLUMN_NAMES,
677+
data=DATA
678+
)
679+
680+
The records are always implicitly committed.
681+
682+
The ``data`` parameter can be a list of sequences, a :ref:`DataFrame
683+
<oracledataframeobj>` object, or a third-party DataFrame instance that supports
684+
the Apache Arrow PyCapsule Interface; see :ref:`dfppl`.
685+
686+
To load into VECTOR columns, pass an appropriate `Python array.array()
687+
<https://docs.python.org/3/library/array.html>`__ value, or a list of values.
688+
For example, if you have the table::
689+
690+
create table TestDirectPathLoad (
691+
id number(9),
692+
name varchar2(20),
693+
v64 vector(3, float64)
694+
);
695+
696+
Then you can load data into it using the code:
697+
698+
.. code-block:: python
699+
700+
SCHEMA_NAME = "HR"
701+
TABLE_NAME = "TESTDIRECTPATHLOAD"
702+
COLUMN_NAMES = ["ID", "NAME", "V64"]
703+
DATA = [
704+
(1, "A first row", array.array("d", [1, 2, 3])),
705+
(2, "A second row", [4, 5, 6]),
706+
(3, "A third row", array.array("d", [7, 8, 9])),
707+
]
708+
709+
connection.direct_path_load(
710+
schema_name=SCHEMA_NAME,
711+
table_name=TABLE_NAME,
712+
column_names=COLUMN_NAMES,
713+
data=DATA
714+
)
715+
716+
717+
For more on vectors, see :ref:`vectors`.
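
The VECTOR example above uses Python's standard ``array`` module, so the script needs ``import array``. As a minimal sketch, the rows could be prepared as shown below. The helper ``to_float64_vector`` and its dimension check are illustrative assumptions, not part of python-oracledb:

```python
import array

VECTOR_DIMENSIONS = 3  # matches the vector(3, float64) column above


def to_float64_vector(values):
    """Hypothetical helper: build an array.array("d") of the expected length."""
    vec = array.array("d", values)  # typecode "d" stores double-precision floats
    if len(vec) != VECTOR_DIMENSIONS:
        raise ValueError(
            f"expected {VECTOR_DIMENSIONS} dimensions, got {len(vec)}"
        )
    return vec


RAW_ROWS = [
    (1, "A first row", [1, 2, 3]),
    (2, "A second row", [4, 5, 6]),
    (3, "A third row", [7, 8, 9]),
]

# Rows in the shape passed to the ``data`` parameter in the example above
DATA = [(id_, name, to_float64_vector(vec)) for id_, name, vec in RAW_ROWS]
```

Checking the vector length on the client is optional; it only surfaces a malformed row before any data is sent.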
718+
719+
Runnable Direct Path Load examples are in the `GitHub examples
720+
<https://github.com/oracle/python-oracledb/tree/main/samples>`__ directory.
721+
722+
**Notes on Direct Path Loads**
723+
724+
- Data is implicitly committed.
725+
- Data being inserted into CLOB or BLOB columns must be strings or bytes, not
726+
python-oracledb :ref:`LOB Objects <lobobj>`.
727+
- Insertion of python-oracledb :ref:`DbObjectType Objects <dbobjecttype>` is
728+
not supported.
729+
730+
Review Oracle Database documentation for database requirements and
731+
restrictions.
732+
733+
Batching of Direct Path Loads
734+
-----------------------------
735+
736+
If buffer, network, or database limits make it desirable to process smaller
737+
sets of records, you can either make repeated calls to
738+
:meth:`Connection.direct_path_load()` or you can use the ``batch_size``
739+
parameter. For example:
740+
741+
.. code-block:: python
742+
743+
SCHEMA_NAME = "HR"
744+
TABLE_NAME = "TESTDIRECTPATHLOAD"
745+
COLUMN_NAMES = ["ID", "NAME"]
746+
DATA = [
747+
(1, "A first row"),
748+
(2, "A second row"),
749+
. . .
750+
(10_000_000, "Ten millionth row"),
751+
]
752+
753+
connection.direct_path_load(
754+
schema_name=SCHEMA_NAME,
755+
table_name=TABLE_NAME,
756+
column_names=COLUMN_NAMES,
757+
data=DATA,
758+
batch_size=1_000_000
759+
)
760+
761+
This will send the data to the database in batches of 1,000,000 records until
762+
all 10,000,000 records have been inserted.
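
The other option mentioned above, making repeated calls to :meth:`Connection.direct_path_load()`, might look like the following sketch. The helper names ``iter_batches`` and ``load_in_batches`` are hypothetical; only the ``schema_name``, ``table_name``, ``column_names`` and ``data`` parameters are taken from the examples in this section:

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(connection, schema_name, table_name, column_names,
                    rows, batch_size):
    """Hypothetical helper: call direct_path_load() once per batch.

    Returns the number of rows passed to the driver.
    """
    total = 0
    for batch in iter_batches(rows, batch_size):
        connection.direct_path_load(
            schema_name=schema_name,
            table_name=table_name,
            column_names=column_names,
            data=batch,
        )
        total += len(batch)
    return total
```

Because ``rows`` can be a generator, this approach avoids holding the full data set in memory. Since each load is implicitly committed, a failure partway through leaves the earlier batches in the table.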
763+
764+
.. _memoptimized:
765+
766+
Memoptimized Rowstore
767+
=====================
768+
769+
The Memoptimized Rowstore is another Oracle Database feature for data
770+
ingestion, particularly for frequent single row inserts. It can also aid query
771+
performance. Configuration and control are handled by database configuration and
772+
the use of specific SQL statements. As a result, there is no specific
773+
python-oracledb requirement or API needed to take advantage of the feature.
774+
775+
To use the Memoptimized Rowstore see Oracle Database documentation `Enabling
776+
High Performance Data Streaming with the Memoptimized Rowstore
777+
<https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-9752E93D-55A7-4584-B09B-9623B33B5CCF>`__.

doc/src/user_guide/dataframes.rst

Lines changed: 45 additions & 1 deletion
@@ -647,7 +647,12 @@ Inserting Data Frames
647647
Python-oracledb :ref:`DataFrame <oracledataframeobj>` instances, or third-party
648648
DataFrame instances that support the Apache Arrow PyCapsule Interface, can be
649649
inserted into Oracle Database by passing them directly to
650-
:meth:`Cursor.executemany()` or :meth:`AsyncCursor.executemany()`.
650+
:meth:`Cursor.executemany()` or :meth:`AsyncCursor.executemany()`. They can
651+
also be passed to :meth:`Connection.direct_path_load()` and
652+
:meth:`AsyncConnection.direct_path_load()`.
653+
654+
Inserting Data Frames with executemany()
655+
----------------------------------------
651656

652657
For example, with the table::
653658

@@ -686,6 +691,45 @@ For general information about fast data ingestion, and discussion of
686691
:meth:`Cursor.executemany()` and :meth:`AsyncCursor.executemany()` options, see
687692
:ref:`batchstmnt`.
688693

694+
.. _dfppl:
695+
696+
Inserting Data Frames with Direct Path Loads
697+
--------------------------------------------
698+
699+
Very large :ref:`DataFrame <oracledataframeobj>` objects can be efficiently
700+
inserted using Oracle Database Direct Path Loading by passing them to
701+
:meth:`Connection.direct_path_load()`. You can also pass third-party DataFrame
702+
instances that support the Apache Arrow PyCapsule Interface.
703+
704+
See :ref:`directpathloads` for general information about Direct Path Loads.
705+
706+
For example, if the user "HR" has the table::
707+
708+
create table mytab (
709+
id number(9),
710+
name varchar2(100));
711+
712+
The following code will insert a Pandas DataFrame:
713+
714+
.. code-block:: python
715+
716+
import pandas
717+
718+
d = [
719+
(1, "Abigail"),
720+
(2, "Anna"),
721+
(3, "Janey"),
722+
(4, "Jessica"),
723+
]
724+
pdf = pandas.DataFrame(data=d)
725+
726+
connection.direct_path_load(
727+
schema_name="hr",
728+
table_name="mytab",
729+
column_names=["id", "name"],
730+
data=pdf
731+
)
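
The same load can also be issued from asynchronous code through :meth:`AsyncConnection.direct_path_load()`. The following is a sketch only: it assumes the asynchronous method is awaitable and accepts the same keyword parameters as the synchronous call, and the function name ``load_names`` is hypothetical:

```python
async def load_names(connection, rows):
    """Load (id, name) rows into the HR.MYTAB table from async code.

    Assumption: direct_path_load() on an AsyncConnection is a coroutine
    taking the same keyword parameters as the synchronous method.
    """
    await connection.direct_path_load(
        schema_name="hr",
        table_name="mytab",
        column_names=["id", "name"],
        data=rows,
    )
    return len(rows)
```

Here ``connection`` would be an asynchronous connection, for example one created with ``oracledb.connect_async()``, and ``rows`` could be the list ``d`` from the example above.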
732+
689733
Explicit Conversion to DataFrame or ArrowArray
690734
==============================================
691735

doc/src/user_guide/sql_execution.rst

Lines changed: 2 additions & 2 deletions
@@ -1114,8 +1114,8 @@ easily be executed with python-oracledb. For example:
11141114
Do not concatenate or interpolate user data into SQL statements. See
11151115
:ref:`bind` instead.
11161116

1117-
When handling multiple data values, use :meth:`Cursor.executemany()` for
1118-
performance. See :ref:`batchstmnt`
1117+
When handling multiple data values, use :meth:`Cursor.executemany()` or
1118+
:meth:`Connection.direct_path_load()` for performance. See :ref:`batchstmnt`.
11191119

11201120
By default data is not committed to the database and other users will not be
11211121
able to see your changes until your connection commits them by calling

doc/src/user_guide/tuning.rst

Lines changed: 5 additions & 3 deletions
@@ -21,9 +21,11 @@ Some general tuning tips are:
2121

2222
Make use of efficient python-oracledb functions. For example, to insert
2323
multiple rows use :meth:`Cursor.executemany()` instead of
24-
:meth:`Cursor.execute()`. Another example is to fetch data directly as
25-
:ref:`data frames <dataframeformat>` when working with packages like Pandas
26-
and NumPy.
24+
:meth:`Cursor.execute()`. Alternatively use
25+
:meth:`Connection.direct_path_load()` for inserting very large
26+
datasets. Another example is to fetch data directly as :ref:`data frames
27+
<dataframeformat>` instead of using the traditional query code path when
28+
working with packages like Pandas and NumPy.
2729

2830
* Tune your SQL statements. See the `SQL Tuning Guide
2931
<https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=TGSQL>`__.
