@@ -14,6 +14,11 @@ easily optimize batch insertion, and also allows "noisy" data (values not in a
suitable format) to be filtered for review while other, correct, values are
inserted.

+ In addition to Oracle Database "Array DML" batch loading,
+ :ref:`directpathloads` can be used for very fast loading of large data sets if
+ certain schema criteria can be met. Another option for frequent, small inserts
+ is to load data using the Oracle Database :ref:`memoptimized`.
+
Related topics include :ref:`tuning` and :ref:`dataframeformat`.

Batch Statement Execution
@@ -618,3 +623,155 @@ B19E-449D-9968-1121AF06D793>`__ between the databases and using
INSERT INTO SELECT or CREATE AS SELECT.

You can control the data transfer by changing your SELECT statement.
+
+ .. _directpathloads:
+
+ Direct Path Loads
+ =================
+
+ Direct Path Loads allow data being inserted into Oracle Database to bypass
+ code layers such as the database buffer cache. No INSERT statements are
+ used. This can be very efficient for ingesting huge amounts of data but, as a
+ consequence of the architecture, there are restrictions on when Direct Path
+ Loads can be used. For more information, see the Oracle Database
+ documentation on SQL*Loader `Direct Path Loads
+ <https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=
+ GUID-0D576DEF-7918-4DD2-A184-754D217C021F>`__ and on the Oracle Call Interface
+ `Direct Path Load Interface
+ <https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=
+ GUID-596F5F9B-47A1-48DB-8702-FEED7BE038B9>`__.
+
+ For smaller data sets, the end-to-end insertion time with Direct Path Loads
+ may not be faster than with :meth:`Cursor.executemany()`. However, the load
+ on the database can still be lower.
+
+ .. note::
+
+     Direct Path Loads are only supported in python-oracledb Thin mode.
+
+ Direct Path Loading is performed by the :meth:`Connection.direct_path_load()`
+ method. For example, if you have the table::
+
+     create table TestDirectPathLoad (
+         id   number(9),
+         name varchar2(20)
+     );
+
+ Then you can load data into it using the code:
+
+ .. code-block:: python
+
+     SCHEMA_NAME = "HR"
+     TABLE_NAME = "TESTDIRECTPATHLOAD"
+     COLUMN_NAMES = ["ID", "NAME"]
+     DATA = [
+         (1, "A first row"),
+         (2, "A second row"),
+         (3, "A third row"),
+     ]
+
+     connection.direct_path_load(
+         schema_name=SCHEMA_NAME,
+         table_name=TABLE_NAME,
+         column_names=COLUMN_NAMES,
+         data=DATA
+     )
+
+ The records are always implicitly committed.
+
+ The ``data`` parameter can be a list of sequences, a :ref:`DataFrame
+ <oracledataframeobj>` object, or a third-party DataFrame instance that supports
+ the Apache Arrow PyCapsule Interface. See :ref:`dfppl`.
+
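As a minimal sketch of the "list of sequences" form, the rows can be built from CSV text before they are passed to ``Connection.direct_path_load()``. The helper names ``rows_from_csv()`` and ``load_csv()``, the two-column CSV layout, and the ``HR.TESTDIRECTPATHLOAD`` target are assumptions made for illustration; ``connection`` is assumed to be an open Thin mode connection.

```python
import csv
import io


def rows_from_csv(text):
    """Parse CSV text with 'id,name' rows into a list of (int, str) tuples."""
    reader = csv.reader(io.StringIO(text))
    return [(int(id_value), name) for id_value, name in reader]


def load_csv(connection, text):
    """Load the parsed rows with one Direct Path Load call.

    Returns the number of rows passed to direct_path_load().
    """
    data = rows_from_csv(text)
    connection.direct_path_load(
        schema_name="HR",                  # assumed schema
        table_name="TESTDIRECTPATHLOAD",   # assumed table
        column_names=["ID", "NAME"],
        data=data,
    )
    return len(data)
```

Converting each field to the Python type that matches the column (here ``int`` for the NUMBER column) is a design choice of this sketch, not a documented requirement.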
+ To load into VECTOR columns, pass an appropriate `Python array.array()
+ <https://docs.python.org/3/library/array.html>`__ value, or a list of values.
+ For example, if you have the table::
+
+     create table TestDirectPathLoad (
+         id   number(9),
+         name varchar2(20),
+         v64  vector(3, float64)
+     );
+
+ Then you can load data into it using the code:
+
+ .. code-block:: python
+
+     import array
+
+     SCHEMA_NAME = "HR"
+     TABLE_NAME = "TESTDIRECTPATHLOAD"
+     COLUMN_NAMES = ["ID", "NAME", "V64"]
+     DATA = [
+         (1, "A first row", array.array("d", [1, 2, 3])),
+         (2, "A second row", [4, 5, 6]),
+         (3, "A third row", array.array("d", [7, 8, 9])),
+     ]
+
+     connection.direct_path_load(
+         schema_name=SCHEMA_NAME,
+         table_name=TABLE_NAME,
+         column_names=COLUMN_NAMES,
+         data=DATA
+     )
+
+ For more on vectors, see :ref:`vectors`.
+
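When vector values arrive as plain Python lists, one possible approach is to convert them to ``array.array`` values up front, using type code ``"d"`` for a FLOAT64 column as in the example above. The helper ``to_float64_vector()`` and its dimension check are hypothetical additions for illustration and are not part of python-oracledb.

```python
import array


def to_float64_vector(values, dimensions=3):
    """Return values as array.array('d'), checking the expected dimension count."""
    vector = array.array("d", values)
    if len(vector) != dimensions:
        raise ValueError(f"expected {dimensions} dimensions, got {len(vector)}")
    return vector


raw_rows = [
    (1, "A first row", [1, 2, 3]),
    (2, "A second row", [4, 5, 6]),
]

# Rows in the shape used for the ID, NAME and V64 columns
DATA = [(id_, name, to_float64_vector(vec)) for id_, name, vec in raw_rows]
```

Checking the dimension count in Python reports a malformed row before any data is sent to the database.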
+ Runnable Direct Path Load examples are in the `GitHub examples
+ <https://github.com/oracle/python-oracledb/tree/main/samples>`__ directory.
+
+ **Notes on Direct Path Loads**
+
+ - Data is implicitly committed.
+ - Data being inserted into CLOB or BLOB columns must be strings or bytes, not
+   python-oracledb :ref:`LOB Objects <lobobj>`.
+ - Insertion of python-oracledb :ref:`DbObjectType Objects <dbobjecttype>` is
+   not supported.
+
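If rows obtained elsewhere contain LOB objects, one way to satisfy the CLOB and BLOB note above is to read each LOB into a string or bytes value before loading. The sketch below is an assumption-laden illustration: ``materialize_lobs()`` is a hypothetical helper, and it treats any value with a ``read()`` method as LOB-like so that it can run without a database.

```python
def materialize_lobs(rows):
    """Return rows with each LOB-like value replaced by its str or bytes content.

    A value is treated as LOB-like if it has a read() method. This duck-typing
    rule is an assumption of the sketch, not a python-oracledb API.
    """
    return [
        tuple(value.read() if hasattr(value, "read") else value for value in row)
        for row in rows
    ]
```

The returned list of tuples can then be passed as the ``data`` parameter.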
+ Review Oracle Database documentation for database requirements and
+ restrictions.
+
+ Batching of Direct Path Loads
+ -----------------------------
+
+ If buffer, network, or database limits make it desirable to process smaller
+ sets of records, you can either make repeated calls to
+ :meth:`Connection.direct_path_load()` or you can use the ``batch_size``
+ parameter. For example:
+
+ .. code-block:: python
+
+     SCHEMA_NAME = "HR"
+     TABLE_NAME = "TESTDIRECTPATHLOAD"
+     COLUMN_NAMES = ["ID", "NAME"]
+     DATA = [
+         (1, "A first row"),
+         (2, "A second row"),
+         # . . .
+         (10_000_000, "Ten millionth row"),
+     ]
+
+     connection.direct_path_load(
+         schema_name=SCHEMA_NAME,
+         table_name=TABLE_NAME,
+         column_names=COLUMN_NAMES,
+         data=DATA,
+         batch_size=1_000_000
+     )
+
+ This will send the data to the database in batches of 1,000,000 records until
+ all 10,000,000 records have been inserted.
+
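The alternative of making repeated calls could be sketched as follows. The helper ``load_in_chunks()`` and its ``chunk_size`` parameter are hypothetical names introduced for illustration; only ``Connection.direct_path_load()`` and its keyword parameters come from the text above.

```python
def load_in_chunks(connection, schema_name, table_name, column_names, data,
                   chunk_size):
    """Call direct_path_load() once per slice of at most chunk_size rows.

    Returns the number of calls made.
    """
    calls = 0
    for start in range(0, len(data), chunk_size):
        connection.direct_path_load(
            schema_name=schema_name,
            table_name=table_name,
            column_names=column_names,
            data=data[start:start + chunk_size],
        )
        calls += 1
    return calls
```

Because each load is implicitly committed, rows from chunks that completed before a failure would already be in the table, so an application using this pattern may want to track which chunks have been loaded.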
+ .. _memoptimized:
+
+ Memoptimized Rowstore
+ =====================
+
+ The Memoptimized Rowstore is another Oracle Database feature for data
+ ingestion, particularly for frequent single row inserts. It can also aid query
+ performance. Configuration and control are handled through database
+ configuration and specific SQL statements. As a result, no
+ python-oracledb-specific requirement or API is needed to take advantage of the
+ feature.
+
+ To use the Memoptimized Rowstore, see the Oracle Database documentation `Enabling
+ High Performance Data Streaming with the Memoptimized Rowstore
+ <https://www.oracle.com/pls/topic/lookup?ctx=dblatest&id=GUID-9752E93D-55A7-4584-B09B-9623B33B5CCF>`__.
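Since the feature is driven by SQL, a single row insert might look like the sketch below, which uses an ordinary cursor and the ``MEMOPTIMIZE_WRITE`` hint. The table ``TestFastIngest``, its columns, and the helper ``fast_ingest_row()`` are assumptions for illustration. The table is also assumed to have been enabled for fast ingest already, for example with ``MEMOPTIMIZE FOR WRITE``. Check the Oracle Database documentation linked above for the exact requirements.

```python
# Assumed table and columns; the table must already be enabled for fast ingest
FAST_INGEST_SQL = """
    insert /*+ MEMOPTIMIZE_WRITE */ into TestFastIngest (id, name)
    values (:1, :2)"""


def fast_ingest_row(connection, row):
    """Insert one (id, name) row using a plain cursor and the fast ingest hint."""
    with connection.cursor() as cursor:
        cursor.execute(FAST_INGEST_SQL, row)
```

No special python-oracledb call is involved: the hint in the SQL text is what requests the Memoptimized Rowstore behavior.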