Document --multiline-fields-format option

renetapopova · renetapopova · commit f705abee0466 · 2024-11-06T09:20:30.000Z
diff --git a/modules/ROOT/pages/tools/neo4j-admin/neo4j-admin-import.adoc b/modules/ROOT/pages/tools/neo4j-admin/neo4j-admin-import.adoc
@@ -81,17 +81,18 @@ The syntax for importing a set of CSV files is:
 ----
 neo4j-admin database import full [-h] [--expand-commands] [--verbose] [--auto-skip-subsequent-headers[=true|false]]
                                  [--ignore-empty-strings[=true|false]] [--ignore-extra-columns[=true|false]]
-                                 [--legacy-style-quoting[=true|false]] [--multiline-fields[=true|false]]
-                                 [--normalize-types[=true|false]] [--overwrite-destination[=true|false]]
-                                 [--skip-bad-entries-logging[=true|false]] [--skip-bad-relationships[=true|false]]
-                                 [--skip-duplicate-nodes[=true|false]] [--strict[=true|false]] [--trim-strings
-                                 [=true|false]] [--additional-config=<file>] [--array-delimiter=<char>]
-                                 [--bad-tolerance=<num>] [--delimiter=<char>] [--format=<format>]
-                                 [--high-parallel-io=on|off|auto] [--id-type=string|integer|actual]
-                                 [--input-encoding=<character-set>] [--max-off-heap-memory=<size>] [--quote=<char>]
-                                 [--read-buffer-size=<size>] [--report-file=<path>] [--schema=<path>] [--threads=<num>]
-                                 --nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]
-                                 <files>...]... [--relationships=[<type>=]<files>...]... <database>
+                                 [--legacy-style-quoting[=true|false]] [--normalize-types[=true|false]]
+                                 [--overwrite-destination[=true|false]] [--skip-bad-entries-logging[=true|false]]
+                                 [--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes[=true|false]] [--strict
+                                 [=true|false]] [--trim-strings[=true|false]] [--additional-config=<file>]
+                                 [--array-delimiter=<char>] [--bad-tolerance=<num>] [--delimiter=<char>]
+                                 [--format=<format>] [--high-parallel-io=on|off|auto] [--id-type=string|integer|actual]
+                                 [--input-encoding=<character-set>] [--input-type=csv|parquet]
+                                 [--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
+                                 [--report-file=<path>] [--schema=<path>] [--threads=<num>] --nodes=[<label>[:
+                                 <label>]...=]<files>... [--nodes=[<label>[:<label>]...=]<files>...]...
+                                 [--relationships=[<type>=]<files>...]... [--multiline-fields=true|false|<path>[,
+                                 <path>] [--multiline-fields-format=v1|v2]] <database>
 ----
 
 === Description
@@ -225,13 +226,14 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
 It can also be specified as a percentage of the available memory, for example `70%`.
 |90%
 
-|--multiline-fields[=true\|false]
-|Whether or not fields from an input source can span multiple lines, i.e. contain newline characters.
-
-Setting `--multiline-fields=true` can severely degrade the performance of the importer.
-Therefore, use it with care, especially with large imports.
+|--multiline-fields=true\|false\|<path>[,<path>]
+|label:changed[Changed in 5.26] With `--multiline-fields-format=v1`, specifies whether fields from an input source can span multiple lines, i.e., contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer, so use it with care, especially with large imports. With `--multiline-fields-format=v2`, specifies the list of files that contain multiline fields. Files can be specified using regular expressions.
 |false
 
+|--multiline-fields-format=v1\|v2
+|label:new[Introduced in 5.26] Controls how input fields that span multiple lines, i.e., contain newline characters, are parsed. When set to `v1`, `--multiline-fields` accepts only `true` or `false`. When set to `v2`, `--multiline-fields` takes the list of files that contain multiline fields.
+|v1
+
 |--nodes=[<label>[:<label>]...=]<files>...
 |Node CSV header and data.
 
@@ -580,17 +582,19 @@ It is highly recommended to back up your database before running the incremental
 ----
 neo4j-admin database import incremental [-h] [--expand-commands] --force [--verbose] [--auto-skip-subsequent-headers
                                         [=true|false]] [--ignore-empty-strings[=true|false]] [--ignore-extra-columns
-                                        [=true|false]] [--legacy-style-quoting[=true|false]] [--multiline-fields
-                                        [=true|false]] [--normalize-types[=true|false]] [--skip-bad-entries-logging
-                                        [=true|false]] [--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes
-                                        [=true|false]] [--strict[=true|false]] [--trim-strings[=true|false]]
+                                        [=true|false]] [--legacy-style-quoting[=true|false]] [--normalize-types
+                                        [=true|false]] [--skip-bad-entries-logging[=true|false]]
+                                        [--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes[=true|false]]
+                                        [--strict[=true|false]] [--trim-strings[=true|false]]
                                         [--additional-config=<file>] [--array-delimiter=<char>] [--bad-tolerance=<num>]
                                         [--delimiter=<char>] [--high-parallel-io=on|off|auto]
                                         [--id-type=string|integer|actual] [--input-encoding=<character-set>]
-                                        [--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
-                                        [--report-file=<path>] [--schema=<path>] [--stage=all|prepare|build|merge]
-                                        [--threads=<num>] --nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>
-                                        [:<label>]...=]<files>...]... [--relationships=[<type>=]<files>...]... <database>
+                                        [--input-type=csv|parquet] [--max-off-heap-memory=<size>] [--quote=<char>]
+                                        [--read-buffer-size=<size>] [--report-file=<path>] [--schema=<path>]
+                                        [--stage=all|prepare|build|merge] [--threads=<num>] --nodes=[<label>[:
+                                        <label>]...=]<files>... [--nodes=[<label>[:<label>]...=]<files>...]...
+                                        [--relationships=[<type>=]<files>...]... [--multiline-fields=true|false|<path>[,
+                                        <path>] [--multiline-fields-format=v1|v2]] <database>
 ----
 
 === Description
@@ -741,12 +745,13 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
 It can also be specified as a percentage of the available memory, for example `70%`.
 |90%
 
-|--multiline-fields[=true\|false]
-|Whether or not fields from an input source can span multiple lines, i.e. contain newline characters.
+|--multiline-fields=true\|false\|<path>[,<path>]
+|label:changed[Changed in 5.26] With `--multiline-fields-format=v1`, specifies whether fields from an input source can span multiple lines, i.e., contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer, so use it with care, especially with large imports. With `--multiline-fields-format=v2`, specifies the list of files that contain multiline fields. Files can be specified using regular expressions.
+|null
 
-Setting `--multiline-fields=true` can severely degrade the performance of the importer.
-Therefore, use it with care, especially with large imports.
-|false
+|--multiline-fields-format=v1\|v2
+|label:new[Introduced in 5.26] Controls how input fields that span multiple lines, i.e., contain newline characters, are parsed. When set to `v1`, `--multiline-fields` accepts only `true` or `false`. When set to `v2`, `--multiline-fields` takes the list of files that contain multiline fields.
+|v1
 
 |--nodes=[<label>[:<label>]...=]<files>...
 |Node CSV header and data.
@@ -1410,6 +1415,77 @@ neo4j_home$ --nodes persons.csv --nodes games.csv --id-type string
 The `id` property of the nodes in the `persons` group will be stored as `long` type, while the `id` property of the nodes in the `games` group will be stored as `string` type, as the global `id-type` is a string.
 ====
 
+
+== Importing data that spans multiple lines
+
+The `--multiline-fields` option allows fields from an input source to span multiple lines, i.e. contain newline characters.
+For example:
+
+[source, shell, role=noplay]
+----
+bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=true databasename
+----
+
+Where `import/node_data.csv` contains multiline fields, such as:
+
+[source, csv, role=nocopy]
+----
+id,name,birthDate,birthYear,birthLocation,description
+1,John,October 1st,2000,New York,This is a multiline
+description
+----
+
+[NOTE]
+====
+Setting `--multiline-fields=true` can severely degrade the performance of the importer.
+Therefore, use it with care, especially with large imports.
+====
+
+Starting from 5.26, the `--multiline-fields` option can be used together with the `--multiline-fields-format` option, which controls how the input source is parsed.
+The default value, `v1`, uses the existing processing method for multiline fields.
+The value `v2` lets you specify a list of files (regular expressions allowed) that contain multiline fields; these files are processed much faster, with the restriction that text fields must be quoted.
+With both formats, each row must fit entirely into the buffer (4m by default).
+The `--multiline-fields-format` option is available in both the `full` and `incremental` import modes.
+
+For example:
+
+[.tabbed-example]
+=====
+[role=include-with-multiline-fields-format-v1]
+======
+[source, shell, role=noplay]
+----
+bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=true --multiline-fields-format=v1 databasename
+----
+
+Where `import/node_data.csv` contains multiline fields, such as:
+
+[source, csv, role=nocopy]
+----
+id,name,birthDate,birthYear,birthLocation,description
+1,John,October 1st,2000,New York,This is a multiline
+description
+----
+======
+[role=include-with-multiline-fields-format-v2]
+======
+
+[source, shell, role=noplay]
+----
+bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=import/node_data.csv --multiline-fields-format=v2 databasename
+----
+
+Where `import/node_data.csv` contains multiline fields, such as:
+
+[source, csv, role=nocopy]
+----
+id,name,birthDate,birthYear,birthLocation,description
+1,"John","October 1st","2000","New York","This is a multiline
+description"
+----
+======
+=====
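Because both formats require every row to fit into the buffer, it can help to check the longest row of a multiline file before a large import. The following is a minimal sketch, not part of `neo4j-admin`: the helper name `max_row_bytes` is hypothetical, it assumes RFC 4180-style quoting (an escaped quote is written as `""`), and it assumes the 4m default means 4 * 1024 * 1024 bytes. The buffer in question is presumably the one set by `--read-buffer-size`; treat the result as an estimate, not as the importer's own check.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of neo4j-admin): find the longest
# CSV row in bytes, where one row may span several physical lines.
# Assumes RFC 4180-style quoting: a row ends once the number of double quotes
# seen since the row started is even (an escaped "" keeps the count even).
max_row_bytes() {
  LC_ALL=C awk '
    {
      row_len += length($0) + 1        # bytes on this line, plus its newline
      quotes += gsub(/"/, "\"")         # count quotes without changing the line
      if (quotes % 2 == 0) {            # quotes balanced: the row is complete
        if (row_len > max) max = row_len
        row_len = 0
        quotes = 0
      }
    }
    END { print max + 0 }               # prints 0 for an empty file
  ' "$1"
}

# Assumed buffer size: 4m read as 4 * 1024 * 1024 bytes.
limit=4194304
data=import/node_data.csv
if [ -f "$data" ]; then
  longest=$(max_row_bytes "$data")
  if [ "$longest" -gt "$limit" ]; then
    echo "$data: longest row is $longest bytes, larger than $limit" >&2
  fi
fi
```

The sketch detects row boundaries by quote parity, which matches the `v2` requirement that text fields are quoted. For unquoted `v1` data, such as the `v1` example above, it cannot tell where a multiline row ends and reports each physical line as a separate row.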
+
 [[import-tool-header-format-skip-columns]]
 == Skipping columns