Commit 0989956

Document --multiline-fields-format option (#1935)
1 parent df78feb commit 0989956

1 file changed

+109
-29
lines changed

modules/ROOT/pages/tools/neo4j-admin/neo4j-admin-import.adoc

Lines changed: 109 additions & 29 deletions
Original file line number | Diff line number | Diff line change
@@ -81,17 +81,18 @@ The syntax for importing a set of CSV files is:
8181
----
8282
neo4j-admin database import full [-h] [--expand-commands] [--verbose] [--auto-skip-subsequent-headers[=true|false]]
8383
[--ignore-empty-strings[=true|false]] [--ignore-extra-columns[=true|false]]
84-
[--legacy-style-quoting[=true|false]] [--multiline-fields[=true|false]]
85-
[--normalize-types[=true|false]] [--overwrite-destination[=true|false]]
86-
[--skip-bad-entries-logging[=true|false]] [--skip-bad-relationships[=true|false]]
87-
[--skip-duplicate-nodes[=true|false]] [--strict[=true|false]] [--trim-strings
88-
[=true|false]] [--additional-config=<file>] [--array-delimiter=<char>]
89-
[--bad-tolerance=<num>] [--delimiter=<char>] [--format=<format>]
90-
[--high-parallel-io=on|off|auto] [--id-type=string|integer|actual]
91-
[--input-encoding=<character-set>] [--max-off-heap-memory=<size>] [--quote=<char>]
92-
[--read-buffer-size=<size>] [--report-file=<path>] [--schema=<path>] [--threads=<num>]
93-
--nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]
94-
<files>...]... [--relationships=[<type>=]<files>...]... <database>
84+
[--legacy-style-quoting[=true|false]] [--normalize-types[=true|false]]
85+
[--overwrite-destination[=true|false]] [--skip-bad-entries-logging[=true|false]]
86+
[--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes[=true|false]] [--strict
87+
[=true|false]] [--trim-strings[=true|false]] [--additional-config=<file>]
88+
[--array-delimiter=<char>] [--bad-tolerance=<num>] [--delimiter=<char>]
89+
[--format=<format>] [--high-parallel-io=on|off|auto] [--id-type=string|integer|actual]
90+
[--input-encoding=<character-set>] [--input-type=csv|parquet]
91+
[--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
92+
[--report-file=<path>] [--schema=<path>] [--threads=<num>] --nodes=[<label>[:
93+
<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]<files>...]...
94+
[--relationships=[<type>=]<files>...]... [--multiline-fields=true|false|<path>[,
95+
<path>] [--multiline-fields-format=v1|v2]] <database>
9596
----
9697

9798
=== Description
@@ -225,12 +226,13 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
225226
It can also be specified as a percentage of the available memory, for example `70%`.
226227
|90%
227228

228-
|--multiline-fields[=true\|false]
229-
|Whether or not fields from an input source can span multiple lines, i.e. contain newline characters.
229+
|--multiline-fields=true\|false\|<path>[,<path>]
230+
|label:changed[Changed in 5.26] In `v1`, specifies whether fields from an input source can span multiple lines, i.e. contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer, so use it with care, especially with large imports. In `v2`, specifies the list of files that contain multiline fields. Files can also be specified using regular expressions.
231+
|null
230232

231-
Setting `--multiline-fields=true` can severely degrade the performance of the importer.
232-
Therefore, use it with care, especially with large imports.
233-
|false
233+
|--multiline-fields-format=v1\|v2
234+
|label:new[Introduced in 5.26] Controls how fields that span multiple lines, i.e. contain newline characters, are parsed. When set to `v1`, the value of `--multiline-fields` can only be `true` or `false`. When set to `v2`, the value of `--multiline-fields` must be the list of files that contain multiline fields.
235+
|null
234236

235237
|--nodes=[<label>[:<label>]...=]<files>...
236238
|Node CSV header and data.
@@ -580,17 +582,19 @@ It is highly recommended to back up your database before running the incremental
580582
----
581583
neo4j-admin database import incremental [-h] [--expand-commands] --force [--verbose] [--auto-skip-subsequent-headers
582584
[=true|false]] [--ignore-empty-strings[=true|false]] [--ignore-extra-columns
583-
[=true|false]] [--legacy-style-quoting[=true|false]] [--multiline-fields
584-
[=true|false]] [--normalize-types[=true|false]] [--skip-bad-entries-logging
585-
[=true|false]] [--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes
586-
[=true|false]] [--strict[=true|false]] [--trim-strings[=true|false]]
585+
[=true|false]] [--legacy-style-quoting[=true|false]] [--normalize-types
586+
[=true|false]] [--skip-bad-entries-logging[=true|false]]
587+
[--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes[=true|false]]
588+
[--strict[=true|false]] [--trim-strings[=true|false]]
587589
[--additional-config=<file>] [--array-delimiter=<char>] [--bad-tolerance=<num>]
588590
[--delimiter=<char>] [--high-parallel-io=on|off|auto]
589591
[--id-type=string|integer|actual] [--input-encoding=<character-set>]
590-
[--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
591-
[--report-file=<path>] [--schema=<path>] [--stage=all|prepare|build|merge]
592-
[--threads=<num>] --nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>
593-
[:<label>]...=]<files>...]... [--relationships=[<type>=]<files>...]... <database>
592+
[--input-type=csv|parquet] [--max-off-heap-memory=<size>] [--quote=<char>]
593+
[--read-buffer-size=<size>] [--report-file=<path>] [--schema=<path>]
594+
[--stage=all|prepare|build|merge] [--threads=<num>] --nodes=[<label>[:
595+
<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]<files>...]...
596+
[--relationships=[<type>=]<files>...]... [--multiline-fields=true|false|<path>[,
597+
<path>] [--multiline-fields-format=v1|v2]] <database>
594598
----
595599

596600
=== Description
@@ -741,12 +745,13 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
741745
It can also be specified as a percentage of the available memory, for example `70%`.
742746
|90%
743747

744-
|--multiline-fields[=true\|false]
745-
|Whether or not fields from an input source can span multiple lines, i.e. contain newline characters.
748+
|--multiline-fields=true\|false\|<path>[,<path>]
749+
|label:changed[Changed in 5.26] In `v1`, specifies whether fields from an input source can span multiple lines, i.e. contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer, so use it with care, especially with large imports. In `v2`, specifies the list of files that contain multiline fields. Files can also be specified using regular expressions.
750+
|null
746751

747-
Setting `--multiline-fields=true` can severely degrade the performance of the importer.
748-
Therefore, use it with care, especially with large imports.
749-
|false
752+
|--multiline-fields-format=v1\|v2
753+
|label:new[Introduced in 5.26] Controls how fields that span multiple lines, i.e. contain newline characters, are parsed. When set to `v1`, the value of `--multiline-fields` can only be `true` or `false`. When set to `v2`, the value of `--multiline-fields` must be the list of files that contain multiline fields.
754+
|null
750755

751756
|--nodes=[<label>[:<label>]...=]<files>...
752757
|Node CSV header and data.
@@ -1410,6 +1415,81 @@ neo4j_home$ --nodes persons.csv --nodes games.csv --id-type string
14101415
The `id` property of the nodes in the `persons` group will be stored as `long` type, while the `id` property of the nodes in the `games` group will be stored as `string` type, as the global `id-type` is a string.
14111416
====
14121417

1418+
1419+
== Importing data that spans multiple lines
1420+
1421+
The `--multiline-fields` option allows fields from an input source to span multiple lines, i.e. contain newline characters.
1422+
For example:
1423+
1424+
[source, shell, role=noplay]
1425+
----
1426+
bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=true databasename
1427+
----
1428+
1429+
Where `import/node_data.csv` contains multiline fields, such as:
1430+
1431+
[source, csv, role=nocopy]
1432+
----
1433+
id,name,birthDate,birthYear,birthLocation,description
1434+
1,John,October 1st,2000,New York,This is a multiline
1435+
description
1436+
----
1437+
1438+
[NOTE]
1439+
====
1440+
Setting `--multiline-fields=true` can severely degrade the performance of the importer.
1441+
Therefore, use it with care, especially with large imports.
1442+
====
1443+
1444+
Starting from 5.26, you can set the `--multiline-fields-format` option to control how multiline fields in the input source are parsed.
1445+
Possible values are:
1446+
1447+
* `v1` - the default format, which uses the current processing method for multiline fields.
1448+
* `v2` - a more efficient processing method that requires text fields to be quoted.
1449+
For `v2`, the `--multiline-fields` option must be set to a list of files (regular expressions are allowed) that contain multiline fields.
1450+
1451+
Both formats require each row to fit entirely in the buffer (4m by default).
1452+
The `--multiline-fields-format` option is available in the `full` and `incremental` import modes.
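
The row-size restriction above can be checked before an import with a short script. The following is a minimal sketch, not part of `neo4j-admin`: the function name `oversized_rows` is hypothetical, the `4m` default is assumed to mean 4 MiB, and the size is only an approximation of how the importer measures a row (quotes and line terminators are not counted).

```python
import csv
import io

# Python's csv module rejects fields above 128 KiB by default; raise that cap
# so that very long multiline fields can be measured instead of raising an error.
csv.field_size_limit(2**31 - 1)

BUFFER_BYTES = 4 * 1024 * 1024  # assumption: the "4m" default read as 4 MiB


def oversized_rows(stream, limit=BUFFER_BYTES, encoding="utf-8"):
    """Return (row_number, approximate_size_in_bytes) for each row above limit."""
    too_big = []
    for number, row in enumerate(csv.reader(stream), start=1):
        # Approximate size: encoded field contents plus one delimiter between fields.
        size = sum(len(field.encode(encoding)) for field in row) + max(len(row) - 1, 0)
        if size > limit:
            too_big.append((number, size))
    return too_big


# Tiny in-memory demonstration with an artificially small limit of 32 bytes.
sample = io.StringIO('id,description\n1,"short"\n2,"' + "x" * 50 + '"\n')
print(oversized_rows(sample, limit=32))  # prints [(3, 52)]
```

For real files, open them with `open(path, newline="")` so that the `csv` module sees embedded newlines unchanged.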
1453+
1454+
For example:
1455+
1456+
[.tabbed-example]
1457+
=====
1458+
[role=include-with-multiline-fields-format-v1]
1459+
======
1460+
[source, shell, role=noplay]
1461+
----
1462+
bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=true --multiline-fields-format=v1 databasename
1463+
----
1464+
1465+
Where `import/node_data.csv` contains multiline fields, such as:
1466+
1467+
[source, csv, role=nocopy]
1468+
----
1469+
id,name,birthDate,birthYear,birthLocation,description
1470+
1,John,October 1st,2000,New York,This is a multiline
1471+
description
1472+
----
1473+
======
1474+
[role=include-with-multiline-fields-format-v2]
1475+
======
1476+
1477+
[source, shell, role=noplay]
1478+
----
1479+
bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=import/node_data.csv --multiline-fields-format=v2 databasename
1480+
----
1481+
1482+
Where `import/node_data.csv` contains multiline fields, such as:
1483+
1484+
[source, csv, role=nocopy]
1485+
----
1486+
id,name,birthDate,birthYear,birthLocation,description
1487+
1,"John","October 1st",2000,"New York","This is a multiline
1488+
description"
1489+
----
1490+
======
1491+
=====
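
Because `v2` expects `--multiline-fields` to name the files that contain multiline fields, it can help to find those files programmatically. The sketch below is an illustration under stated assumptions, not a Neo4j tool: the helper names `has_multiline_field` and `multiline_fields_value` and the file name `import/other.csv` are hypothetical. It only detects quoted multiline fields, which matches the `v2` requirement that text fields are quoted, and it joins the matching names with commas, following the `<path>[,<path>]` syntax shown earlier.

```python
import csv
import io


def has_multiline_field(stream):
    """Return True if any parsed CSV field contains a line break.

    A parsed field can only contain a line break when it was quoted in the source.
    """
    return any(
        "\n" in field or "\r" in field
        for row in csv.reader(stream)
        for field in row
    )


def multiline_fields_value(named_streams):
    """Join the names of the sources that contain multiline fields with commas."""
    return ",".join(name for name, stream in named_streams if has_multiline_field(stream))


# In-memory stand-ins for two data files; only the first has a multiline field.
quoted = 'id,name,description\n1,"John","This is a multiline\ndescription"\n'
plain = "id,name\n1,John\n"

value = multiline_fields_value([
    ("import/node_data.csv", io.StringIO(quoted)),
    ("import/other.csv", io.StringIO(plain)),  # hypothetical second file
])
print(value)  # prints import/node_data.csv
```

The printed value could then be passed as `--multiline-fields=<value>` together with `--multiline-fields-format=v2`. When reading real files, open them with `open(path, newline="")`.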
1492+
14131493
[[import-tool-header-format-skip-columns]]
14141494
== Skipping columns
14151495
