Commit f705abe

Document --multiline-fields-format option
1 parent 9aa67a9 commit f705abe

1 file changed, +105 -29 lines changed

modules/ROOT/pages/tools/neo4j-admin/neo4j-admin-import.adoc

Lines changed: 105 additions & 29 deletions
@@ -81,17 +81,18 @@ The syntax for importing a set of CSV files is:
 ----
 neo4j-admin database import full [-h] [--expand-commands] [--verbose] [--auto-skip-subsequent-headers[=true|false]]
 [--ignore-empty-strings[=true|false]] [--ignore-extra-columns[=true|false]]
-[--legacy-style-quoting[=true|false]] [--multiline-fields[=true|false]]
-[--normalize-types[=true|false]] [--overwrite-destination[=true|false]]
-[--skip-bad-entries-logging[=true|false]] [--skip-bad-relationships[=true|false]]
-[--skip-duplicate-nodes[=true|false]] [--strict[=true|false]] [--trim-strings
-[=true|false]] [--additional-config=<file>] [--array-delimiter=<char>]
-[--bad-tolerance=<num>] [--delimiter=<char>] [--format=<format>]
-[--high-parallel-io=on|off|auto] [--id-type=string|integer|actual]
-[--input-encoding=<character-set>] [--max-off-heap-memory=<size>] [--quote=<char>]
-[--read-buffer-size=<size>] [--report-file=<path>] [--schema=<path>] [--threads=<num>]
---nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]
-<files>...]... [--relationships=[<type>=]<files>...]... <database>
+[--legacy-style-quoting[=true|false]] [--normalize-types[=true|false]]
+[--overwrite-destination[=true|false]] [--skip-bad-entries-logging[=true|false]]
+[--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes[=true|false]] [--strict
+[=true|false]] [--trim-strings[=true|false]] [--additional-config=<file>]
+[--array-delimiter=<char>] [--bad-tolerance=<num>] [--delimiter=<char>]
+[--format=<format>] [--high-parallel-io=on|off|auto] [--id-type=string|integer|actual]
+[--input-encoding=<character-set>] [--input-type=csv|parquet]
+[--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
+[--report-file=<path>] [--schema=<path>] [--threads=<num>] --nodes=[<label>[:
+<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]<files>...]...
+[--relationships=[<type>=]<files>...]... [--multiline-fields=true|false|<path>[,
+<path>] [--multiline-fields-format=v1|v2]] <database>
 ----
 
 === Description
@@ -225,13 +226,14 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
 It can also be specified as a percentage of the available memory, for example `70%`.
 |90%
 
-|--multiline-fields[=true\|false]
-|Whether or not fields from an input source can span multiple lines, i.e. contain newline characters.
-
-Setting `--multiline-fields=true` can severely degrade the performance of the importer.
-Therefore, use it with care, especially with large imports.
+|--multiline-fields=true\|false\|<path>[,<path>]
+|label:changed[Changed in 5.26] With `--multiline-fields-format=v1`, specifies whether fields from an input source can span multiple lines, i.e. contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer, so use it with care, especially with large imports. With `--multiline-fields-format=v2`, specifies the list of files that contain multiline fields. Files can be specified using regular expressions.
 |false
 
+|--multiline-fields-format=v1\|v2
+|label:new[Introduced in 5.26] Controls how input sources whose fields can span multiple lines, i.e. contain newline characters, are parsed. When set to `v1`, the value of `--multiline-fields` can only be `true` or `false`. When set to `v2`, the value of `--multiline-fields` must be the list of files that contain multiline fields.
+|v1
+
 |--nodes=[<label>[:<label>]...=]<files>...
 |Node CSV header and data.
 
@@ -580,17 +582,19 @@ It is highly recommended to back up your database before running the incremental
 ----
 neo4j-admin database import incremental [-h] [--expand-commands] --force [--verbose] [--auto-skip-subsequent-headers
 [=true|false]] [--ignore-empty-strings[=true|false]] [--ignore-extra-columns
-[=true|false]] [--legacy-style-quoting[=true|false]] [--multiline-fields
-[=true|false]] [--normalize-types[=true|false]] [--skip-bad-entries-logging
-[=true|false]] [--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes
-[=true|false]] [--strict[=true|false]] [--trim-strings[=true|false]]
+[=true|false]] [--legacy-style-quoting[=true|false]] [--normalize-types
+[=true|false]] [--skip-bad-entries-logging[=true|false]]
+[--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes[=true|false]]
+[--strict[=true|false]] [--trim-strings[=true|false]]
 [--additional-config=<file>] [--array-delimiter=<char>] [--bad-tolerance=<num>]
 [--delimiter=<char>] [--high-parallel-io=on|off|auto]
 [--id-type=string|integer|actual] [--input-encoding=<character-set>]
-[--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
-[--report-file=<path>] [--schema=<path>] [--stage=all|prepare|build|merge]
-[--threads=<num>] --nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>
-[:<label>]...=]<files>...]... [--relationships=[<type>=]<files>...]... <database>
+[--input-type=csv|parquet] [--max-off-heap-memory=<size>] [--quote=<char>]
+[--read-buffer-size=<size>] [--report-file=<path>] [--schema=<path>]
+[--stage=all|prepare|build|merge] [--threads=<num>] --nodes=[<label>[:
+<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]<files>...]...
+[--relationships=[<type>=]<files>...]... [--multiline-fields=true|false|<path>[,
+<path>] [--multiline-fields-format=v1|v2]] <database>
 ----
 
 === Description
@@ -741,12 +745,13 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
 It can also be specified as a percentage of the available memory, for example `70%`.
 |90%
 
-|--multiline-fields[=true\|false]
-|Whether or not fields from an input source can span multiple lines, i.e. contain newline characters.
+|--multiline-fields=true\|false\|<path>[,<path>]
+|label:changed[Changed in 5.26] With `--multiline-fields-format=v1`, specifies whether fields from an input source can span multiple lines, i.e. contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer, so use it with care, especially with large imports. With `--multiline-fields-format=v2`, specifies the list of files that contain multiline fields. Files can be specified using regular expressions.
+|null
 
-Setting `--multiline-fields=true` can severely degrade the performance of the importer.
-Therefore, use it with care, especially with large imports.
-|false
+|--multiline-fields-format=v1\|v2
+|label:new[Introduced in 5.26] Controls how input sources whose fields can span multiple lines, i.e. contain newline characters, are parsed. When set to `v1`, the value of `--multiline-fields` can only be `true` or `false`. When set to `v2`, the value of `--multiline-fields` must be the list of files that contain multiline fields.
+|v1
 
 |--nodes=[<label>[:<label>]...=]<files>...
 |Node CSV header and data.
@@ -1410,6 +1415,77 @@ neo4j_home$ --nodes persons.csv --nodes games.csv --id-type string
 The `id` property of the nodes in the `persons` group will be stored as `long` type, while the `id` property of the nodes in the `games` group will be stored as `string` type, as the global `id-type` is a string.
 ====
 
+
+== Importing data that spans multiple lines
+
+The `--multiline-fields` option allows fields from an input source to span multiple lines, i.e. contain newline characters.
+For example:
+
+[source, shell, role=noplay]
+----
+bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=true databasename
+----
+
+Where `import/node_data.csv` contains multiline fields, such as:
+
+[source, csv, role=nocopy]
+----
+id,name,birthDate,birthYear, birthLocation, description
+1, John, October 1st, 2000, New York, This is a multiline
+description
+----
+
+[NOTE]
+====
+Setting `--multiline-fields=true` can severely degrade the performance of the importer.
+Therefore, use it with care, especially with large imports.
+====
+
+Starting with 5.26, the `--multiline-fields` option can be used together with the `--multiline-fields-format` option, which controls how the input source is parsed.
+The default value, `v1`, uses the current processing method for multiline fields.
+The `v2` value lets you specify a list of files (regular expressions allowed) that contain multiline fields. These files are processed much faster, with the restriction that text fields must be quoted.
+Both formats require that every row fits entirely into the buffer (4m by default).
+The `--multiline-fields-format` option is available in the `full` and `incremental` import modes.
+
+For example:
+
+[.tabbed-example]
+=====
+[role=include-with-multiline-fields-format-v1]
+======
+[source, shell, role=noplay]
+----
+bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=true --multiline-fields-format=v1 databasename
+----
+
+Where `import/node_data.csv` contains multiline fields, such as:
+
+[source, csv, role=nocopy]
+----
+id,name,birthDate,birthYear, birthLocation, description
+1, John, October 1st, 2000, New York, This is a multiline
+description
+----
+======
+[role=include-with-multiline-fields-format-v2]
+======
+
+[source, shell, role=noplay]
+----
+bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=import/node_data.csv --multiline-fields-format=v2 databasename
+----
+
+Where `import/node_data.csv` contains multiline fields, such as:
+
+[source, csv, role=nocopy]
+----
+id,name,birthDate,birthYear, birthLocation, description
+1,"John","October 1st", "2000","New York", "This is a multiline
+description"
+----
+======
+=====
+
 [[import-tool-header-format-skip-columns]]
 == Skipping columns
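
With `--multiline-fields-format=v2`, the `--multiline-fields` option takes the list of files that contain multiline fields. The following is a minimal sketch of one way to find those files before an import. It assumes that Python's `csv` module interprets quoting closely enough to the importer for this purpose, which the documentation in this commit does not confirm. The helper names `has_multiline_fields` and `multiline_fields_argument` are hypothetical and are not part of `neo4j-admin`.

```python
import csv


def has_multiline_fields(path, encoding="utf-8"):
    """Return True if any CSV record in `path` spans more than one physical line."""
    # newline="" lets the csv module see embedded newlines inside quoted fields.
    with open(path, newline="", encoding=encoding) as handle:
        # skipinitialspace=True treats a quote that follows ", " as an opening quote.
        reader = csv.reader(handle, skipinitialspace=True)
        previous_line = 0
        for _ in reader:
            # line_num counts physical lines read so far; a jump greater
            # than 1 means the record just read covered several lines.
            if reader.line_num - previous_line > 1:
                return True
            previous_line = reader.line_num
    return False


def multiline_fields_argument(paths):
    """Join the files that contain multiline records into a comma-separated list."""
    return ",".join(p for p in paths if has_multiline_fields(p))
```

The string returned by `multiline_fields_argument` could then be passed as the value of `--multiline-fields`, together with `--multiline-fields-format=v2`. Unquoted fields cannot be detected this way, which matches the `v2` restriction that text fields must be quoted.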