Merged
Changes from 1 commit
134 changes: 105 additions & 29 deletions modules/ROOT/pages/tools/neo4j-admin/neo4j-admin-import.adoc
@@ -81,17 +81,18 @@ The syntax for importing a set of CSV files is:
----
neo4j-admin database import full [-h] [--expand-commands] [--verbose] [--auto-skip-subsequent-headers[=true|false]]
[--ignore-empty-strings[=true|false]] [--ignore-extra-columns[=true|false]]
[--legacy-style-quoting[=true|false]] [--multiline-fields[=true|false]]
[--normalize-types[=true|false]] [--overwrite-destination[=true|false]]
[--skip-bad-entries-logging[=true|false]] [--skip-bad-relationships[=true|false]]
[--skip-duplicate-nodes[=true|false]] [--strict[=true|false]] [--trim-strings
[=true|false]] [--additional-config=<file>] [--array-delimiter=<char>]
[--bad-tolerance=<num>] [--delimiter=<char>] [--format=<format>]
[--high-parallel-io=on|off|auto] [--id-type=string|integer|actual]
[--input-encoding=<character-set>] [--max-off-heap-memory=<size>] [--quote=<char>]
[--read-buffer-size=<size>] [--report-file=<path>] [--schema=<path>] [--threads=<num>]
--nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]
<files>...]... [--relationships=[<type>=]<files>...]... <database>
[--legacy-style-quoting[=true|false]] [--normalize-types[=true|false]]
[--overwrite-destination[=true|false]] [--skip-bad-entries-logging[=true|false]]
[--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes[=true|false]] [--strict
[=true|false]] [--trim-strings[=true|false]] [--additional-config=<file>]
[--array-delimiter=<char>] [--bad-tolerance=<num>] [--delimiter=<char>]
[--format=<format>] [--high-parallel-io=on|off|auto] [--id-type=string|integer|actual]
[--input-encoding=<character-set>] [--input-type=csv|parquet]
[--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
[--report-file=<path>] [--schema=<path>] [--threads=<num>] --nodes=[<label>[:
<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]<files>...]...
[--relationships=[<type>=]<files>...]... [--multiline-fields=true|false|<path>[,
<path>] [--multiline-fields-format=v1|v2]] <database>
----

=== Description
@@ -225,13 +226,14 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
It can also be specified as a percentage of the available memory, for example `70%`.
|90%

|--multiline-fields[=true\|false]
|Whether or not fields from an input source can span multiple lines, i.e. contain newline characters.

Setting `--multiline-fields=true` can severely degrade the performance of the importer.
Therefore, use it with care, especially with large imports.
|--multiline-fields=true\|false\|<path>[,<path>]
|label:changed[Changed in 5.26] With format `v1`, specifies whether fields from an input source can span multiple lines, i.e. contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer, so use it with care, especially with large imports. With format `v2`, specifies the list of files that contain multiline fields. Files can be specified using regular expressions.
|false

|--multiline-fields-format=v1\|v2
|label:new[Introduced in 5.26] Controls how input sources whose fields can span multiple lines (i.e. contain newline characters) are parsed. When set to `v1`, the value of `--multiline-fields` can only be `true` or `false`. When set to `v2`, the value of `--multiline-fields` must be the list of files that contain multiline fields.
|v1

|--nodes=[<label>[:<label>]...=]<files>...
|Node CSV header and data.

@@ -580,17 +582,19 @@ It is highly recommended to back up your database before running the incremental
----
neo4j-admin database import incremental [-h] [--expand-commands] --force [--verbose] [--auto-skip-subsequent-headers
[=true|false]] [--ignore-empty-strings[=true|false]] [--ignore-extra-columns
[=true|false]] [--legacy-style-quoting[=true|false]] [--multiline-fields
[=true|false]] [--normalize-types[=true|false]] [--skip-bad-entries-logging
[=true|false]] [--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes
[=true|false]] [--strict[=true|false]] [--trim-strings[=true|false]]
[=true|false]] [--legacy-style-quoting[=true|false]] [--normalize-types
[=true|false]] [--skip-bad-entries-logging[=true|false]]
[--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes[=true|false]]
[--strict[=true|false]] [--trim-strings[=true|false]]
[--additional-config=<file>] [--array-delimiter=<char>] [--bad-tolerance=<num>]
[--delimiter=<char>] [--high-parallel-io=on|off|auto]
[--id-type=string|integer|actual] [--input-encoding=<character-set>]
[--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
[--report-file=<path>] [--schema=<path>] [--stage=all|prepare|build|merge]
[--threads=<num>] --nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>
[:<label>]...=]<files>...]... [--relationships=[<type>=]<files>...]... <database>
[--input-type=csv|parquet] [--max-off-heap-memory=<size>] [--quote=<char>]
[--read-buffer-size=<size>] [--report-file=<path>] [--schema=<path>]
[--stage=all|prepare|build|merge] [--threads=<num>] --nodes=[<label>[:
<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]<files>...]...
[--relationships=[<type>=]<files>...]... [--multiline-fields=true|false|<path>[,
<path>] [--multiline-fields-format=v1|v2]] <database>
----

=== Description
@@ -741,12 +745,13 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
It can also be specified as a percentage of the available memory, for example `70%`.
|90%

|--multiline-fields[=true\|false]
|Whether or not fields from an input source can span multiple lines, i.e. contain newline characters.
|--multiline-fields=true\|false\|<path>[,<path>]
|label:changed[Changed in 5.26] With format `v1`, specifies whether fields from an input source can span multiple lines, i.e. contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer, so use it with care, especially with large imports. With format `v2`, specifies the list of files that contain multiline fields. Files can be specified using regular expressions.
|null

Setting `--multiline-fields=true` can severely degrade the performance of the importer.
Therefore, use it with care, especially with large imports.
|false
|--multiline-fields-format=v1\|v2
|label:new[Introduced in 5.26] Controls how input sources whose fields can span multiple lines (i.e. contain newline characters) are parsed. When set to `v1`, the value of `--multiline-fields` can only be `true` or `false`. When set to `v2`, the value of `--multiline-fields` must be the list of files that contain multiline fields.
|v1

|--nodes=[<label>[:<label>]...=]<files>...
|Node CSV header and data.
@@ -1410,6 +1415,77 @@ neo4j_home$ --nodes persons.csv --nodes games.csv --id-type string
The `id` property of the nodes in the `persons` group will be stored as `long` type, while the `id` property of the nodes in the `games` group will be stored as `string` type, as the global `id-type` is a string.
====


== Importing data that spans multiple lines

The `--multiline-fields` option allows fields from an input source to span multiple lines, i.e. contain newline characters.
For example:

[source, shell, role=noplay]
----
bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=true databasename
----

Where `import/node_data.csv` contains multiline fields, such as:

[source, csv, role=nocopy]
----
id,name,birthDate,birthYear, birthLocation, description
1, John, October 1st, 2000, New York, This is a multiline
description
----

[NOTE]
====
Setting `--multiline-fields=true` can severely degrade the performance of the importer.
Therefore, use it with care, especially with large imports.
====

Starting from 5.26, the `--multiline-fields` option can be used in conjunction with the `--multiline-fields-format` option, which controls the parsing of the input source.
The default value, `v1`, uses the existing processing method for multiline fields.
The `v2` value lets you specify a list of files (regular expressions allowed) that contain multiline fields; these files are processed much faster, with the restriction that text fields must be quoted.
Both formats require that every row fits in its entirety into the buffer (4m by default).
The `--multiline-fields-format` option is available in the `full` and `incremental` import modes.
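
The row-size restriction can be checked before starting a long import.
The following is a minimal sketch and not part of the import tool: the helper name `max_row_bytes`, the sample file, and the quote-parity heuristic are assumptions made for illustration.
It also assumes that the buffer in question is the one sized by `--read-buffer-size`, that `4m` means 4 MiB, and that multiline fields are quoted (as `v2` requires).

```shell
#!/bin/sh
# Hypothetical helper: print the size in bytes of the largest logical CSV row.
# A logical row is treated as complete once its double quotes are balanced,
# so a quoted field that spans several physical lines counts as one row.
max_row_bytes() {
  LC_ALL=C awk '
    {
      quotes += gsub(/"/, "&")        # count double quotes without changing the line
      row_bytes += length($0) + 1     # +1 for the newline ending the physical line
      if (quotes % 2 == 0) {          # balanced quotes: the logical row is complete
        if (row_bytes > max) max = row_bytes
        row_bytes = 0
        quotes = 0
      }
    }
    END {
      if (row_bytes > max) max = row_bytes
      print max + 0
    }
  ' "$1"
}

# Hypothetical sample data, written to a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' \
  'id,name,description' \
  '1,"John","This is a multiline' \
  'description"' > "$workdir/node_data.csv"

limit=$((4 * 1024 * 1024))            # assumed meaning of the 4m default
max_bytes=$(max_row_bytes "$workdir/node_data.csv")

if [ "$max_bytes" -gt "$limit" ]; then
  echo "Largest row is ${max_bytes} bytes; consider a larger --read-buffer-size"
else
  echo "Largest row is ${max_bytes} bytes; fits in a ${limit}-byte buffer"
fi

rm -rf "$workdir"
```

The heuristic does not apply to unquoted multiline fields, which `v1` accepts, because their row boundaries cannot be inferred from quote parity.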

For example:

[.tabbed-example]
=====
[role=include-with-multiline-fields-format-v1]
======
[source, shell, role=noplay]
----
bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=true --multiline-fields-format=v1 databasename
----

Where `import/node_data.csv` contains multiline fields, such as:

[source, csv, role=nocopy]
----
id,name,birthDate,birthYear, birthLocation, description
1, John, October 1st, 2000, New York, This is a multiline
description
----
======
[role=include-with-multiline-fields-format-v2]
======

[source, shell, role=noplay]
----
bin/neo4j-admin database import full --nodes import/node_header.csv,import/node_data.csv --multiline-fields=import/node_data.csv --multiline-fields-format=v2 databasename
----

Where `import/node_data.csv` contains multiline fields, such as:

[source, csv, role=nocopy]
----
id,name,birthDate,birthYear,birthLocation,description
1,"John","October 1st","2000","New York","This is a multiline
description"
----
======
=====
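
Because `v2` expects a list of the files that contain multiline fields, it can help to work out that list ahead of time.
The sketch below is one possible approach under stated assumptions: the function name `has_multiline_fields` and the sample files are hypothetical, and the check is a heuristic that only recognizes quoted fields (a physical line with an odd number of double quotes opens or closes a field that continues on another line).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file has a quoted field spanning lines.
# With the double quote as field separator, NF - 1 is the number of quotes on
# the line; escaped quotes ("") come in pairs and do not change the parity.
has_multiline_fields() {
  awk -F'"' '(NF - 1) % 2 == 1 { found = 1; exit } END { exit !found }' "$1"
}

# Hypothetical sample data, written to a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' \
  'id,name,description' \
  '1,"John","This is a multiline' \
  'description"' > "$workdir/node_data.csv"
printf '%s\n' \
  'id,name' \
  '2,"Jane"' > "$workdir/other_data.csv"

# Build the comma-separated list of files that need multiline handling.
multiline_files=""
for f in "$workdir"/*.csv; do
  if has_multiline_fields "$f"; then
    multiline_files="${multiline_files:+$multiline_files,}$f"
  fi
done

# Print the options to append to the import command.
echo "--multiline-fields=$multiline_files --multiline-fields-format=v2"

rm -rf "$workdir"
```

Only `node_data.csv` ends up in the list here, so files without multiline fields are left out of `--multiline-fields`.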

[[import-tool-header-format-skip-columns]]
== Skipping columns
