Closed
104 changes: 82 additions & 22 deletions modules/ROOT/pages/tools/neo4j-admin/neo4j-admin-import.adoc
@@ -4,7 +4,10 @@

:rfc-4180: https://tools.ietf.org/html/rfc4180

`neo4j-admin database import` writes CSV data into Neo4j's native file format as fast as possible. You should use this tool when:
`neo4j-admin database import` writes CSV data into Neo4j's native file format as fast as possible. +
Starting with Neo4j 5.25, the tool also supports the Parquet file format as a public beta feature.

You should use this tool when:

* Import performance is important because you have a large amount of data (millions/billions of entities).
* The database can be taken offline and you have direct access to one of the servers hosting your Neo4j DBMS.
@@ -78,20 +81,21 @@ See <<indexes-constraints-import, Provide indexes and constraints during import>

The syntax for importing a set of CSV files is:

[source, syntax, role="nocopy"]
----
neo4j-admin database import full [-h] [--expand-commands] [--verbose] [--auto-skip-subsequent-headers[=true|false]]
[--ignore-empty-strings[=true|false]] [--ignore-extra-columns[=true|false]]
[--legacy-style-quoting[=true|false]] [--multiline-fields[=true|false]]
[--normalize-types[=true|false]] [--overwrite-destination[=true|false]]
[--skip-bad-entries-logging[=true|false]] [--skip-bad-relationships[=true|false]]
[--skip-duplicate-nodes[=true|false]] [--strict[=true|false]] [--trim-strings
[=true|false]] [--additional-config=<file>] [--array-delimiter=<char>]
[--bad-tolerance=<num>] [--delimiter=<char>] [--format=<format>]
[--high-parallel-io=on|off|auto] [--id-type=string|integer|actual]
[--input-encoding=<character-set>] [--max-off-heap-memory=<size>] [--quote=<char>]
[--skip-duplicate-nodes[=true|false]] [--strict[=true|false]] [--trim-strings[=true|false]]
[--additional-config=<file>] [--array-delimiter=<char>] [--bad-tolerance=<num>]
[--delimiter=<char>] [--format=<format>] [--high-parallel-io=on|off|auto]
[--id-type=string|integer|actual] [--input-encoding=<character-set>]
[--input-type=csv|parquet] [--max-off-heap-memory=<size>] [--quote=<char>]
[--read-buffer-size=<size>] [--report-file=<path>] [--schema=<path>] [--threads=<num>]
--nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]
<files>...]... [--relationships=[<type>=]<files>...]... <database>
--nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]<files>...]...
[--relationships=[<type>=]<files>...]... <database>
----

=== Description
@@ -123,6 +127,29 @@ For more information, please contact Neo4j Professional Services.

=== Options

[role=label--beta]
.Parquet file support
[NOTE]
====
Starting with Neo4j 5.25, the import tool supports the Parquet file format as a public beta feature.
Use the `--input-type=csv|parquet` option to tell the importer explicitly whether the input files are CSV or Parquet.
If the option is omitted, the importer defaults to `csv`.

Most of the options that configure the import also apply to Parquet input.
The following options are not supported with Parquet (see the <<full-import-options-table, `neo4j-admin database import full` options>> table for details):

- `--auto-skip-subsequent-headers`
- `--delimiter`
- `--ignore-extra-columns`
- `--input-encoding`
- `--multiline-fields`
- `--quote`
- `--trim-strings`

The xref:tools/neo4j-admin/neo4j-admin-import.adoc#import-tool-examples[examples] for CSV can also be used with Parquet.
====
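
As a minimal sketch of the note above, the following dry run assembles a full import command for Parquet input and prints it instead of executing it.
The database name `neo4j` and the `.parquet` file names are hypothetical counterparts of the CSV files used elsewhere on this page; only the `--input-type`, `--nodes`, and `--relationships` options are taken from the syntax above.

```shell
# Dry run: build the command as a string and print it for review.
# Assumption: the .parquet files below are hypothetical placeholders.
import_cmd="bin/neo4j-admin database import full neo4j \
--input-type=parquet \
--nodes=import/movies.parquet \
--nodes=import/actors.parquet \
--relationships=import/roles.parquet"

# Nothing is imported here; the command is only echoed.
echo "$import_cmd"
```

Once the paths match your own files, run the printed command in place of the `echo`.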

[[full-import-options-table]]
.`neo4j-admin database import full` options
[options="header", cols="5m,10a,2m"]
|===
@@ -214,6 +241,11 @@ Possible values are:
|Character set that input data is encoded in.
|UTF-8

|--input-type=csv\|parquet
label:beta[]
|File type to import from. Possible values are `csv` and `parquet`; the default is `csv`.
|csv

|--legacy-style-quoting[=true\|false]
|Whether or not a backslash-escaped quote e.g. \" is interpreted as an inner quote.
|false
@@ -459,7 +491,7 @@ For example:

[source, shell, role=noplay]
----
bin/neo4j-admin database import full neo4j --nodes=import/movies.csv --nodes=import/actors.csv --relationships=import/roles.csv --schema=import/schema.cypher
bin/neo4j-admin database import full neo4j --nodes=import/movies.csv --nodes=import/actors.csv --relationships=import/roles.csv --schema=import/schema.cypher
----


@@ -575,21 +607,21 @@ It is highly recommended to back up your database before running the incremental
[[import-tool-incremental-syntax]]
=== Syntax

[source, shell, role=noplay]
The syntax for importing a set of CSV files incrementally is:

[source, syntax, role="nocopy"]
----
neo4j-admin database import incremental [-h] [--expand-commands] --force [--verbose] [--auto-skip-subsequent-headers
[=true|false]] [--ignore-empty-strings[=true|false]] [--ignore-extra-columns
[=true|false]] [--legacy-style-quoting[=true|false]] [--multiline-fields
[=true|false]] [--normalize-types[=true|false]] [--skip-bad-entries-logging
[=true|false]] [--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes
[=true|false]] [--strict[=true|false]] [--trim-strings[=true|false]]
[--additional-config=<file>] [--array-delimiter=<char>] [--bad-tolerance=<num>]
[--delimiter=<char>] [--high-parallel-io=on|off|auto]
[--id-type=string|integer|actual] [--input-encoding=<character-set>]
[--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
neo4j-admin database import incremental [-h] [--expand-commands] --force [--verbose] [--auto-skip-subsequent-headers[=true|false]]
[--ignore-empty-strings[=true|false]] [--ignore-extra-columns[=true|false]]
[--legacy-style-quoting[=true|false]] [--multiline-fields[=true|false]] [--normalize-types[=true|false]]
[--skip-bad-entries-logging[=true|false]] [--skip-bad-relationships[=true|false]]
[--skip-duplicate-nodes[=true|false]] [--strict[=true|false]] [--trim-strings[=true|false]]
[--additional-config=<file>] [--array-delimiter=<char>] [--bad-tolerance=<num>] [--delimiter=<char>]
[--high-parallel-io=on|off|auto] [--id-type=string|integer|actual] [--input-encoding=<character-set>]
[--input-type=csv|parquet] [--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
[--report-file=<path>] [--stage=all|prepare|build|merge] [--threads=<num>]
--nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]
<files>...]... [--relationships=[<type>=]<files>...]... <database>
--nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>[:<label>]...=]<files>...]...
[--relationships=[<type>=]<files>...]... <database>
----
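
The `--stage` option in the syntax above accepts `all`, `prepare`, `build`, or `merge`.
As a rough sketch, assuming the three individual stages are run in the order `prepare`, `build`, `merge`, the following dry run prints one incremental import command per stage.
The helper name `print_staged_import`, the database name, and the CSV file names are hypothetical.

```shell
# Dry run: print one incremental import command per stage.
# Assumption: stages are run in the order prepare, build, merge.
# The file names and the database name are hypothetical placeholders.
print_staged_import() {
  for stage in prepare build merge; do
    echo "bin/neo4j-admin database import incremental --force --stage=${stage} --nodes=import/new-actors.csv --relationships=import/new-roles.csv neo4j"
  done
}

# Nothing is imported here; the commands are only printed.
print_staged_import
```

To run everything in a single step instead, the same command can be issued once with `--stage=all`.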

=== Description
@@ -640,6 +672,29 @@ If the database into which you import does not exist prior to importing, you mus

=== Options

[role=label--beta]
.Parquet file support
[NOTE]
====
Starting with Neo4j 5.25, the import tool supports the Parquet file format as a public beta feature.
Use the `--input-type=csv|parquet` option to tell the importer explicitly whether the input files are CSV or Parquet.
If the option is omitted, the importer defaults to `csv`.

Most of the options that configure the import also apply to Parquet input.
The following options are not supported with Parquet (see the <<incremental-import-options-table, `neo4j-admin database import incremental` options>> table for details):

- `--auto-skip-subsequent-headers`
- `--delimiter`
- `--ignore-extra-columns`
- `--input-encoding`
- `--multiline-fields`
- `--quote`
- `--trim-strings`

The xref:tools/neo4j-admin/neo4j-admin-import.adoc#import-tool-examples[examples] for CSV can also be used with Parquet.
====
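
The list of unsupported options in the note above can be turned into a simple pre-flight check.
The function below is a hypothetical helper, not part of `neo4j-admin`: it only inspects the arguments it is given and reports any option from that list when `--input-type=parquet` is also present.
The file name in the usage line is likewise a placeholder.

```shell
# Hypothetical pre-flight check (not part of neo4j-admin).
# Returns non-zero if --input-type=parquet is combined with an option
# that the note above lists as unsupported for Parquet.
check_parquet_options() {
  parquet=0
  for arg in "$@"; do
    if [ "$arg" = "--input-type=parquet" ]; then
      parquet=1
    fi
  done
  # CSV input (the default) accepts all of these options.
  if [ "$parquet" -eq 0 ]; then
    return 0
  fi
  status=0
  for arg in "$@"; do
    case "$arg" in
      --auto-skip-subsequent-headers*|--delimiter*|--ignore-extra-columns*|--input-encoding*|--multiline-fields*|--quote*|--trim-strings*)
        # Strip any "=value" suffix so only the option name is reported.
        echo "not supported with Parquet: ${arg%%=*}"
        status=1
        ;;
    esac
  done
  return "$status"
}

# Usage: validate the intended arguments before calling neo4j-admin.
check_parquet_options --force --input-type=parquet --nodes=import/new-actors.parquet '--delimiter=;' \
  || echo "fix the options before importing"
```

The check is deliberately conservative: it matches option names by prefix and does not validate anything else about the command line.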

[[incremental-import-options-table]]
.`neo4j-admin database import incremental` options
[options="header", cols="5m,10a,2m"]
|===
@@ -729,6 +784,11 @@ Possible values are:
|Character set that input data is encoded in.
|UTF-8

|--input-type=csv\|parquet
label:beta[]
|File type to import from. Possible values are `csv` and `parquet`; the default is `csv`.
|csv

|--legacy-style-quoting[=true\|false]
|Whether or not a backslash-escaped quote e.g. \" is interpreted as an inner quote.
|false