Commit 2ee086c

fbiville, meistermeier, and renetapopova authored
Add neo4j-admin-import section and parameter details for Parquet. (#1858)
The Parquet file support for neo4j admin import will come out in one of the next minor versions as a preview feature. Depending on the feedback we get from customers and users, more will definitely follow (including in the docs). This is a deliberately defensive change: it avoids promising too much while still pointing out that the feature exists ;)

Because the feature itself is not merged yet, I added the DO NOT MERGE label. Please let us get this into a shape where we can merge it as soon as the feature has gone into the product, thanks.

This supersedes #1850

---------

Co-authored-by: Gerrit Meier <[email protected]>
Co-authored-by: Reneta Popova <[email protected]>
1 parent da2c3aa commit 2ee086c

1 file changed

modules/ROOT/pages/tools/neo4j-admin/neo4j-admin-import.adoc

Lines changed: 41 additions & 21 deletions
@@ -4,7 +4,10 @@
 
 :rfc-4180: https://tools.ietf.org/html/rfc4180
 
-`neo4j-admin database import` writes CSV data into Neo4j's native file format as fast as possible. You should use this tool when:
+`neo4j-admin database import` writes CSV data into Neo4j's native file format as fast as possible. +
+Starting with version 5.26, Neo4j also provides support for the Parquet file format.
+
+You should use this tool when:
 
 * Import performance is important because you have a large amount of data (millions/billions of entities).
 * The database can be taken offline and you have direct access to one of the servers hosting your Neo4j DBMS.
@@ -78,6 +81,7 @@ See <<indexes-constraints-import, Provide indexes and constraints during import>
 
 The syntax for importing a set of CSV files is:
 
+[source, syntax, role="nocopy"]
 ----
 neo4j-admin database import full [-h] [--expand-commands] [--verbose] [--auto-skip-subsequent-headers[=true|false]]
 [--ignore-empty-strings[=true|false]] [--ignore-extra-columns[=true|false]]
@@ -124,6 +128,12 @@ For more information, please contact Neo4j Professional Services.
 
 === Options
 
+Starting from Neo4j 5.26, the importer also supports the Parquet file format.
+An additional parameter `--input-type=csv|parquet` has been introduced to explicitly specify whether to use CSV or Parquet for the importer.
+If not defined, the default value will be CSV.
+The xref:tools/neo4j-admin/neo4j-admin-import.adoc#import-tool-examples[examples] for CSV can also be used with Parquet.
+
+[[full-import-options-table]]
 .`neo4j-admin database import full` options
 [options="header", cols="5m,10a,2m"]
 |===
@@ -150,15 +160,15 @@ For horizontal tabulation (HT), use `\t` or the Unicode character ID `\9`.
 Unicode character ID can be used if prepended by `\`.
 |;
 
-| --auto-skip-subsequent-headers[=true\|false]
+| --auto-skip-subsequent-headers[=true\|false]footnote:ingnoredByParquet1[Ignored by Parquet import.]
 |Automatically skip accidental header lines in subsequent files in file groups with more than one file.
 |false
 
 |--bad-tolerance=<num>
 |Number of bad entries before the import is aborted. The import process is optimized for error-free data. Therefore, cleaning the data before importing it is highly recommended. If you encounter any bad entries during the import process, you can set the number of bad entries to a specific value that suits your needs. However, setting a high value may affect the performance of the tool.
 |1000
 
-|--delimiter=<char>
+|--delimiter=<char>footnote:ingnoredByParquet1[]
 |Delimiter character between values in CSV data. Also accepts `TAB` and e.g. `U+20AC` for specifying a character using Unicode.
 
 ====
@@ -207,14 +217,18 @@ Possible values are:
 |Whether or not empty string fields, i.e. "" from input source are ignored, i.e. treated as null.
 |false
 
-|--ignore-extra-columns[=true\|false]
+|--ignore-extra-columns[=true\|false]footnote:ingnoredByParquet1[]
 |If unspecified columns should be ignored during the import.
 |false
 
-|--input-encoding=<character-set>
+|--input-encoding=<character-set>footnote:ingnoredByParquet1[]
 |Character set that input data is encoded in.
 |UTF-8
 
+|--input-type=csv\|parquet
+|label:new[Introduced in 5.26] File type to import from. Can be csv or parquet. Defaults to csv.
+|csv
+
 |--legacy-style-quoting[=true\|false]
 |Whether or not a backslash-escaped quote e.g. \" is interpreted as an inner quote.
 |false
@@ -226,11 +240,11 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
 It can also be specified as a percentage of the available memory, for example `70%`.
 |90%
 
-|--multiline-fields=true\|false\|<path>[,<path>]
+|--multiline-fields=true\|false\|<path>[,<path>]footnote:ingnoredByParquet1[]
 |label:changed[Changed in 5.26] In v1, whether or not fields from an input source can span multiple lines, i.e. contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer. Therefore, use it with care, especially with large imports. In v2, this option will specify the list of files that contain multiline fields. Files can also be specified using regular expressions.
 |null
 
-|--multiline-fields-format=v1\|v2
+|--multiline-fields-format=v1\|v2footnote:ingnoredByParquet1[]
 |label:new[Introduced in 5.26] Controls the parsing of input source that can span multiple lines, i.e. contain newline characters. When set to v1, the value for `--multiline-fields` can only be true or false. When set to v2, the value for `--multiline-fields` should be the list of files that contain multiline fields.
 |null
 
@@ -255,7 +269,7 @@ For an example, see <<import-tool-multiple-input-files-regex-example>>.
 |Delete any existing database files prior to the import.
 |false
 
-|--quote=<char>
+|--quote=<char>footnote:ingnoredByParquet1[]
 |Character to treat as quotation character for values in CSV data.
 
 Quotes can be escaped as per link:{rfc-4180}[RFC 4180] by doubling them, for example `""` would be interpreted as a literal `"`.
@@ -330,7 +344,7 @@ If enabled all those relationships will be found but at the cost of lower perfor
 performance, this value should not be greater than the number of available processors.
 |20
 
-|--trim-strings[=true\|false]
+|--trim-strings[=true\|false]footnote:ingnoredByParquet1[]
 |Whether or not strings should be trimmed for whitespaces.
 |false
 
@@ -339,7 +353,6 @@ performance, this value should not be greater than the number of available proce
 |
 |===
 
-
 [NOTE]
 .Heap size for the import
 ====
@@ -435,7 +448,7 @@ bin/neo4j-admin database import full --nodes import/movies_header.csv,import/mov
 [[indexes-constraints-import]]
 ==== Provide indexes and constraints during import
 
-Starting with Neo4j 5.24, you can use the `--schema` option that allows Cypher commands to be provided to create indexes/constraints during the initial import process.
+Starting from Neo4j 5.24, you can use the `--schema` option that allows Cypher commands to be provided to create indexes/constraints during the initial import process.
 It currently only works for the block format and full import.
 
 You should have a Cypher script containing only `CREATE INDEX|CONSTRAINT` commands to be parsed and executed.
@@ -578,7 +591,9 @@ It is highly recommended to back up your database before running the incremental
 [[import-tool-incremental-syntax]]
 === Syntax
 
-[source, shell, role=noplay]
+The syntax for importing a set of CSV files incrementally is:
+
+[source, syntax, role="nocopy"]
 ----
 neo4j-admin database import incremental [-h] [--expand-commands] --force [--verbose] [--auto-skip-subsequent-headers
 [=true|false]] [--ignore-empty-strings[=true|false]] [--ignore-extra-columns
@@ -645,6 +660,7 @@ If the database into which you import does not exist prior to importing, you mus
 
 === Options
 
+[[incremental-import-options-table]]
 .`neo4j-admin database import incremental` options
 [options="header", cols="5m,10a,2m"]
 |===
@@ -671,15 +687,15 @@ For horizontal tabulation (HT), use `\t` or the Unicode character ID `\9`.
 Unicode character ID can be used if prepended by `\`.
 |;
 
-| --auto-skip-subsequent-headers[=true\|false]
+| --auto-skip-subsequent-headers[=true\|false]footnote:ingnoredByParquet2[Ignored by Parquet import.]
 |Automatically skip accidental header lines in subsequent files in file groups with more than one file.
 |false
 
 |--bad-tolerance=<num>
 |Number of bad entries before the import is aborted. The import process is optimized for error-free data. Therefore, cleaning the data before importing it is highly recommended. If you encounter any bad entries during the import process, you can set the number of bad entries to a specific value that suits your needs. However, setting a high value may affect the performance of the tool.
 |1000
 
-|--delimiter=<char>
+|--delimiter=<char>footnote:ingnoredByParquet2[]
 |Delimiter character between values in CSV data. Also accepts `TAB` and e.g. `U+20AC` for specifying a character using Unicode.
 
 ====
@@ -726,14 +742,18 @@ Possible values are:
 |Whether or not empty string fields, i.e. "" from input source are ignored, i.e. treated as null.
 |false
 
-|--ignore-extra-columns[=true\|false]
+|--ignore-extra-columns[=true\|false]footnote:ingnoredByParquet2[]
 |If unspecified columns should be ignored during the import.
 |false
 
-|--input-encoding=<character-set>
+|--input-encoding=<character-set>footnote:ingnoredByParquet2[]
 |Character set that input data is encoded in.
 |UTF-8
 
+|--input-type=csv\|parquet
+|label:new[Introduced in 5.26]File type to import from. Can be csv or parquet. Defaults to csv.
+|csv
+
 |--legacy-style-quoting[=true\|false]
 |Whether or not a backslash-escaped quote e.g. \" is interpreted as an inner quote.
 |false
@@ -745,11 +765,11 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
 It can also be specified as a percentage of the available memory, for example `70%`.
 |90%
 
-|--multiline-fields=true\|false\|<path>[,<path>]
+|--multiline-fields=true\|false\|<path>[,<path>]footnote:ingnoredByParquet2[]
 |label:changed[Changed in 5.26] In v1, whether or not fields from an input source can span multiple lines, i.e. contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer. Therefore, use it with care, especially with large imports. In v2, this option will specify the list of files that contain multiline fields. Files can also be specified using regular expressions.
 |null
 
-|--multiline-fields-format=v1\|v2
+|--multiline-fields-format=v1\|v2footnote:ingnoredByParquet2[]
 |label:new[Introduced in 5.26] Controls the parsing of input source that can span multiple lines, i.e. contain newline characters. When set to v1, the value for `--multiline-fields` can only be true or false. When set to v2, the value for `--multiline-fields` should be the list of files that contain multiline fields.
 |null
 
@@ -770,7 +790,7 @@ For an example, see <<import-tool-multiple-input-files-regex-example>>.
 |When `true`, non-array property values are converted to their equivalent Cypher types. For example, all integer values will be converted to 64-bit long integers.
 | true
 
-|--quote=<char>
+|--quote=<char>footnote:ingnoredByParquet2[]
 |Character to treat as quotation character for values in CSV data.
 
 Quotes can be escaped as per link:{rfc-4180}[RFC 4180] by doubling them, for example `""` would be interpreted as a literal `"`.
@@ -812,7 +832,7 @@ If you need to debug the import, it might be useful to collect the stack trace.
 This is done by using the `--verbose` option.
 |import.report
 
-|--schema=<path> footnote:[The `--schema` option is available in this version but not yet supported. It will be functional in a future release.]
+|--schema=<path>footnote:[The `--schema` option is available in this version but not yet supported. It will be functional in a future release.]
 |label:new[Introduced in 5.24] Path to the file containing the Cypher commands for creating indexes and constraints during data import.
 |
 
@@ -854,7 +874,7 @@ If enabled all those relationships will be found but at the cost of lower perfor
 performance, this value should not be greater than the number of available processors.
 |20
 
-|--trim-strings[=true\|false]
+|--trim-strings[=true\|false]footnote:ingnoredByParquet2[]
 |Whether or not strings should be trimmed for whitespaces.
 |false
 
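
As a rough illustration of the `--input-type` option documented in this diff, a Parquet import might be invoked as sketched below. The file names `import/movies.parquet`, `import/actors.parquet`, and `import/roles.parquet` are hypothetical placeholders. The `run` helper and the `DRY_RUN` variable are additions for this sketch (they are not part of `neo4j-admin`) so that the command can be previewed without a Neo4j installation. Parquet support is described as a preview feature, so the actual behavior may differ.

```shell
#!/bin/sh
# Sketch only: the input file names below are hypothetical placeholders.
INPUT_TYPE="${INPUT_TYPE:-parquet}"   # csv (the documented default) or parquet

# Helper for this sketch: with DRY_RUN=1 (the default here) the command is
# printed instead of executed; with DRY_RUN=0 it is run for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

import_cmd_output=$(run bin/neo4j-admin database import full \
  --input-type="$INPUT_TYPE" \
  --nodes="import/movies.$INPUT_TYPE" \
  --nodes="import/actors.$INPUT_TYPE" \
  --relationships="import/roles.$INPUT_TYPE")
echo "$import_cmd_output"
```

CSV-specific options such as `--delimiter`, `--quote`, or `--trim-strings` are left out because the option tables mark them as ignored by the Parquet import. Setting `DRY_RUN=0` executes the command instead of printing it.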
