Commit 9791939

meistermeier authored and renetapopova committed
Add neo4j-admin-import section and parameter details for Parquet.
1 parent 9ad1096 commit 9791939

1 file changed: +34 -14 lines changed

modules/ROOT/pages/tools/neo4j-admin/neo4j-admin-import.adoc

Lines changed: 34 additions & 14 deletions
@@ -25,6 +25,17 @@ Change Data Capture does **not** capture any data changes resulting from the use
 See link:{neo4j-docs-base-uri}/cdc/current/get-started/self-managed/#non-tx-log-changes/[Change Data Capture -> Key considerations] for more information.
 ====

+[role=label--beta]
+== Parquet file support
+Starting with Neo4j 5.24, Neo4j provides support for the Parquet file format in a public beta version.
+The additional parameter `--input-type [csv|parquet]` was introduced to explicitly tell the importer to use either CSV or Parquet.
+Its value defaults to CSV if it is not defined.
+
+Most of the parameters that can be used to configure the import are also valid for the Parquet format.
+There are indicators in the parameter overview to point out which parameters are supported.
+
+The xref:tools/neo4j-admin/neo4j-admin-import.adoc#import-tool-examples[examples] for CSV can also be used with Parquet.
+
 == Overview

 The `neo4j-admin database import` command has two modes both used for initial data import:
@@ -149,15 +160,15 @@ For horizontal tabulation (HT), use `\t` or the Unicode character ID `\9`.
 Unicode character ID can be used if prepended by `\`.
 |;

-| --auto-skip-subsequent-headers[=true\|false]
+| --auto-skip-subsequent-headers[=true\|false]^1^
 |Automatically skip accidental header lines in subsequent files in file groups with more than one file.
 |false

 |--bad-tolerance=<num>
 |Number of bad entries before the import is aborted. The import process is optimized for error-free data. Therefore, cleaning the data before importing it is highly recommended. If you encounter any bad entries during the import process, you can set the number of bad entries to a specific value that suits your needs. However, setting a high value may affect the performance of the tool.
 |1000

-|--delimiter=<char>
+|--delimiter=<char>^1^
 |Delimiter character between values in CSV data. Also accepts `TAB` and e.g. `U+20AC` for specifying a character using Unicode.

 ====
@@ -206,14 +217,19 @@ Possible values are:
 |Whether or not empty string fields, i.e. "" from input source are ignored, i.e. treated as null.
 |false

-|--ignore-extra-columns[=true\|false]
+|--ignore-extra-columns[=true\|false]^1^
 |If unspecified columns should be ignored during the import.
 |false

-|--input-encoding=<character-set>
+|--input-encoding=<character-set>^1^
 |Character set that input data is encoded in.
 |UTF-8

+|--input-type[=csv\|parquet]
+label:beta[]
+|File type to import.
+|csv
+
 |--legacy-style-quoting[=true\|false]
 |Whether or not a backslash-escaped quote e.g. \" is interpreted as an inner quote.
 |false
@@ -225,7 +241,7 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
 It can also be specified as a percentage of the available memory, for example `70%`.
 |90%

-|--multiline-fields[=true\|false]
+|--multiline-fields[=true\|false]^1^
 |Whether or not fields from an input source can span multiple lines, i.e. contain newline characters.

 Setting `--multiline-fields=true` can severely degrade the performance of the importer.
@@ -253,7 +269,7 @@ For an example, see <<import-tool-multiple-input-files-regex-example>>.
 |Delete any existing database files prior to the import.
 |false

-|--quote=<char>
+|--quote=<char>^1^
 |Character to treat as quotation character for values in CSV data.

 Quotes can be escaped as per link:{rfc-4180}[RFC 4180] by doubling them, for example `""` would be interpreted as a literal `"`.
@@ -328,7 +344,7 @@ If enabled all those relationships will be found but at the cost of lower perfor
 performance, this value should not be greater than the number of available processors.
 |20

-|--trim-strings[=true\|false]
+|--trim-strings[=true\|false]^1^
 |Whether or not strings should be trimmed for whitespaces.
 |false

@@ -337,6 +353,8 @@ performance, this value should not be greater than the number of available proce
 |
 |===

+^1^ __Ignored by Parquet import label:beta[].__ +
+
 [NOTE]
 .Heap size for the import
 ====
@@ -666,15 +684,15 @@ For horizontal tabulation (HT), use `\t` or the Unicode character ID `\9`.
 Unicode character ID can be used if prepended by `\`.
 |;

-| --auto-skip-subsequent-headers[=true\|false]
+| --auto-skip-subsequent-headers[=true\|false]^1^
 |Automatically skip accidental header lines in subsequent files in file groups with more than one file.
 |false

 |--bad-tolerance=<num>
 |Number of bad entries before the import is aborted. The import process is optimized for error-free data. Therefore, cleaning the data before importing it is highly recommended. If you encounter any bad entries during the import process, you can set the number of bad entries to a specific value that suits your needs. However, setting a high value may affect the performance of the tool.
 |1000

-|--delimiter=<char>
+|--delimiter=<char>^1^
 |Delimiter character between values in CSV data. Also accepts `TAB` and e.g. `U+20AC` for specifying a character using Unicode.

 ====
@@ -721,11 +739,11 @@ Possible values are:
 |Whether or not empty string fields, i.e. "" from input source are ignored, i.e. treated as null.
 |false

-|--ignore-extra-columns[=true\|false]
+|--ignore-extra-columns[=true\|false]^1^
 |If unspecified columns should be ignored during the import.
 |false

-|--input-encoding=<character-set>
+|--input-encoding=<character-set>^1^
 |Character set that input data is encoded in.
 |UTF-8

@@ -740,7 +758,7 @@ Values can be plain numbers, such as `10000000`, or `20G` for 20 gigabytes.
 It can also be specified as a percentage of the available memory, for example `70%`.
 |90%

-|--multiline-fields[=true\|false]
+|--multiline-fields[=true\|false]^1^
 |Whether or not fields from an input source can span multiple lines, i.e. contain newline characters.

 Setting `--multiline-fields=true` can severely degrade the performance of the importer.
@@ -764,7 +782,7 @@ For an example, see <<import-tool-multiple-input-files-regex-example>>.
 |When `true`, non-array property values are converted to their equivalent Cypher types. For example, all integer values will be converted to 64-bit long integers.
 | true

-|--quote=<char>
+|--quote=<char>^1^
 |Character to treat as quotation character for values in CSV data.

 Quotes can be escaped as per link:{rfc-4180}[RFC 4180] by doubling them, for example `""` would be interpreted as a literal `"`.
@@ -844,7 +862,7 @@ If enabled all those relationships will be found but at the cost of lower perfor
 performance, this value should not be greater than the number of available processors.
 |20

-|--trim-strings[=true\|false]
+|--trim-strings[=true\|false]^1^
 |Whether or not strings should be trimmed for whitespaces.
 |false

@@ -853,6 +871,8 @@ performance, this value should not be greater than the number of available proce
 |
 |===

+^1^ __Ignored by Parquet import label:beta[].__ +
+
 [NOTE]
 .Using both a multi-value option and a positional parameter
 ====
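
As a rough illustration of what this change documents, the sketch below assembles a `neo4j-admin database import full` command line with `--input-type=parquet` and lists any options that the diff marks with `^1^` (ignored by the Parquet import). It is a dry run that only prints the command. The file names `persons.parquet` and `knows.parquet`, the database name `neo4j`, and the helper `ignored_for_parquet` are hypothetical and not part of the commit.

```shell
#!/usr/bin/env bash
# Options the diff marks with ^1^, i.e. ignored by the Parquet import (beta).
csv_only_options=(
  --auto-skip-subsequent-headers
  --delimiter
  --ignore-extra-columns
  --input-encoding
  --multiline-fields
  --quote
  --trim-strings
)

# Hypothetical helper: print every CSV-only option found in the
# given argument list, one per line.
ignored_for_parquet() {
  local arg name opt
  for arg in "$@"; do
    name="${arg%%=*}"   # strip "=value" if present
    for opt in "${csv_only_options[@]}"; do
      if [[ "$name" == "$opt" ]]; then
        printf '%s\n' "$name"
      fi
    done
  done
}

# Assumed example invocation; file and database names are placeholders.
import_args=(
  database import full
  --input-type=parquet
  --trim-strings=true
  --nodes=persons.parquet
  --relationships=knows.parquet
  neo4j
)

# Dry run: show the command instead of executing it.
echo "neo4j-admin ${import_args[*]}"
ignored_for_parquet "${import_args[@]}" | sed 's/^/ignored by Parquet import: /'
```

With the placeholder arguments above, the check flags `--trim-strings`, since the footnote in the diff states that this option is ignored for Parquet input.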
