Add neo4j-admin-import section and parameter details for Parquet. (#1858)
Parquet file support for neo4j-admin import will be released in one of
the next minor versions as a preview feature.
Depending on the feedback we get from customers and users, more will
definitely follow (also in the docs).
This is a deliberately defensive change: it avoids promising too much while
still pointing out that the feature exists at all ;)
Because the feature itself is not merged yet, I added the DO NOT MERGE
label.
Please let us get this into a shape where we can merge it as soon as the
feature has landed in the product, thanks.
This supersedes #1850
---------
Co-authored-by: Gerrit Meier <[email protected]>
Co-authored-by: Reneta Popova <[email protected]>
@@ -124,6 +128,12 @@ For more information, please contact Neo4j Professional Services.

 === Options

+Starting from Neo4j 5.26, the importer also supports the Parquet file format.
+An additional parameter `--input-type=csv|parquet` has been introduced to explicitly specify whether to use CSV or Parquet for the importer.
+If not defined, the default value will be CSV.
+The xref:tools/neo4j-admin/neo4j-admin-import.adoc#import-tool-examples[examples] for CSV can also be used with Parquet.
+
+[[full-import-options-table]]
 .`neo4j-admin database import full` options
 [options="header", cols="5m,10a,2m"]
 |===
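
Reviewer note: the added paragraph says the CSV examples carry over to Parquet once `--input-type` is set. A minimal Python sketch of assembling such a command line follows. The helper name `build_import_command` and the `.parquet` file paths are hypothetical; only `--input-type`, `--nodes`, and `--relationships` are taken from the documentation, and the sketch does not run the importer itself.

```python
import shlex

# Values named in the added docs line: --input-type=csv|parquet
ALLOWED_INPUT_TYPES = {"csv", "parquet"}


def build_import_command(input_type="csv", nodes=(), relationships=(), database="neo4j"):
    """Assemble an argument list for `neo4j-admin database import full` (hypothetical helper)."""
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"--input-type must be one of {sorted(ALLOWED_INPUT_TYPES)}")
    cmd = ["bin/neo4j-admin", "database", "import", "full", f"--input-type={input_type}"]
    cmd += [f"--nodes={path}" for path in nodes]
    cmd += [f"--relationships={path}" for path in relationships]
    cmd.append(database)
    return cmd


# Hypothetical Parquet files standing in for the CSV files used in the docs examples.
cmd = build_import_command(
    input_type="parquet",
    nodes=["import/movies.parquet", "import/actors.parquet"],
    relationships=["import/roles.parquet"],
)
print(shlex.join(cmd))
```

Leaving `input_type` at its default mirrors the documented behaviour that CSV is used when the option is not given.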
@@ -150,15 +160,15 @@ For horizontal tabulation (HT), use `\t` or the Unicode character ID `\9`.
 Unicode character ID can be used if prepended by `\`.
 |;

-| --auto-skip-subsequent-headers[=true\|false]
+| --auto-skip-subsequent-headers[=true\|false]footnote:ingnoredByParquet1[Ignored by Parquet import.]
 |Automatically skip accidental header lines in subsequent files in file groups with more than one file.
 |false

 |--bad-tolerance=<num>
 |Number of bad entries before the import is aborted. The import process is optimized for error-free data. Therefore, cleaning the data before importing it is highly recommended. If you encounter any bad entries during the import process, you can set the number of bad entries to a specific value that suits your needs. However, setting a high value may affect the performance of the tool.
 |1000

-|--delimiter=<char>
+|--delimiter=<char>footnote:ingnoredByParquet1[]
 |Delimiter character between values in CSV data. Also accepts `TAB` and e.g. `U+20AC` for specifying a character using Unicode.

 ====
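
Reviewer note: the `--delimiter` row mentions the `TAB` keyword and the `U+20AC` notation. The Python sketch below shows one plausible way such a value maps to a single character. The function `resolve_delimiter` is hypothetical and is not the importer's actual parser; it only illustrates the notations named in the table and deliberately leaves out the `\<id>` form.

```python
def resolve_delimiter(spec: str) -> str:
    """Map a delimiter specification to one character (illustrative assumption only)."""
    if spec == "TAB" or spec == r"\t":
        return "\t"
    if spec.upper().startswith("U+"):
        # U+XXXX names a Unicode code point in hexadecimal.
        return chr(int(spec[2:], 16))
    if len(spec) == 1:
        return spec
    raise ValueError(f"unsupported delimiter specification: {spec!r}")


for spec in ("TAB", "U+20AC", ";"):
    print(repr(spec), "->", repr(resolve_delimiter(spec)))
```

None of this applies to Parquet input, which is why the diff marks the option as ignored there.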
@@ -207,14 +217,18 @@ Possible values are:
 |Whether or not empty string fields, i.e. "" from input source are ignored, i.e. treated as null.
|label:changed[Changed in 5.26] In v1, whether or not fields from an input source can span multiple lines, i.e. contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer. Therefore, use it with care, especially with large imports. In v2, this option will specify the list of files that contain multiline fields. Files can also be specified using regular expressions.
|label:new[Introduced in 5.26] Controls the parsing of input source that can span multiple lines, i.e. contain newline characters. When set to v1, the value for `--multiline-fields` can only be true or false. When set to v2, the value for `--multiline-fields` should be the list of files that contain multiline fields.
 |null

@@ -255,7 +269,7 @@ For an example, see <<import-tool-multiple-input-files-regex-example>>.
 |Delete any existing database files prior to the import.
 |false

-|--quote=<char>
+|--quote=<char>footnote:ingnoredByParquet1[]
 |Character to treat as quotation character for values in CSV data.

 Quotes can be escaped as per link:{rfc-4180}[RFC 4180] by doubling them, for example `""` would be interpreted as a literal `"`.
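
Reviewer note: the RFC 4180 quote-doubling rule in the `--quote` description can be illustrated with Python's standard `csv` module. This is not the Neo4j importer, only a reader that follows the same escaping convention, and the sample data is made up.

```python
import csv
import io

# One header row and one data row; the quoted field contains doubled quotes.
raw = 'id,quote\n1,"She said ""hello"" and left"\n'

# doublequote=True makes "" inside a quoted field decode to a single literal ".
reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"', doublequote=True)
rows = list(reader)

print(rows[1][1])  # She said "hello" and left
```

Parquet files store typed column values rather than quoted text, which is consistent with the diff marking `--quote` as ignored by Parquet import.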
@@ -330,7 +344,7 @@ If enabled all those relationships will be found but at the cost of lower perfor
 performance, this value should not be greater than the number of available processors.
|Whether or not strings should be trimmed for whitespaces.
 |false

@@ -339,7 +353,6 @@ performance, this value should not be greater than the number of available proce
 |
 |===

-
 [NOTE]
 .Heap size for the import
 ====
@@ -435,7 +448,7 @@ bin/neo4j-admin database import full --nodes import/movies_header.csv,import/mov
 [[indexes-constraints-import]]
 ==== Provide indexes and constraints during import

-Starting with Neo4j 5.24, you can use the `--schema` option that allows Cypher commands to be provided to create indexes/constraints during the initial import process.
+Starting from Neo4j 5.24, you can use the `--schema` option that allows Cypher commands to be provided to create indexes/constraints during the initial import process.
 It currently only works for the block format and full import.

 You should have a Cypher script containing only `CREATE INDEX|CONSTRAINT` commands to be parsed and executed.
@@ -578,7 +591,9 @@ It is highly recommended to back up your database before running the incremental
 [[import-tool-incremental-syntax]]
 === Syntax

-[source, shell, role=noplay]
+The syntax for importing a set of CSV files incrementally is:
@@ -671,15 +687,15 @@ For horizontal tabulation (HT), use `\t` or the Unicode character ID `\9`.
 Unicode character ID can be used if prepended by `\`.
 |;

-| --auto-skip-subsequent-headers[=true\|false]
+| --auto-skip-subsequent-headers[=true\|false]footnote:ingnoredByParquet2[Ignored by Parquet import.]
 |Automatically skip accidental header lines in subsequent files in file groups with more than one file.
 |false

 |--bad-tolerance=<num>
 |Number of bad entries before the import is aborted. The import process is optimized for error-free data. Therefore, cleaning the data before importing it is highly recommended. If you encounter any bad entries during the import process, you can set the number of bad entries to a specific value that suits your needs. However, setting a high value may affect the performance of the tool.
 |1000

-|--delimiter=<char>
+|--delimiter=<char>footnote:ingnoredByParquet2[]
 |Delimiter character between values in CSV data. Also accepts `TAB` and e.g. `U+20AC` for specifying a character using Unicode.

 ====
@@ -726,14 +742,18 @@ Possible values are:
 |Whether or not empty string fields, i.e. "" from input source are ignored, i.e. treated as null.
|label:changed[Changed in 5.26] In v1, whether or not fields from an input source can span multiple lines, i.e. contain newline characters. Setting `--multiline-fields=true` can severely degrade the performance of the importer. Therefore, use it with care, especially with large imports. In v2, this option will specify the list of files that contain multiline fields. Files can also be specified using regular expressions.
|label:new[Introduced in 5.26] Controls the parsing of input source that can span multiple lines, i.e. contain newline characters. When set to v1, the value for `--multiline-fields` can only be true or false. When set to v2, the value for `--multiline-fields` should be the list of files that contain multiline fields.
 |null

@@ -770,7 +790,7 @@ For an example, see <<import-tool-multiple-input-files-regex-example>>.
 |When `true`, non-array property values are converted to their equivalent Cypher types. For example, all integer values will be converted to 64-bit long integers.
 | true

-|--quote=<char>
+|--quote=<char>footnote:ingnoredByParquet2[]
 |Character to treat as quotation character for values in CSV data.

 Quotes can be escaped as per link:{rfc-4180}[RFC 4180] by doubling them, for example `""` would be interpreted as a literal `"`.
@@ -812,7 +832,7 @@ If you need to debug the import, it might be useful to collect the stack trace.
 This is done by using the `--verbose` option.
 |import.report

-|--schema=<path>footnote:[The `--schema` option is available in this version but not yet supported. It will be functional in a future release.]
+|--schema=<path>footnote:[The `--schema` option is available in this version but not yet supported. It will be functional in a future release.]
 |label:new[Introduced in 5.24] Path to the file containing the Cypher commands for creating indexes and constraints during data import.
 |

@@ -854,7 +874,7 @@ If enabled all those relationships will be found but at the cost of lower perfor
 performance, this value should not be greater than the number of available processors.