Skip to content

Conversation

@fbiville
Copy link
Contributor

The Parquet file support for neo4j admin import will come out in on of the next minor versions as a preview feature.
Depending on the feedback we get from customers and users, there will be definitely coming more (also to the docs).
This is a quite defensive change to avoid promising too much but also pointing out that this feature exists at all ;)

Because the feature itself is not merged yet, I added the DO NOT MERGE label.
Please let us get this into a shape where we can just merge it after the feature went into the product, thanks.

This supersedes #1850

@fbiville fbiville changed the title Add neo4j-admin-import section and parameter details for Parquet. #1850 Add neo4j-admin-import section and parameter details for Parquet. Oct 11, 2024
@stefano-ottolenghi
Copy link
Contributor

The schema option was not present in both tables, I've pushed that change.

There's a mismatch in description for schema, you'll have to decide which phrasing you want to keep and whether to change docs or product.

As far as I can see, the mismatch in usage is due to the --input-type option not showing up in product help, but I believe that is due to the fact that we switched to testing against builds of neo4j that are updated only weekly due to how the releaseops team changed builds, but this is pending verification.

@renetapopova
Copy link
Collaborator

The schema option was not present in both tables, I've pushed that change.

There's a mismatch in description for schema, you'll have to decide which phrasing you want to keep and whether to change docs or product.

As far as I can see, the mismatch in usage is due to the --input-type option not showing up in product help, but I believe that is due to the fact that we switched to testing against builds of neo4j that are updated only weekly due to how the releaseops team changed builds, but this is pending verification.

The --schema option is not present because it is not available in incremental import. This is a known issue. And the --schema option description is aligned with the code but the test shows that it is not.

@stefano-ottolenghi
Copy link
Contributor

As I see it, --schema is defined in the Base class and does in fact show up in incremental -h too:

stefano@stefano-carbon:~$ docker container exec  -it neo4j5-dev neo4j-admin database import incremental -h


USAGE

neo4j-admin database import incremental [-h] [--expand-commands] --force [--verbose] [--auto-skip-subsequent-headers
                                        [=true|false]] [--ignore-empty-strings[=true|false]] [--ignore-extra-columns
                                        [=true|false]] [--legacy-style-quoting[=true|false]] [--multiline-fields
                                        [=true|false]] [--normalize-types[=true|false]] [--skip-bad-entries-logging
                                        [=true|false]] [--skip-bad-relationships[=true|false]] [--skip-duplicate-nodes
                                        [=true|false]] [--strict[=true|false]] [--trim-strings[=true|false]]
                                        [--additional-config=<file>] [--array-delimiter=<char>] [--bad-tolerance=<num>]
                                        [--delimiter=<char>] [--high-parallel-io=on|off|auto]
                                        [--id-type=string|integer|actual] [--input-encoding=<character-set>]
                                        [--max-off-heap-memory=<size>] [--quote=<char>] [--read-buffer-size=<size>]
                                        [--report-file=<path>] [--schema=<path>] [--stage=all|prepare|build|merge]
                                        [--threads=<num>] --nodes=[<label>[:<label>]...=]<files>... [--nodes=[<label>[:
                                        <label>]...=]<files>...]... [--relationships=[<type>=]<files>...]... <database>

DESCRIPTION

Incremental import into an existing database.

OPTIONS

     ...
      --schema=<path>        File in which to store the Cypher schema commands to run as part of of the data import.
     ...

If it shouldn't be there, is should be taken away from product too.
I have added the option to the table, but not updated the syntax, which also lacks --schema and is failing the test.

I'm now trying to figure out why it reports missing --input-type=csv|parquet from docs.

@stefano-ottolenghi
Copy link
Contributor

The reason why test on --input-type fails is that its value is shown as optional in docs (because of square brackets [=csv|parquet]) while product has it compulsory. It is flagged as required, but also has a defaultValue, which is contradictory.
https://github.com/neo-technology/neo4j/blob/1538dfcffae9c262b0a82decc3ffb4c16efa4db6/public/community/import-tool/src/main/java/org/neo4j/importer/ImportCommand.java#L421
This looks like a bug @fbiville @meistermeier

Copy link
Collaborator

@renetapopova renetapopova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, but we need to fix the test as well - see Stefano's comment above.
Also, @fbiville, does this need to be merged/published in 5.25?

@neo-technology-commit-status-publisher
Copy link
Collaborator

neo-technology-commit-status-publisher commented Nov 13, 2024

Thanks for the documentation updates.

The preview documentation has now been torn down - reopening this PR will republish it.

@renetapopova renetapopova merged commit 2ee086c into neo4j:dev Nov 15, 2024
8 checks passed
@fbiville fbiville deleted the parquet-import branch November 22, 2024 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants