Add a PUT /collections and modify the POST & PATCH /collection endpoints for collections update  #467

@FaheemBEG

To remain consistent with REST nomenclature:

  • POST = creation.
  • PUT = replacement.
  • PATCH = partial update.

This is why PUT and PATCH should remain two distinct endpoints.

POST /collections

This endpoint must allow the creation of a collection, with or without an uploaded parquet file. When a file is provided, it is used to:

  • Add documents to the freshly created collection.

PUT /collections

This endpoint must allow the full update of an existing collection by uploading a parquet file in order to:

  • Completely overwrite the existing collection with the new file (force update).

PATCH /collections

This endpoint must allow updating the documents of an existing collection by uploading a parquet file in order to:

  • Add new documents that are not present in the existing collection, e.g. if a document_name does not exist in the existing collection but does in the uploaded file.
  • Update the collection's documents if necessary.
    A document needs to be updated if either:
    • The same document does NOT have the same number of chunks in the parquet file and in the existing collection => update required for all chunks of the document (deletes and replaces the entire document).
    • The same document has the same number of chunks in the parquet file and in the existing collection, BUT at least one chunk's hash differs between the file and the collection => update required for all chunks of the document (deletes and replaces the entire document).

This endpoint matches documents by name only, not by ID, because IDs are generated only in the API, and several document entities can, for now, have the same name (for example, if they were uploaded independently).
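
The PATCH decision rules above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the function names (`chunk_hash`, `group_by_document`, `plan_patch`), the `(document_name, chunk_text)` row shape, and the choice of SHA-256 are all assumptions. It also assumes chunk hashes are compared as an unordered set per document, which the issue does not specify.

```python
import hashlib
from collections import defaultdict


def chunk_hash(text: str) -> str:
    # Assumption: a chunk is identified by the SHA-256 of its text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def group_by_document(rows):
    """Map each document name to the list of its chunk hashes.

    rows: iterable of (document_name, chunk_text) pairs (hypothetical shape).
    """
    grouped = defaultdict(list)
    for name, text in rows:
        grouped[name].append(chunk_hash(text))
    return grouped


def plan_patch(existing_rows, uploaded_rows):
    """Decide, per document name in the upload, whether to add, replace or skip it."""
    existing = group_by_document(existing_rows)
    uploaded = group_by_document(uploaded_rows)
    plan = {"add": [], "replace": [], "unchanged": []}
    for name, new_hashes in uploaded.items():
        old_hashes = existing.get(name)
        if old_hashes is None:
            # Document name absent from the collection: add it.
            plan["add"].append(name)
        elif len(old_hashes) != len(new_hashes):
            # Different number of chunks: delete and replace the whole document.
            plan["replace"].append(name)
        elif sorted(old_hashes) != sorted(new_hashes):
            # Same number of chunks, but at least one hash differs.
            plan["replace"].append(name)
        else:
            plan["unchanged"].append(name)
    return {key: sorted(names) for key, names in plan.items()}


existing = [("a", "x"), ("a", "y"), ("b", "1"), ("c", "p"), ("c", "q")]
uploaded = [("a", "x"), ("a", "y"), ("b", "1"), ("b", "2"),
            ("c", "p"), ("c", "Q"), ("d", "new")]
print(plan_patch(existing, uploaded))
```

Documents that exist in the collection but not in the uploaded file are left untouched in this sketch, since the rules above only describe additions and updates.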

Note:

If a user wants to update an existing collection by uploading several parquet files, certain conditions must be met:

  • For each parquet file, a separate request to the endpoint must be made.
  • Each parquet file must not exceed a size of XX MB (to be defined).
  • If a document is present in a Parquet file, all of its chunks must be included in that file.
    For example, a document with 10 chunks should not have 7 chunks in one Parquet file and 3 chunks in another.
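
On the client side, one way to respect the last condition is to split documents into files without ever splitting a single document. The sketch below is a hypothetical helper, not part of the API: `batch_documents` and `max_chunks_per_file` are invented names, and a chunk-count budget stands in for the file-size limit, which is still to be defined.

```python
def batch_documents(chunk_counts, max_chunks_per_file):
    """Group document names into batches, one batch per parquet file.

    chunk_counts: dict mapping document name -> number of chunks.
    max_chunks_per_file: hypothetical budget standing in for the size limit.
    All chunks of a document always land in the same batch.
    """
    batches, current, current_size = [], [], 0
    for name, count in chunk_counts.items():
        if count > max_chunks_per_file:
            # A single document cannot be split across files.
            raise ValueError(f"document {name!r} exceeds the per-file budget")
        if current and current_size + count > max_chunks_per_file:
            # Close the current file and start a new one.
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += count
    if current:
        batches.append(current)
    return batches


print(batch_documents({"a": 10, "b": 3, "c": 8, "d": 2}, 12))
# → [['a'], ['b', 'c'], ['d']]
```

Each resulting batch would then be written to its own parquet file and sent in its own request.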
