Add support to perform incremental updates of packages when updating the indexers #1527

@mrodm

Important

This feature request is currently blocked by an internal dependency. We are tracking the necessary work internally and will update this issue once progress is made or when the blocking item is resolved.

The goal of this issue is to add support for incremental updates by introducing a new delta file that contains the latest changes made to the search index.

Currently, storage indexers read two JSON files (the cursor and the full search index it points to, which contains all the packages) every time a package is added, deleted, or updated:

| File Path | Purpose |
| --- | --- |
| `v2/metadata/<timestamp>/search-index-all.json` | Full package list (used for every update in package-registry) |
| `v2/metadata/cursor.json` | Points to latest search index created |

The new implementation will add a new delta file:

| File Path | New file | Purpose |
| --- | --- | --- |
| `v2/metadata/<timestamp>/search-index-all.json` | No | Full package list (for recovery/init) |
| `v2/metadata/<timestamp>/search-index-delta.json` | Yes | Delta changes (added/updated/deleted) |
| `v2/metadata/cursor.json` | No | Points to latest search index created |

The delta will contain packages that have been added, deleted, or updated in that operation, or indicate a full synchronization (full_sync field) to replace all packages in the indexer.

Example of this delta file:

{
  "full_sync": true,
  "added": [
    { "name": "packageA", "version": "1.0.0", ... }
  ],
  "updated": [
    { "name": "packageB", "version": "2.1.0", ... }
  ],
  "deleted": [
    { "name": "packageC", "version": "1.0.0" }
  ]
}
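
As an illustration only, the delta layout above could be decoded in Go roughly as follows. The type and function names (`PackageRef`, `Delta`, `ParseDelta`) are hypothetical, and only `name` and `version` are modeled because the remaining fields are elided (`...`) in the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PackageRef identifies one package version. Real entries carry more fields
// (elided in the issue's example); they are ignored by this sketch.
type PackageRef struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

// Delta mirrors the search-index-delta.json layout shown in the example.
type Delta struct {
	FullSync bool         `json:"full_sync"`
	Added    []PackageRef `json:"added"`
	Updated  []PackageRef `json:"updated"`
	Deleted  []PackageRef `json:"deleted"`
}

// ParseDelta decodes the raw contents of a delta file.
func ParseDelta(raw []byte) (Delta, error) {
	var d Delta
	if err := json.Unmarshal(raw, &d); err != nil {
		return Delta{}, fmt.Errorf("decoding delta: %w", err)
	}
	return d, nil
}

func main() {
	raw := []byte(`{
	  "full_sync": false,
	  "added":   [{"name": "packageA", "version": "1.0.0"}],
	  "updated": [{"name": "packageB", "version": "2.1.0"}],
	  "deleted": [{"name": "packageC", "version": "1.0.0"}]
	}`)
	d, err := ParseDelta(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(d.FullSync, len(d.Added), len(d.Updated), len(d.Deleted))
}
```

Unknown JSON fields are silently skipped by `json.Unmarshal`, so a sketch like this would keep working if entries carry a full manifest.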

The Package Registry must handle two scenarios:

  • The service is (re)started and does not have any package from the remote index.
    • It must load all the packages from the search index (current logic).
  • The service is already running with a set of packages loaded in memory.
    • It must read new delta files and perform only the required operations.

For the first scenario (initial service state as well as current approach):

  1. Read v2/metadata/cursor.json to get the latest timestamp (cursor).
  2. Read the search index pointed to by the cursor (search-index-all.json).
  3. Load all the packages from that index.
  4. Update its own cursor to the last processed search index.
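
The four steps above can be sketched in Go as follows. This is a minimal sketch under assumptions, not the registry's actual code: the `current` field in cursor.json, the `packages` array in search-index-all.json, and the names `ReadFunc`, `State`, `LoadFull`, and `mapReader` are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReadFunc abstracts the storage backend (hypothetical helper type).
type ReadFunc func(path string) ([]byte, error)

// cursorFile is an ASSUMED layout for cursor.json.
type cursorFile struct {
	Current string `json:"current"`
}

// Package keeps only the fields needed for the sketch.
type Package struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

// fullIndex is an ASSUMED layout for search-index-all.json.
type fullIndex struct {
	Packages []Package `json:"packages"`
}

// State is the in-memory view kept by the service.
type State struct {
	Cursor   string
	Packages map[string]Package // keyed by "name@version"
}

// LoadFull performs the initial load: cursor, full index, packages, own cursor.
func LoadFull(read ReadFunc) (*State, error) {
	raw, err := read("v2/metadata/cursor.json") // step 1
	if err != nil {
		return nil, fmt.Errorf("reading cursor: %w", err)
	}
	var c cursorFile
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, fmt.Errorf("decoding cursor: %w", err)
	}
	raw, err = read("v2/metadata/" + c.Current + "/search-index-all.json") // step 2
	if err != nil {
		return nil, fmt.Errorf("reading full index: %w", err)
	}
	var idx fullIndex
	if err := json.Unmarshal(raw, &idx); err != nil {
		return nil, fmt.Errorf("decoding full index: %w", err)
	}
	pkgs := make(map[string]Package, len(idx.Packages)) // step 3
	for _, p := range idx.Packages {
		pkgs[p.Name+"@"+p.Version] = p
	}
	return &State{Cursor: c.Current, Packages: pkgs}, nil // step 4
}

// mapReader serves files from memory so the sketch runs without a bucket.
func mapReader(files map[string]string) ReadFunc {
	return func(path string) ([]byte, error) {
		content, ok := files[path]
		if !ok {
			return nil, fmt.Errorf("not found: %s", path)
		}
		return []byte(content), nil
	}
}

func main() {
	read := mapReader(map[string]string{
		"v2/metadata/cursor.json":              `{"current": "t1"}`,
		"v2/metadata/t1/search-index-all.json": `{"packages": [{"name": "packageA", "version": "1.0.0"}]}`,
	})
	st, err := LoadFull(read)
	if err != nil {
		panic(err)
	}
	fmt.Println(st.Cursor, len(st.Packages))
}
```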

For the second scenario (incremental update):

  • The service already has a cursor pointing to a search index from the previous scenario.
  • On each interval:
    1. Read v2/metadata/cursor.json to get the latest cursor.
      • If it is the same cursor, skip the iteration and wait for the next interval execution.
    2. List all folders in v2/metadata/ with timestamps newer than the last processed cursor.
    3. For each folder (in order):
      1. Read and apply search-index-delta.json to update local state (packages).
      2. Optionally, use search-index-all.json if a full sync is required.
    4. Update its cursor to the last processed search index.
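
The incremental loop above could look roughly like this in Go. It is a sketch under stated assumptions: `Remote` is a hypothetical in-memory stand-in for files that were already fetched and parsed, folder timestamps are assumed to sort lexicographically in creation order, and a delta with `full_sync` set is assumed to mean "replace everything from search-index-all.json and ignore the delta lists".

```go
package main

import (
	"fmt"
	"sort"
)

type Package struct{ Name, Version string }

type Delta struct {
	FullSync                bool
	Added, Updated, Deleted []Package
}

// Remote is a hypothetical in-memory view of v2/metadata/.
type Remote struct {
	Latest string               // value stored in cursor.json
	Deltas map[string]Delta     // folder timestamp -> search-index-delta.json
	Full   map[string][]Package // folder timestamp -> search-index-all.json
}

type State struct {
	Cursor   string
	Packages map[string]Package // keyed by "name@version"
}

func key(p Package) string { return p.Name + "@" + p.Version }

// Sync runs one interval of the incremental update.
func (s *State) Sync(r Remote) error {
	if r.Latest == s.Cursor {
		return nil // same cursor: skip this iteration
	}
	// Collect folders newer than the last processed cursor, oldest first.
	var pending []string
	for c := range r.Deltas {
		if c > s.Cursor {
			pending = append(pending, c)
		}
	}
	sort.Strings(pending)
	for _, c := range pending {
		d := r.Deltas[c]
		if d.FullSync {
			pkgs, ok := r.Full[c]
			if !ok {
				return fmt.Errorf("full index missing for %s", c)
			}
			s.Packages = make(map[string]Package, len(pkgs))
			for _, p := range pkgs {
				s.Packages[key(p)] = p
			}
		} else {
			for _, p := range d.Added {
				s.Packages[key(p)] = p
			}
			for _, p := range d.Updated {
				s.Packages[key(p)] = p
			}
			for _, p := range d.Deleted {
				delete(s.Packages, key(p))
			}
		}
		s.Cursor = c // advance only after the folder was applied
	}
	return nil
}

func main() {
	r := Remote{Latest: "0002", Deltas: map[string]Delta{
		"0001": {Added: []Package{{"packageA", "1.0.0"}}},
		"0002": {Added: []Package{{"packageB", "2.1.0"}}, Deleted: []Package{{"packageA", "1.0.0"}}},
	}}
	st := &State{Cursor: "0000", Packages: map[string]Package{}}
	if err := st.Sync(r); err != nil {
		panic(err)
	}
	fmt.Println(st.Cursor, len(st.Packages))
}
```

Advancing the cursor after each folder, rather than once at the end, means a failure midway leaves the state consistent with the cursor, so the next interval can resume from the first unprocessed folder.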

Requirements

  • The Package Registry service should still be able to read the full search index and replace all packages:
    • as an option to ensure consistency
    • as a requirement when the service starts from scratch or is restarted.
  • When applying multiple delta files, it is important to apply them in the order they were created (e.g. according to their timestamps/cursors).
  • This new feature should be behind a feature flag, so that the current behavior (updating all packages) remains the default.

Checklist

  • Package registry should be able to update the list of packages with the new packages.
  • Package registry should be able to update the list of packages removing the required ones.
  • Package registry should be able to read and apply all the delta files between the cursor stored and the one retrieved in the same order they were created.
  • Package registry should be able to do a full synchronization of packages.
  • Added feature flag.

Labels: Team:Ecosystem (label for the Packages Ecosystem team)