Description
Important
This feature request is currently blocked by an internal dependency. We are tracking the necessary work internally and will update this issue once progress is made or when the blocking item is resolved.
The goal of this issue is to add support for incremental updates by introducing a new delta file that contains the latest changes made to the search index.
Currently, storage indexers read two JSON files (index and cursor) containing all the packages every time a package is added, deleted, or updated:
| File Path | Purpose |
|---|---|
| `v2/metadata/<timestamp>/search-index-all.json` | Full package list (used for every update in package-registry) |
| `v2/metadata/cursor.json` | Points to latest search index created |
The new implementation will add a new delta file:
| File Path | New file | Purpose |
|---|---|---|
| `v2/metadata/<timestamp>/search-index-all.json` | No | Full package list (for recovery/init) |
| `v2/metadata/<timestamp>/search-index-delta.json` | Yes | Delta changes (added/updated/deleted) |
| `v2/metadata/cursor.json` | No | Points to latest search index created |
The delta will contain packages that have been added, deleted, or updated in that operation, or indicate a full synchronization (full_sync field) to replace all packages in the indexer.
Example of this delta file:
```json
{
  "full_sync": true,
  "added": [
    { "name": "packageA", "version": "1.0.0", ... }
  ],
  "updated": [
    { "name": "packageB", "version": "2.1.0", ... }
  ],
  "deleted": [
    { "name": "packageC", "version": "1.0.0" }
  ]
}
```

The Package Registry must handle two scenarios:
- The service is (re)started and does not have any package from the remote index.
  - It must load all the packages from the search index (current logic).
- The service is already running with a set of packages loaded in memory.
  - It must read new delta files and perform only the required operations.
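
As a rough illustration of what "perform only the required operations" could look like, here is a minimal Python sketch that applies one delta document to an in-memory package map. The function name `apply_delta`, the `(name, version)` key, and the choice to ignore the `added`/`updated`/`deleted` lists when `full_sync` is set are assumptions made for this example, not part of the proposal.

```python
from typing import Dict, Tuple

# Hypothetical in-memory state: packages keyed by (name, version).
PackageKey = Tuple[str, str]


def apply_delta(packages: Dict[PackageKey, dict], delta: dict) -> bool:
    """Apply one delta document to `packages` in place.

    Returns False without touching `packages` when the delta asks for a
    full synchronization, so the caller can reload the full index instead.
    """
    if delta.get("full_sync", False):
        return False

    # Added and updated entries are both written as upserts.
    for pkg in delta.get("added", []) + delta.get("updated", []):
        packages[(pkg["name"], pkg["version"])] = pkg

    # Deleting a package that is not loaded is treated as a no-op.
    for pkg in delta.get("deleted", []):
        packages.pop((pkg["name"], pkg["version"]), None)

    return True
```

Treating additions and updates as the same upsert keeps the operation idempotent, so re-applying a delta after a partial failure would leave the same state.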
For the first scenario (initial service state as well as current approach):
- Read `v2/metadata/cursor.json` to get the latest timestamp (cursor).
- Read the search index pointed to by the cursor (`search-index-all.json`).
- Load all the packages from that index.
- Update its own cursor of last processed search index.
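
The initial load could be sketched as follows. This is only an illustration under several assumptions: a local directory stands in for the storage bucket, `cursor.json` is assumed to hold the timestamp under a `current` key, and `search-index-all.json` is assumed to hold a `packages` list whose entries carry `name` and `version`. The function name `initial_load` is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Tuple


def initial_load(root: Path) -> Tuple[Dict[Tuple[str, str], dict], str]:
    """Load every package from the full index the cursor points to.

    Returns the package map and the cursor that was processed.
    """
    metadata = root / "v2" / "metadata"

    # Step 1: read the cursor (the "current" key is an assumed layout).
    cursor = json.loads((metadata / "cursor.json").read_text())["current"]

    # Step 2: read the full index stored under that timestamp.
    index_path = metadata / cursor / "search-index-all.json"
    index = json.loads(index_path.read_text())

    # Step 3: load all packages (the "packages" key is an assumed layout).
    packages = {(p["name"], p["version"]): p for p in index["packages"]}

    # Step 4: hand back the cursor so the caller can store it as "last processed".
    return packages, cursor
```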
For the second scenario (incremental update):
- Service already has a cursor pointing to a search index from previous scenario.
- On each interval:
  - Read `v2/metadata/cursor.json` to get the latest cursor updated.
    - If it is the same cursor, skip iteration and wait for the next interval execution.
  - List all folders in `v2/metadata/` with timestamps newer than the last processed.
  - For each folder (in order):
    - Read and apply `search-index-delta.json` to update local state (packages).
    - Optionally, use `search-index-all.json` if a full sync is required.
  - Update its cursor with the last processed search index.
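
One iteration of the incremental update could look roughly like the Python sketch below. It makes the same kind of assumptions as any illustration of an unfinished design: a local directory stands in for the storage bucket, `cursor.json` is assumed to use a `current` key, the full index is assumed to use a `packages` list, and folder names are assumed to be timestamps that sort lexicographically in creation order. The name `sync_once` is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Tuple


def sync_once(root: Path, packages: Dict[Tuple[str, str], dict], last_cursor: str) -> str:
    """Run one interval of the incremental update and return the new cursor."""
    metadata = root / "v2" / "metadata"
    latest = json.loads((metadata / "cursor.json").read_text())["current"]
    if latest == last_cursor:
        return last_cursor  # same cursor: skip this iteration

    # Folders newer than the last processed one, up to the published cursor,
    # sorted so that deltas are applied in the order they were created.
    pending = sorted(
        d.name for d in metadata.iterdir()
        if d.is_dir() and last_cursor < d.name <= latest
    )

    for timestamp in pending:
        folder = metadata / timestamp
        delta = json.loads((folder / "search-index-delta.json").read_text())
        if delta.get("full_sync", False):
            # Full sync: replace everything with the full index of this folder.
            full = json.loads((folder / "search-index-all.json").read_text())
            packages.clear()
            packages.update({(p["name"], p["version"]): p for p in full["packages"]})
        else:
            for p in delta.get("added", []) + delta.get("updated", []):
                packages[(p["name"], p["version"])] = p
            for p in delta.get("deleted", []):
                packages.pop((p["name"], p["version"]), None)
        last_cursor = timestamp  # advance only after the folder was applied

    return last_cursor
```

Advancing the cursor per folder, rather than once at the end, means a failure partway through leaves the cursor at the last delta that was fully applied.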
Requirements
- The Package Registry service should still be able to read the full search index and replace all packages:
  - as an option to ensure consistency
  - as a requirement when the service starts from scratch or is restarted.
- When applying multiple delta files, it is important to apply them in the same order they were created (e.g. according to their timestamps/cursors).
- This new feature should be behind a feature flag, so that the current behavior (updating all packages) remains the default.
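
Taken together, these requirements suggest a small decision step before each update. The sketch below is one possible reading, not the planned implementation; `choose_update_mode` and its parameter names are hypothetical.

```python
def choose_update_mode(incremental_enabled: bool, has_local_cursor: bool) -> str:
    """Pick how the next update should run.

    Returns "full" when the feature flag is off (current behavior) or when
    the service has no cursor yet (fresh start or restart); otherwise
    returns "incremental".
    """
    if not incremental_enabled:
        return "full"  # flag off: keep reloading all packages
    if not has_local_cursor:
        return "full"  # nothing loaded yet: must start from the full index
    return "incremental"
```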
Checklist
- Package registry should be able to update the list of packages with the new packages.
- Package registry should be able to update the list of packages removing the required ones.
- Package registry should be able to read and apply all the delta files between the cursor stored and the one retrieved in the same order they were created.
- Package registry should be able to do a full synchronization of packages.
- Added feature flag.