@InduwaraSMPN

Purpose

This PR addresses scalability and performance limitations when syncing large numbers of entities from OpenChoreo. It introduces a new Incremental Entity Provider to handle large datasets efficiently via cursor-based pagination and updates the entire plugin ecosystem to support the new paginated API endpoints.

Goals

  • Scalability: Enable the ingestion of thousands of components without hitting API timeouts or memory limits.
  • Resiliency: Implement a stateful ingestion engine that can resume from the last successful cursor in case of interruption or token expiration.
  • Performance: Optimize API usage by implementing burst/rest cycles and batched processing for component details.
  • UI Experience: Improve the Scaffolder UI so that large lists of traits load through pagination.

Approach

1. New Incremental Backend Module (catalog-backend-module-openchoreo-incremental)

  • Created a new backend module dedicated to incremental ingestion.
  • Database: Added Knex migrations to create ingestions, ingestion_marks, and ingestion_mark_entities tables for persisting cursor state and tracking processed entities.
  • Ingestion Engine: Implemented OpenChoreoIncrementalIngestionEngine which manages the ingestion lifecycle (Burst -> Interstitial -> Rest) and handles backoff strategies for errors.
  • Cursor Traversal: Implemented OpenChoreoIncrementalEntityProvider to traverse resources in a specific order (Organizations -> Projects -> Components) using the new API continue tokens.
  • Management API: Added router endpoints (/incremental/...) to monitor health, trigger runs, and reset provider state.
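The Burst -> Interstitial -> Rest lifecycle described above can be pictured as a small state machine. The following is a minimal sketch, not the engine's actual code: the phase names, event names, and the `backoffDelayMs` helper are hypothetical and only illustrate how the phases might hand over to each other.

```typescript
// Hypothetical phase and event names; the PR only names the
// Burst -> Interstitial -> Rest cycle and error backoff.
type Phase = 'burst' | 'interstitial' | 'rest' | 'backoff';
type EngineEvent = 'burstExpired' | 'sourceExhausted' | 'error' | 'waitElapsed';

// Pure transition function: given the current phase and an event,
// return the phase the engine should move to.
function nextPhase(phase: Phase, event: EngineEvent): Phase {
  if (phase === 'burst') {
    if (event === 'burstExpired') return 'interstitial'; // short pause, cursor is kept
    if (event === 'sourceExhausted') return 'rest'; // full pass finished
    if (event === 'error') return 'backoff'; // retry later from the saved cursor
    return phase;
  }
  // Every waiting phase resumes bursting once its timer elapses.
  return event === 'waitElapsed' ? 'burst' : phase;
}

// Pick the wait before the next retry. Attempts are 1-based; the last
// entry of the schedule is reused once attempts exceed its length.
function backoffDelayMs(attempt: number, scheduleMs: number[]): number {
  if (scheduleMs.length === 0) return 0;
  const index = Math.min(Math.max(attempt, 1), scheduleMs.length) - 1;
  return scheduleMs[index] ?? 0;
}
```

Keeping the transition logic pure makes it easy to unit test separately from timers and database state.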

2. API Client & Common Utilities

  • Updated openapi/openchoreo-api.yaml and generated types to support limit and continue query parameters on all list endpoints.
  • Added support for ResponseMetadata (resourceVersion, hasMore).
  • Implemented fetchAllResources utility in openchoreo-common to standardize pagination logic across legacy and new services.
  • Added handling for 410 Gone errors to automatically reset cursors if tokens expire.
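To illustrate the pagination loop and the 410 handling listed above, here is a minimal sketch of a fetch-all helper. It is not the `fetchAllResources` implementation from this PR: `fetchAllPages`, `HttpError`, the default `limit`, and the assumption that the next token is returned as `metadata.continue` are all hypothetical.

```typescript
// Assumed response shape: the PR mentions ResponseMetadata with
// resourceVersion and hasMore; returning the next token as
// metadata.continue is an assumption of this sketch.
interface ListPage<T> {
  items: T[];
  metadata: { resourceVersion?: string; continue?: string; hasMore: boolean };
}

type FetchPage<T> = (query: { limit: number; continue?: string }) => Promise<ListPage<T>>;

// Hypothetical error type carrying the HTTP status of a failed request.
class HttpError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

async function fetchAllPages<T>(fetchPage: FetchPage<T>, limit = 100, maxResets = 1): Promise<T[]> {
  let resets = 0;
  for (;;) {
    const items: T[] = [];
    let token: string | undefined;
    try {
      for (;;) {
        const page = await fetchPage({ limit, continue: token });
        items.push(...page.items);
        if (!page.metadata.hasMore) return items;
        const next = page.metadata.continue;
        // Guard against a server that reports more data but never advances.
        if (!next || next === token) throw new Error('pagination did not advance');
        token = next;
      }
    } catch (error) {
      const expired = (error as { status?: number } | null)?.status === 410;
      if (!expired || resets >= maxResets) throw error;
      resets += 1; // token expired: drop partial results and restart from the first page
    }
  }
}
```

Restarting from the first page after a 410 is the simplest recovery; a stateful provider could instead reset only the affected cursor.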

3. Existing Backend Refactoring

  • Refactored OpenChoreoEntityProvider and various info services (EnvironmentInfoService, TraitInfoService, etc.) to use the new fetchAllResources utility, ensuring they work with the updated paginated API.
  • Updated CtdToTemplateConverter to improve tag inference, UI widget selection (e.g., using radio for booleans), and conditional CI/CD setup sections.

4. Frontend Updates

  • Updated TraitsFieldExtension.tsx to support "Load More" functionality for retrieving traits incrementally in the Scaffolder.
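The "Load More" behaviour can be thought of as folding each fetched page into the list already on screen. The sketch below is framework-free and is not taken from `TraitsFieldExtension.tsx`; the `TraitPage` and `TraitListState` shapes and the `appendTraitPage` helper are hypothetical.

```typescript
// Assumed page shape, mirroring the hasMore metadata described in this PR;
// the continue field name is an assumption.
interface TraitPage {
  items: string[];
  metadata: { continue?: string; hasMore: boolean };
}

interface TraitListState {
  traits: string[]; // trait names shown so far
  cursor?: string; // token to send with the next "Load More" request
  hasMore: boolean; // whether the "Load More" control should stay visible
}

const emptyTraitList: TraitListState = { traits: [], hasMore: true };

// Merge one fetched page into the list, skipping names that are already shown.
function appendTraitPage(state: TraitListState, page: TraitPage): TraitListState {
  const shown = new Set(state.traits);
  const fresh = page.items.filter(name => !shown.has(name));
  return {
    traits: [...state.traits, ...fresh],
    cursor: page.metadata.continue,
    hasMore: page.metadata.hasMore && Boolean(page.metadata.continue),
  };
}
```

In a React component, a function like this could serve as the updater passed to a state setter each time a page arrives.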

5. Configuration

  • Added new configuration schema openchoreo.incremental in app-config.yaml to control burst length, intervals, and chunk sizes.
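As an illustration of what resolving such settings might involve, the sketch below merges user-supplied values over defaults and rejects non-positive numbers. Every field name and default value here is hypothetical; the PR only states that `openchoreo.incremental` controls burst length, intervals, and chunk sizes, so consult the module README for the real keys.

```typescript
// All field names and defaults are hypothetical placeholders.
interface IncrementalOptions {
  burstLengthSeconds: number; // how long one burst of fetching lasts
  burstIntervalSeconds: number; // pause between bursts within one pass
  restLengthMinutes: number; // pause between full passes
  chunkSize: number; // items requested per page
}

const DEFAULT_OPTIONS: IncrementalOptions = {
  burstLengthSeconds: 10,
  burstIntervalSeconds: 30,
  restLengthMinutes: 30,
  chunkSize: 50,
};

// Merge user input over the defaults and fail fast on invalid values.
function resolveOptions(input: Partial<IncrementalOptions> = {}): IncrementalOptions {
  const merged: IncrementalOptions = { ...DEFAULT_OPTIONS, ...input };
  for (const [key, value] of Object.entries(merged)) {
    if (typeof value !== 'number' || !Number.isFinite(value) || value <= 0) {
      throw new Error(`openchoreo.incremental.${key} must be a positive number`);
    }
  }
  return merged;
}
```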

User stories

  • As a Platform Engineer, I can configure Backstage to ingest entities from OpenChoreo incrementally, preventing "Out of Memory" errors during large syncs.
  • As an Administrator, I can monitor the status of incremental ingestion and trigger manual syncs via API.
  • As a User, I can browse a large list of Component Traits in the Scaffolder without waiting for the entire list to load upfront.

Release note

  • New Feature: Added @openchoreo/plugin-catalog-backend-module-openchoreo-incremental for scalable, cursor-based entity ingestion.
  • Enhancement: Updated all OpenChoreo backend services to support API pagination.
  • Enhancement: Improved Component Template generation logic (better tag inference and UI widgets).
  • Fix: Added handling for expired API continuation tokens (HTTP 410).

Documentation

  • Added README.md in plugins/catalog-backend-module-openchoreo-incremental detailing configuration and architecture.
  • Updated app-config.yaml examples with comments explaining how to enable the incremental provider.

Training

N/A

Certification

N/A

Marketing

N/A

Automation tests

  • Unit tests: Added comprehensive tests for the new incremental provider module:
    • OpenChoreoIncrementalEntityProvider.test.ts: Verifies cursor traversal, phase transitions (Org -> Project -> Component), and 410 error recovery.
    • OpenChoreoIncrementalIngestionDatabaseManager.test.ts: Verifies database persistence for cursors and marks.
    • CtdToTemplateConverter.test.ts: Updated tests to reflect changes in template generation (tags, CI setup).
  • Dev setup: Added dev/index.ts in the new module for local testing with a dummy provider.

Security checks

Samples

N/A

Related PRs

N/A

Migrations (if applicable)

  • Database Migrations: This PR includes Knex migrations for the catalog-backend-module-openchoreo-incremental plugin.
    • 20221116073152_init.js: Creates initial ingestion tables.
    • 20240110000001_add_performance_indexes.ts: Adds indexes for performance.
    • 20240110000003_expand_last_error_field.ts: Expands error logging column size.
  • Configuration: Users opting into incremental ingestion must update app-config.yaml to configure openchoreo.incremental and register the new module in packages/backend/src/index.ts.
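One commit in this PR notes that PostgreSQL requires `CREATE INDEX CONCURRENTLY` to run outside a transaction block. In Knex, a migration file can opt out of the default transaction wrapper by exporting `config = { transaction: false }`. The sketch below shows that pattern under stated assumptions: the index and column names are hypothetical, `KnexLike` is a stand-in for the real `Knex` type so the snippet stays self-contained, and the SQL is PostgreSQL-specific (it would not run on the SQLite test database as written).

```typescript
// Minimal stand-in for the part of the Knex instance this sketch uses.
interface KnexLike {
  raw(sql: string): Promise<unknown>;
}

// Knex reads this export and runs the migration without wrapping it
// in a transaction, which CREATE INDEX CONCURRENTLY requires.
export const config = { transaction: false };

export async function up(knex: KnexLike): Promise<void> {
  // Index and column names are hypothetical.
  await knex.raw(
    'CREATE INDEX CONCURRENTLY IF NOT EXISTS ingestion_marks_ingestion_id_idx ' +
      'ON ingestion_marks (ingestion_id)',
  );
}

export async function down(knex: KnexLike): Promise<void> {
  await knex.raw('DROP INDEX CONCURRENTLY IF EXISTS ingestion_marks_ingestion_id_idx');
}
```

A migration that must also support SQLite would need to branch on the client type and fall back to a plain `CREATE INDEX`.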

Test environment

  • OS: Linux (Ubuntu)
  • Database: SQLite (Local)
  • Node: v20.19.5
  • Backstage: v1.43.3

Learning

  • Used the incremental-ingestion backend pattern (inspired by the official Backstage incremental provider) to handle large datasets efficiently.
  • Implemented cursor-based pagination to keep data consistent during long-running sync processes.

Introduces generic helpers and refactors API calls to use cursor-based pagination for all list endpoints, improving scalability and reliability for large data sets. Updates OpenAPI schema and client types to support metadata-driven pagination, replaces page/size parameters with limit/continue, and implements a default max page size for efficiency.

Enhances error handling for paginated requests and prevents infinite loops. Refactors several backend services and entity providers to leverage the new pagination utilities, ensuring consistent resource synchronization across organizations, projects, and components.

Motivated by the need for reliable handling of large deployments and to align with upstream OpenChoreo API changes.

Introduces a new incremental ingestion backend module supporting burst-based, cursor-driven processing for large-scale OpenChoreo deployments. Enables efficient, fault-tolerant, and memory-conscious catalog updates by fetching entities in resumable batches with database-persisted state and management APIs. Improves scalability, observability, and operational control over catalog ingestion.

Relates to large dataset handling and platform scalability needs.

Prevents migration failures by ensuring PostgreSQL CREATE INDEX CONCURRENTLY commands run outside transaction blocks, as required by the database. Improves migration reliability for performance optimizations.

Updates test cases to reflect revised tag logic, ensuring tags now include 'openchoreo', the component name, and workload type, rather than inferring tags from component name parts. Cleans up unused imports for clarity.

Aligns tests with changes to CI/CD configuration, including
renaming UI fields, updating workflow selection logic, and
switching boolean widget expectations. Improves cursor
expiration handling assertions and mocks for increased
clarity and accuracy.

Reflects recent logic changes to section generation and
parameter spreading, ensuring tests accurately validate
intended behaviors.

Refactors test and source files for improved readability,
including consistent formatting in function arguments and
object initializations. Enhances error handling by making
log messages for expired pagination tokens more descriptive,
aiding in debugging and operational clarity.

Introduces a new backend module and service factory to support
immediate delta mutations in the catalog, enabling real-time
entity ingestion for OpenChoreo without relying on legacy
scheduled providers. Updates configuration example and
dependencies to facilitate large-scale local testing and
integration with scaffolder actions.

Replaces hardcoded page limit with a shared constant to ensure consistency and easier maintenance. Reflects updated OpenAPI schema cap and avoids potential request errors from exceeding limits.

Aligns the default pagination limit with standard system values,
potentially improving consistency and compatibility with related systems.

Increases the maximum allowed value for the limit parameter from 500 to 512 in both the API specification and documentation. Aligns configuration with backend capabilities and clarifies usage for clients.

Adds mocked headers with content-length values to test HTTP responses
to better simulate real-world scenarios and enable more accurate
testing of code that depends on response headers.
Cleans up duplicate and unreachable error handling logic to streamline
control flow and improve maintainability. Reduces noise by eliminating
repeated or obsolete try/catch and entity transformation code. Ensures
only necessary error checks remain, making the codebase easier to
comprehend and maintain.

Improves trait selection performance and scalability by introducing
cursor-based pagination to the trait-fetching endpoint and UI, allowing
users to load large trait lists efficiently without overloading the
frontend. Updates the backend to support pagination parameters and
metadata.

Enhances incremental ingestion configuration with options for concurrent
requests and batch delay, and adds logic to avoid duplicate processing
of organizations when cursors expire. Strengthens config validation and
documentation for large-scale operations.

Warns about potential memory issues in generic pagination utilities,
encouraging chunked processing for large datasets.
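
The concurrent-request and batch-delay options mentioned above suggest processing items in bounded batches rather than all at once. The following is a minimal sketch of that idea, not the module's code: `processInBatches`, its parameters, and the `sleep` helper are hypothetical names.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

// Run `worker` over `items` with at most `batchSize` calls in flight,
// pausing `batchDelayMs` between batches to ease load on the API.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  batchDelayMs: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (batchSize < 1) throw new Error('batchSize must be at least 1');
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Items within one batch run concurrently; batches run one after another.
    results.push(...(await Promise.all(batch.map(item => worker(item)))));
    const hasMore = start + batchSize < items.length;
    if (hasMore && batchDelayMs > 0) await sleep(batchDelayMs);
  }
  return results;
}
```

Results come back in input order. A single failing item rejects the whole call, so a caller that needs partial results would catch errors inside `worker`.
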
Prevents runaway memory usage and unresponsive behavior by adding
a configurable timeout and AbortSignal support to the pagination
utility. Replaces the fixed page count safeguard with time-based
limits, improving flexibility and reliability for large or
slow data sources.
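
A time-based limit combined with `AbortSignal` support could look roughly like the sketch below. It is an illustration under assumptions, not the utility from this PR: `paginateWithDeadline`, its option names, and the `metadata.continue` field are hypothetical.

```typescript
// Assumed page shape; the continue field name is an assumption.
interface Page<T> {
  items: T[];
  metadata: { continue?: string; hasMore: boolean };
}

type FetchPage<T> = (query: { continue?: string; signal: AbortSignal }) => Promise<Page<T>>;

async function paginateWithDeadline<T>(
  fetchPage: FetchPage<T>,
  options: { timeoutMs: number; signal?: AbortSignal },
): Promise<T[]> {
  // One internal controller aborts on either the caller's signal or the timeout.
  const controller = new AbortController();
  const onAbort = () => controller.abort();
  if (options.signal?.aborted) controller.abort();
  options.signal?.addEventListener('abort', onAbort, { once: true });
  const timer = setTimeout(() => controller.abort(), options.timeoutMs);
  try {
    const items: T[] = [];
    let token: string | undefined;
    for (;;) {
      // Checked before every page, so the loop cannot run past the deadline.
      if (controller.signal.aborted) throw new Error('pagination aborted or timed out');
      const page = await fetchPage({ continue: token, signal: controller.signal });
      items.push(...page.items);
      if (!page.metadata.hasMore || !page.metadata.continue) return items;
      token = page.metadata.continue;
    }
  } finally {
    clearTimeout(timer);
    options.signal?.removeEventListener('abort', onAbort);
  }
}
```

Passing the internal signal to `fetchPage` lets an HTTP client cancel an in-flight request instead of waiting for the current page to finish.
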
Reflows long lines in pagination utils to improve readability. No functional changes.

… 2/2

- Remove outdated “STEP 2 of 3” comment block
- Keep Standard Catalog line active and add ‘catalog plugin’ comment
- Keep scaffolder-entity-model active as instructed

Add explanatory comments for incremental ingestion parameters across configuration files. Introduce defaultOwner and schedule options in production config, with optional incremental setup and backend code update note for large-scale deployments.

This commit updates the test suite for catalog backend modules with:
- Increased jest timeout from 60s to 120s for database/provider tests
- Added conditional CI skipping to prevent test timeouts
- Updated cursor expectations to match implementation (reordered properties, new sets)
- Modified tag splitting logic in template converter tests
- Changed component name picker to EntityNamePicker in UI tests
- Updated boolean widget expectations for enableBackup property

These changes ensure tests run reliably in CI environments while accurately reflecting current component behavior.

Replace config.d.ts with config.ts, moving detailed zod schema definitions and exports out of the type declaration file for better separation of concerns and clearer runtime imports.

@InduwaraSMPN InduwaraSMPN force-pushed the main branch 2 times, most recently from 4e7a898 to 3449126 Compare December 18, 2025 08:36