
[ER] High-Availability and Federated Proxy Mode for a Distributed Ecosystem #1439

@fred-maussion

Description


User Story

As a Platform Engineer/SRE,
I want to configure the package registry to query multiple, independent backend registries in parallel and merge their catalogs,
So that I can support a rich, distributed ecosystem for both our internal teams and our largest customers. This enables three key scenarios:

  • High Availability: Create a resilient service that can withstand the failure of a single backend.
  • Organizational Federation: Allow different business units or "Centers of Excellence" to manage their own custom packages in separate registries while providing a unified view.
  • Community Integration: Enable our customers to easily and securely point to community-led package registries, aggregating community packages alongside their internal and official Elastic packages through a single, managed endpoint.

Acceptance Criteria

  • The service can be configured with a list of backend URLs, including both internal and external endpoints.
  • Each backend URL can have an optional priority weighting for future conflict resolution.
  • For /search requests, the service queries all configured backends in parallel.
  • The service correctly aggregates and de-duplicates the package catalogs from all sources (e.g., showing community, custom, and official packages in a single list).
  • If one backend fails or times out, the service logs the error but still successfully returns data from the other healthy backends.
  • The timeouts and retry attempts for these outbound calls are configurable via environment variables.
  • All logs and APM traces generated during these parallel operations are correctly correlated.

High-Level Technical Proposal

This enhancement will be implemented by refactoring the proxymode module to support a concurrent, multi-backend architecture.

Architecture Diagram

This diagram shows the logical flow of a request from the client through the proxy to the various backends.

graph TD
    subgraph Client Applications
        A[Kibana / Integrations]
    end

    subgraph Elastic Package Registry
        C["/search Endpoint"] --> D[Proxy Logic: Merge & Resolve]
    end

    subgraph Backend Registries
        direction TB
        subgraph Customer Centers of Excellence
            C1(EPR CoE A)
            C2(EPR CoE B)
        end
        subgraph Elastic Internal / Dev
            E1(Official EPR)
            E2(Dev/Preview EPR)
        end
        subgraph Community
            Com(Community-Led EPR)
        end
    end

    LB((Load Balancer - Optional))

    A --> LB
    LB --> C
    
    D --> C1
    D --> C2
    D --> E1
    D --> E2
    D --> Com

    style LB fill:#ccf,stroke:#333,stroke-width:2px
    style C fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#ccf,stroke:#333,stroke-width:2px,color:#333
    style C1 fill:#dfd,stroke:#333,stroke-width:2px
    style C2 fill:#dfd,stroke:#333,stroke-width:2px
    style E1 fill:#dde,stroke:#333,stroke-width:2px
    style E2 fill:#dde,stroke:#333,stroke-width:2px
    style Com fill:#eed,stroke:#333,stroke-width:2px

Configuration

The service configuration will be updated to accept a list of backend URLs instead of a single string. Each URL in the list can carry an optional priority weighting, appended as url;priority. A new internal backend struct will hold the parsed URL and its priority, keeping the configuration easy to manage within the application.
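A minimal sketch of the url;priority parsing described above; the struct and function names are illustrative, and entries without a suffix default to priority 0 as an assumption:

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// backend holds one parsed proxy target and its optional priority.
type backend struct {
	URL      *url.URL
	Priority int
}

// parseBackends parses entries of the form "https://host" or
// "https://host;2". Entries without a ";priority" suffix default
// to priority 0 (an assumption; the real default is undecided).
func parseBackends(entries []string) ([]backend, error) {
	var out []backend
	for _, e := range entries {
		raw, prio := e, 0
		if i := strings.LastIndex(e, ";"); i >= 0 {
			p, err := strconv.Atoi(e[i+1:])
			if err != nil {
				return nil, fmt.Errorf("invalid priority in %q: %w", e, err)
			}
			raw, prio = e[:i], p
		}
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("invalid backend URL %q: %w", raw, err)
		}
		out = append(out, backend{URL: u, Priority: prio})
	}
	return out, nil
}

func main() {
	backends, err := parseBackends([]string{
		"https://epr.elastic.co",
		"https://epr.internal.example;5",
	})
	if err != nil {
		panic(err)
	}
	for _, b := range backends {
		fmt.Println(b.URL.Host, b.Priority)
	}
}
```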

Parallel Execution (Fan-out/Fan-in)

Upon receiving a request, the proxymode will execute a "fan-out, fan-in" pattern:

  • Fan-out: It will iterate through the list of configured backends and launch a separate goroutine for each one to fetch data in parallel.
  • Fan-in: It will use a channel and a sync.WaitGroup to collect the responses from all goroutines. This ensures that the application waits for all backends to respond (or time out) before proceeding.

Result Merging & Conflict Resolution

Once all backend responses are collected, a new merging layer will process the data. It will use a map to efficiently de-duplicate the results from different sources. For packages with the same name, it will apply the defined conflict resolution logic (latest version wins, then highest priority) to produce a single, consolidated, and authoritative list.
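The map-based de-duplication with "latest version wins, then highest priority" could look like the sketch below. The `pkg` struct and the dotted-version comparison are simplifications for illustration; a real implementation would compare full semver (pre-release tags, etc.) with a proper library:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pkg is a simplified view of one package entry from a backend.
type pkg struct {
	Name     string
	Version  string
	Priority int // priority of the backend it came from
}

// newerVersion does a naive numeric dotted-version comparison;
// real code would use a semver library instead.
func newerVersion(a, b string) bool {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		ai, _ := strconv.Atoi(as[i])
		bi, _ := strconv.Atoi(bs[i])
		if ai != bi {
			return ai > bi
		}
	}
	return len(as) > len(bs)
}

// merge de-duplicates packages by name in a map, keeping the latest
// version and breaking ties by backend priority, per the proposal.
func merge(all []pkg) []pkg {
	byName := map[string]pkg{}
	for _, p := range all {
		cur, ok := byName[p.Name]
		if !ok {
			byName[p.Name] = p
			continue
		}
		if newerVersion(p.Version, cur.Version) ||
			(p.Version == cur.Version && p.Priority > cur.Priority) {
			byName[p.Name] = p
		}
	}
	out := make([]pkg, 0, len(byName))
	for _, p := range byName {
		out = append(out, p)
	}
	return out
}

func main() {
	merged := merge([]pkg{
		{"nginx", "1.2.0", 1},
		{"nginx", "1.3.0", 0}, // newer version wins despite lower priority
		{"custom", "0.1.0", 2},
		{"custom", "0.1.0", 5}, // same version: higher priority wins
	})
	for _, p := range merged {
		fmt.Println(p.Name, p.Version, p.Priority)
	}
}
```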

Mermaid Diagram: Request Flow

This diagram illustrates the parallel request flow for a /search call; the same flow applies to /categories and /packages.

sequenceDiagram
    participant Client
    participant Package Registry
    participant Backend 1
    participant Backend 2
    participant Backend N

    Client->>+Package Registry: GET /search
    activate Package Registry

    par
        Package Registry->>+Backend 1: GET /search
    and
        Package Registry->>+Backend 2: GET /search
    and
        Package Registry->>+Backend N: GET /search
    end

    Backend 1-->>-Package Registry: packages_list_1
    Backend 2-->>-Package Registry: packages_list_2
    Backend N-->>-Package Registry: packages_list_N

    Package Registry->>Package Registry: Merge & Resolve Conflicts
    deactivate Package Registry

    Package Registry-->>-Client: Consolidated Package List
