|
| 1 | +# CFP-39876: Scoped Export Mode for ClusterMesh |
| 2 | + |
| 3 | +**SIG:** SIG-NAME |
| 4 | + |
| 5 | +**Sharing:** Public |
| 6 | + |
| 7 | +**Begin Design Discussion:** 2025-06-02 |
| 8 | + |
| 9 | +**End Design Discussion:** |
| 10 | + |
| 11 | +**Cilium Release:** (TBD) |
| 12 | + |
| 13 | +**Authors:** Krunal Jain <[email protected]>, Vamsi Kalapala <[email protected]> |
| 14 | + |
| 15 | +## Summary |
| 16 | + |
| 17 | +This CFP proposes a new "scoped-export" mode for the ClusterMesh API server that propagates endpoints and identities across clusters only when they back a global service. The change targets scalability. The trade-off is that direct inter-cluster endpoint access and network policy enforcement are no longer available for resources that are not associated with a global service. Because the behavior is opt-in through a new ClusterMesh API server export mode, the change is backward compatible. |
| 18 | + |
| 19 | +## Motivation |
| 20 | + |
| 21 | +The existing Cilium ClusterMesh implementation distributes all endpoints and identities to every connected cluster, which limits scalability as the number of endpoints and identities grows. Profiling shows how severe this becomes: in a 200-cluster ClusterMesh deployment with the following configuration parameters, each Cilium agent consumes 15 GiB of memory: |
| 22 | + |
| 23 | +- cm-apiserver-replicas=2 |
| 24 | +- clusters=200 |
| 25 | +- nodes/cluster=300 |
| 26 | +- identities/cluster=500 |
| 27 | +- identities-qps=0.2 |
| 28 | +- endpoints/cluster=15000 |
| 29 | +- endpoints-qps=1 |
| 30 | +- services=0 |
| 31 | +- services-qps=0 |
| 32 | + |
| 33 | +Restricting distribution to the resources that back global services reduces the amount of state each agent must hold, which we expect to improve scalability substantially. The resulting resource savings would also make it practical to run multiple replicas of the ClusterMesh API server, which is recommended for production workloads. |
| 34 | + |
| 35 | +## Goals |
| 36 | + |
| 37 | +- Enhance Cilium ClusterMesh scalability through selective endpoint and identity distribution |
| 38 | +- Maintain cross-cluster visibility for endpoints and identities associated with global services |
| 39 | +- Support service-to-service connectivity and network policies for global services |
| 40 | + |
| 41 | +## Non-Goals |
| 42 | + |
| 43 | +- Direct inter-cluster endpoint access for resources not backed by global services |
| 44 | +- Cross-cluster network policy enforcement for resources not backed by global services |
| 45 | +- Pod-to-pod or pod-to-service connectivity and network policy |
| 46 | + |
| 47 | +## Proposal |
| 48 | + |
| 49 | +### Implementation details |
| 50 | + |
| 51 | +We intend to add a new cache alongside the existing service cache in both the Cilium agent and the operator. We also intend to add a new option to the Cilium agent ConfigMap, scoped-export, which defaults to false. When this option is enabled, the agent and the operator annotate CiliumEndpoint, CiliumEndpointSlice, and CiliumIdentity resources with an internal annotation that allowlists them for export to etcd. Changing the scoped-export option requires a restart of the Cilium agent and operator for the change to take effect. |
| 52 | + |
| 53 | +### Global Service Backend Cache |
| 54 | + |
| 55 | +#### Implementation Location |
| 56 | + |
| 57 | +The global service backend cache is implemented in both the Cilium agent and the operator and works in conjunction with the ServiceCache. Each component maintains its own synchronized copy of the global service backend endpoint IPs. |
| 58 | +The cache maps backend IP addresses to metadata, including the service name and namespace. |
| 59 | + |
| 60 | +#### Cache Management |
| 61 | + |
| 62 | +The cache is populated from service endpoint updates and receives live updates when backends change. Stale entries are cleaned up automatically, and synchronization mechanisms keep the cache up to date with the latest backend endpoint IPs. Both the Cilium agent and the Cilium operator already listen to Service events; these events are consumed to populate the global service backend cache. |
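The cache behavior described above can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the names `ServiceRef`, `GlobalBackendCache`, `Upsert`, `Delete`, and `IsGlobalBackend` are hypothetical, and the sketch assumes that one IP may back several global services and that a service update carries the full current backend list.

```go
package main

import "sync"

// ServiceRef identifies a global service. Hypothetical type, not a Cilium API.
type ServiceRef struct {
	Namespace string
	Name      string
}

// GlobalBackendCache maps backend IPs to the global services they back.
type GlobalBackendCache struct {
	mu       sync.RWMutex
	backends map[string]map[ServiceRef]struct{}
}

// NewGlobalBackendCache returns an empty cache.
func NewGlobalBackendCache() *GlobalBackendCache {
	return &GlobalBackendCache{backends: make(map[string]map[ServiceRef]struct{})}
}

// Upsert replaces the backend set of svc with ips. IPs that no longer
// back svc are dropped, which is how stale entries get cleaned up here.
func (c *GlobalBackendCache) Upsert(svc ServiceRef, ips []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.removeLocked(svc)
	for _, ip := range ips {
		if c.backends[ip] == nil {
			c.backends[ip] = make(map[ServiceRef]struct{})
		}
		c.backends[ip][svc] = struct{}{}
	}
}

// Delete removes every entry that belongs to svc.
func (c *GlobalBackendCache) Delete(svc ServiceRef) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.removeLocked(svc)
}

// removeLocked must be called with c.mu held for writing.
func (c *GlobalBackendCache) removeLocked(svc ServiceRef) {
	for ip, svcs := range c.backends {
		delete(svcs, svc)
		if len(svcs) == 0 {
			delete(c.backends, ip)
		}
	}
}

// IsGlobalBackend reports whether ip backs at least one global service.
func (c *GlobalBackendCache) IsGlobalBackend(ip string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.backends[ip]) > 0
}
```

In this sketch a Service event handler would call `Upsert` with the service's current backend IPs and `Delete` when the service is removed or stops being global. Keying by IP keeps the lookup cheap for the reconcilers, at the cost of a full scan when a service's backends are replaced.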
| 63 | + |
| 64 | +### Cilium Agent Changes |
| 65 | + |
| 66 | +#### Service Cache Extension |
| 67 | + |
| 68 | +The agent's existing service cache is extended to support looking up whether an IP backs a global service. |
| 69 | + |
| 70 | +#### CiliumEndpoint Reconciliation |
| 71 | + |
| 72 | +Every 10 seconds, the agent checks each CiliumEndpoint IP against the global service backend cache. If an IP matches, the agent adds an internal annotation that marks the CiliumEndpoint as a global backend; otherwise, it removes the annotation. This annotation is internal and not user-modifiable. |
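One reconciliation pass for a single endpoint might look like the sketch below. It is an illustration under assumptions: the annotation key, the simplified `Endpoint` type, and the `reconcileEndpoint` function are all hypothetical, since the CFP does not name the annotation or the code paths involved. The cache lookup is passed in as a function so the example stays self-contained.

```go
package main

// globalBackendAnnotation is a hypothetical key; the CFP does not name it.
const globalBackendAnnotation = "example.invalid/global-service-backend"

// Endpoint is a simplified stand-in for a CiliumEndpoint.
type Endpoint struct {
	Name        string
	IPs         []string
	Annotations map[string]string
}

// reconcileEndpoint adds the annotation when any endpoint IP backs a
// global service and removes it otherwise. It reports whether the
// annotations changed, so a caller can skip no-op updates.
func reconcileEndpoint(ep *Endpoint, isGlobalBackend func(ip string) bool) bool {
	want := false
	for _, ip := range ep.IPs {
		if isGlobalBackend(ip) {
			want = true
			break
		}
	}
	_, have := ep.Annotations[globalBackendAnnotation]
	if want == have {
		return false
	}
	if want {
		if ep.Annotations == nil {
			ep.Annotations = make(map[string]string)
		}
		ep.Annotations[globalBackendAnnotation] = "true"
	} else {
		delete(ep.Annotations, globalBackendAnnotation)
	}
	return true
}
```

A periodic loop would call `reconcileEndpoint` for every local endpoint on each 10-second tick and write the object back only when the function returns true, which avoids unnecessary API server writes.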
| 73 | + |
| 74 | +#### CiliumIdentity Reconciliation |
| 75 | + |
| 76 | +When the agent manages CiliumIdentities, it listens for events from the global service backend cache. On upserts or deletions, it reconciles the affected identities and updates their annotations accordingly. |
| 77 | + |
| 78 | +### Cilium Operator Changes |
| 79 | + |
| 80 | +#### Service Cache Extension |
| 81 | + |
| 82 | +The operator's existing service cache is extended to support looking up whether an IP backs a global service. |
| 83 | + |
| 84 | +#### CiliumIdentity Handling |
| 85 | + |
| 86 | +If the operator manages CiliumIdentities, it responds to global service events from its cache and reconciles the affected identities, adding or removing the internal annotation as needed. |
| 87 | + |
| 88 | +#### CiliumEndpointSlice Controller Updates |
| 89 | + |
| 90 | +The controller, which already watches CiliumEndpoint resources, is updated to check for the global backend annotation. If the annotation is present, the controller adds a corresponding annotation to the CiliumEndpointSlice. When the annotation on a CiliumEndpoint later changes, the controller updates the CiliumEndpointSlice annotation to match. |
| 91 | + |
| 92 | +### ClusterMesh API Server Changes |
| 93 | + |
| 94 | +#### Export Filtering |
| 95 | + |
| 96 | +The ClusterMesh API server filters resources before exporting to etcd. Only those with the internal global backend annotation are shared for cross-cluster visibility. |
| 97 | + |
| 98 | +#### Supported Resource Types |
| 99 | + |
| 100 | +The filter applies to CiliumEndpoint, CiliumIdentity, and CiliumEndpointSlice resources. Only annotated resources are exported and shared across clusters. |
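The export decision can be sketched as a simple predicate. The annotation key, the `Resource` type, and the function names below are hypothetical and only illustrate the filtering rule; the sketch also assumes that with scoped-export disabled everything is exported, matching today's behavior.

```go
package main

// globalBackendAnnotation is a hypothetical key; the CFP does not name it.
const globalBackendAnnotation = "example.invalid/global-service-backend"

// Resource is a simplified stand-in for a CiliumEndpoint, CiliumIdentity,
// or CiliumEndpointSlice object.
type Resource struct {
	Kind        string
	Name        string
	Annotations map[string]string
}

// shouldExport reports whether r may be written to etcd. With scoped
// export disabled every resource is exported; with it enabled only
// resources carrying the internal annotation are.
func shouldExport(r Resource, scopedExport bool) bool {
	if !scopedExport {
		return true
	}
	_, ok := r.Annotations[globalBackendAnnotation]
	return ok
}

// filterForExport returns the subset of rs that passes shouldExport.
func filterForExport(rs []Resource, scopedExport bool) []Resource {
	out := make([]Resource, 0, len(rs))
	for _, r := range rs {
		if shouldExport(r, scopedExport) {
			out = append(out, r)
		}
	}
	return out
}
```

Keeping the mode check inside the predicate means the default path is unchanged when scoped-export is false, which is the backward-compatibility property the proposal relies on.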
| 101 | + |
| 102 | +## Future Milestones |
| 103 | + |
| 104 | +### Dynamic Export Scope Configuration |
| 105 | + |
| 106 | +Introduce support for dynamic configuration of export scope per namespace or pod label. This would allow operators to fine-tune which endpoints and identities are shared across clusters beyond just global service association. |
| 107 | + |
| 108 | +### Policy-Aware Export Filtering |
| 109 | + |
| 110 | +Integrate network policy awareness into the export decision logic. Endpoints and identities could be exported based on whether they are referenced in cross-cluster network policies, even if not fronted by a global service. |