Commit 4375b1d
docs: add proposal for OmniControl topology visualization
Signed-off-by: SunsetB612 <10235101575@stu.ecnu.edu.cn>
| title | authors | reviewers | approvers | creation-date |
| --- | --- | --- | --- | --- |
| OmniControl for Karmada Dashboard | @SunsetB612 | @ | @ | 2026-03-08 |

# OmniControl for Karmada Dashboard

## Summary

Karmada Dashboard has implemented resource management on both the control plane and member clusters, but that management remains at the atomic level: the relationships between resources are not yet intuitively presented. Users find it difficult to start from a ResourceTemplate and quickly trace its matching PropagationPolicy, the generated ResourceBindings and Works, and the distribution status across clusters.

OmniControl aims to take ResourceTemplate as the core perspective, integrating and presenting the resource states of both the control plane and the member cluster planes, building an end-to-end topology view that covers the full lifecycle of resources. Users can intuitively see the complete chain from policy matching and binding generation to cross-cluster distribution, enabling rapid problem identification when propagation or distribution failures occur.

To this end, we will design and implement topology visualization capabilities for Karmada's core resources, and provide comprehensive design documentation and API documentation.

## Motivation

When managing multi-cluster resources with Karmada Dashboard, users often need to navigate back and forth across multiple pages just to piece together the complete state of a single resource. For example, when a ResourceTemplate encounters a distribution anomaly, users must visit several resource pages separately — including PropagationPolicy, ResourceBinding, and Work — to troubleshoot the issue one by one. This process is tedious and prone to missing critical information.

The current atomic management approach fails to intuitively present the relationships between resources. Users can neither see the matching and binding chain of resources on the control plane, nor get an overview of the distribution and runtime status of a resource across member clusters. This makes fault diagnosis costly, and the difficulty is especially pronounced for users new to Karmada, who struggle to understand the end-to-end flow logic of resources.

Therefore, we aim to introduce OmniControl capabilities into Karmada Dashboard — using ResourceTemplate as the entry point to unify the visualization of related resource associations and cross-cluster states. This will help users quickly grasp the full picture of their resources and reduce the cognitive and operational overhead of multi-cluster management.

### Goals

- Design and implement a resource topology visualization API that supports querying, starting from a ResourceTemplate, its associated PropagationPolicy, ResourceBinding, Work, and the distribution status across member clusters
- Integrate topology visualization components into the Karmada Dashboard frontend to intuitively display resource associations between the control plane and member cluster planes
- Provide comprehensive design documentation and API documentation

### Non-Goals

- The existing atomic resource management pages will not be replaced or modified; OmniControl serves as an enhanced view layered on top of existing capabilities
- No modifications to Karmada's control plane scheduling logic or policy engine are involved

## Proposal

### User Stories (Optional)

#### Story 1 — Enhanced Resource Management Experience

As a Karmada user, managing multi-cluster resources currently requires navigating between separate pages for PropagationPolicy, ResourceBinding, Work, and member cluster workloads. This fragmented workflow makes it difficult to understand how resources are connected and increases the time needed for routine operations.

With OmniControl, users can start from any Workload and instantly see its full propagation topology in a single view — the matched PropagationPolicy, generated ResourceBindings, distributed Works, and the actual workloads running in each member cluster. Users can click any node in the topology to inspect or operate on that resource directly, without switching between pages. This unified view reduces cognitive overhead, simplifies daily resource management, and makes the Karmada Dashboard more intuitive for both new and experienced users.

#### Story 2 — Fault Diagnosis

As a cluster administrator, when a Workload is not running as expected in a member cluster, the current troubleshooting process requires manually checking each stage of the propagation chain — PropagationPolicy matching, ResourceBinding generation, Work dispatch, and member cluster execution — across different pages and namespaces, making it easy to lose context or miss the root cause.

With OmniControl, the administrator can open the topology view for the problematic Workload and immediately identify the failing stage through status coloring (green/yellow/red). Clicking the abnormal node reveals its detailed status, events, and conditions without leaving the topology view, significantly reducing the mean time to diagnosis.

### Notes/Constraints/Caveats (Optional)

### Risks and Mitigations

## Design Details

### Overview

The overall architecture is illustrated in the diagram below:

<img src="./overview.png" width="600" alt="Overview" />

### Resource Propagation Chain (Informer + Indexer)

**Core Idea**: Use SharedInformerFactory to build a local cache and register custom Indexers for O(1) lookups, minimizing API server pressure.

The complete propagation chain from control plane to member clusters:

```
ResourceTemplate (control-plane)
  → PropagationPolicy / ClusterPropagationPolicy
  → ResourceBinding / ClusterResourceBinding
  → OverridePolicy / ClusterOverridePolicy (applied on Work)
  → Work (in namespace karmada-es-{cluster})
  → Workload (member cluster)
```
#### Workload ➡️ PropagationPolicy

Locate the associated PropagationPolicy by reading two annotations from the Workload:
- `propagationpolicy.karmada.io/namespace`: The namespace where the PropagationPolicy is located
- `propagationpolicy.karmada.io/name`: The name of the PropagationPolicy

Once these values are obtained, perform a direct GET to retrieve the PropagationPolicy resource.
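The annotation lookup can be sketched as follows; the annotation keys come from the description above, while `policyRef` is a hypothetical helper operating on the Workload's `metadata.annotations` map:

```go
package main

import "fmt"

// policyRef extracts the PropagationPolicy reference injected by Karmada
// into the Workload's annotations. It reports ok=false when either
// annotation is missing (e.g. the Workload is not matched by any policy).
func policyRef(annotations map[string]string) (namespace, name string, ok bool) {
	namespace, nsOK := annotations["propagationpolicy.karmada.io/namespace"]
	name, nameOK := annotations["propagationpolicy.karmada.io/name"]
	return namespace, name, nsOK && nameOK
}

func main() {
	ann := map[string]string{
		"propagationpolicy.karmada.io/namespace": "default",
		"propagationpolicy.karmada.io/name":      "nginx-policy",
	}
	if ns, name, ok := policyRef(ann); ok {
		// With ns/name in hand, a direct GET against the Karmada API
		// retrieves the PropagationPolicy (omitted here).
		fmt.Println(ns, name)
	}
}
```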
#### Workload ➡️ ResourceBinding

Use a ResourceBinding Informer with a custom Indexer keyed by `ownerReferences[].uid`:

1. Start the ResourceBinding Informer to cache all ResourceBinding objects locally
2. Register a custom Indexer with `ownerReferences[].uid` as the index key
3. The Informer automatically maintains an `ownerUID → []ResourceBinding` index in memory
4. At query time, use the Workload's `uid` as the key to retrieve matching ResourceBindings in O(1)

```go
rbs, err := rbInformer.GetIndexer().ByIndex("byOwnerUID", string(deploy.UID))
```
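The index the Informer maintains can be modeled as below; this is a stdlib sketch of the mapping, not client-go itself — with client-go the same `ownerReferences[].uid` extraction would be registered as a custom index function, and the cache would keep it current via Watch events:

```go
package main

import "fmt"

// ResourceBinding is a stand-in for the API type; only the fields the
// index needs are modeled here.
type ResourceBinding struct {
	Name      string
	OwnerUIDs []string // from metadata.ownerReferences[].uid
}

// byOwnerUID models the in-memory index: ownerUID → []ResourceBinding.
type byOwnerUID map[string][]ResourceBinding

// Add mirrors what the Informer does on each cache add event: index the
// object under every owner UID it references.
func (idx byOwnerUID) Add(rb ResourceBinding) {
	for _, uid := range rb.OwnerUIDs {
		idx[uid] = append(idx[uid], rb)
	}
}

func main() {
	idx := byOwnerUID{}
	idx.Add(ResourceBinding{Name: "nginx-deployment", OwnerUIDs: []string{"uid-123"}})

	// Query time: the Workload's UID yields its ResourceBindings in O(1),
	// with no call to the API server.
	fmt.Println(len(idx["uid-123"])) // prints 1
}
```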
#### ResourceBinding ➡️ Work

Use a Work Informer with a custom Indexer keyed by annotation `resourcebinding.karmada.io/name`:

1. Start the Work Informer to sync and cache all Work objects locally
2. Register a custom Indexer with `metadata.annotations["resourcebinding.karmada.io/name"]` as the index key
3. The Informer automatically maintains an `rbName → []Work` index in memory
4. At query time, use the ResourceBinding's name as the key to retrieve matching Works in O(1)

```go
workItems, err := workInformer.GetIndexer().ByIndex("byRBName", rbName)
```

#### Work ➡️ OverridePolicy

Retrieve applied overrides by reading Work annotations:
- **Namespace-level**: `policy.karmada.io/applied-overrides` — OverridePolicies from the same namespace
- **Cluster-level**: `policy.karmada.io/applied-cluster-overrides` — ClusterOverridePolicies

Each annotation contains a JSON array of applied policies. Use the `policyName` field to locate the actual OverridePolicy/ClusterOverridePolicy resource.
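Extracting the policy names from the annotation can be sketched as follows; the sketch assumes only that each JSON entry carries a `policyName` field as described above, and the exact payload shape should be verified against the Karmada API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// appliedOverride models the one field we need from each entry of the
// applied-overrides annotation; other fields in the payload are ignored.
type appliedOverride struct {
	PolicyName string `json:"policyName"`
}

// appliedPolicyNames parses the annotation value and returns the names of
// the OverridePolicies/ClusterOverridePolicies that were applied.
func appliedPolicyNames(annotationValue string) ([]string, error) {
	var entries []appliedOverride
	if err := json.Unmarshal([]byte(annotationValue), &entries); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.PolicyName)
	}
	return names, nil
}

func main() {
	// Illustrative annotation value, not real controller output.
	raw := `[{"policyName":"nginx-override"},{"policyName":"image-override"}]`
	names, err := appliedPolicyNames(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [nginx-override image-override]
}
```

Each returned name can then be resolved with a direct GET on the OverridePolicy (or ClusterOverridePolicy) resource.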
#### Work ➡️ Member Workload

Retrieve the actual workload from the member cluster:
1. From the Cluster CR, obtain the cluster's API endpoint and authentication credentials (secret)
2. Build a Kubernetes client using these credentials
3. Perform a GET request to retrieve the workload (Deployment, Pod, etc.) from the member cluster

### Frontend Topology Visualization

The frontend uses ReactFlow as the topology rendering engine, rendering the chain data returned by the backend as an interactive directed acyclic graph (DAG), with node levels arranged from top to bottom:

<table width="100%">
<tr>
<td width="50%"><img src="./karmada.png" width="100%" alt="Karmada" /></td>
<td width="50%"><img src="./topology.png" width="100%" alt="Topology" /></td>
</tr>
</table>

Interaction Design:
* **Entry Point:** Add a "Topology View" icon button to each row of the existing resource list page; clicking it opens the full-chain topology view for that resource.
* **Node Click:** Clicking any node in the topology graph displays the detailed information of that resource.
* **Status Coloring:** Nodes are colored according to resource status (green = healthy, yellow = in progress, red = abnormal), helping users quickly locate faulty nodes.
* **Auto Layout:** The topology graph uses automatic layout algorithms such as Dagre to ensure that nodes do not overlap and connections remain clear in multi-cluster scenarios.
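The status coloring rule can be stated as a small mapping; the three buckets come from the design above, while the status strings folded into each bucket are illustrative assumptions:

```go
package main

import "fmt"

// statusColor maps a resource status to its node color in the topology
// view. Unknown or failed states default to red so that anything abnormal
// is surfaced rather than silently hidden.
func statusColor(status string) string {
	switch status {
	case "healthy":
		return "green"
	case "progressing":
		return "yellow"
	default:
		return "red"
	}
}

func main() {
	for _, s := range []string{"healthy", "progressing", "failed"} {
		fmt.Println(s, "→", statusColor(s))
	}
}
```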
143+
144+
### Test Plan
145+
146+
## Alternatives
147+
148+
### Solution 1: Annotation-Based Forward Tracing
149+
150+
**Core Idea**: Starting from a control-plane Workload, trace downstream resources by following Karmada's injected annotations and ownerReferences to discover associated PropagationPolicies, ResourceBindings, Works, and their distribution across clusters.
151+
152+
#### Workload ➡️ PropagationPolicy
153+
154+
Same as the proposed solution.
155+
156+
#### Workload ➡️ ResourceBinding
157+
158+
List all ResourceBindings in the same namespace as the Workload, then filter by comparing the ownerReferences:
159+
- Check if any ResourceBinding's `ownerReferences[].uid` matches the Workload's `uid`
160+
- A matching ResourceBinding indicates it was created to manage this specific Workload
161+
162+
Multiple ResourceBindings may be created if the Workload matches multiple PropagationPolicies.
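The list-and-filter step above can be sketched as follows; `ResourceBinding` is a stand-in for the API type, and `bindingsForOwner` is a hypothetical helper applied to the result of a namespaced List:

```go
package main

import "fmt"

// ResourceBinding models only the fields the filter needs.
type ResourceBinding struct {
	Name      string
	OwnerUIDs []string // from metadata.ownerReferences[].uid
}

// bindingsForOwner keeps the ResourceBindings whose ownerReferences point
// at the given Workload UID. Note this is O(n) in the number of listed
// bindings, which is the cost the proposed Indexer-based solution avoids.
func bindingsForOwner(all []ResourceBinding, workloadUID string) []ResourceBinding {
	var matched []ResourceBinding
	for _, rb := range all {
		for _, uid := range rb.OwnerUIDs {
			if uid == workloadUID {
				matched = append(matched, rb)
				break
			}
		}
	}
	return matched
}

func main() {
	all := []ResourceBinding{
		{Name: "nginx-deployment", OwnerUIDs: []string{"uid-123"}},
		{Name: "redis-deployment", OwnerUIDs: []string{"uid-456"}},
	}
	fmt.Println(len(bindingsForOwner(all, "uid-123"))) // prints 1
}
```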
163+
164+
#### ResourceBinding ➡️ Work
165+
166+
For each cluster specified in the ResourceBinding's `spec.clusters`:
167+
1. Construct the namespace: `karmada-es-{ClusterName}`
168+
2. List all Works in that namespace
169+
3. Filter by annotation: `resourcebinding.karmada.io/name` equals the ResourceBinding's name
170+
171+
This returns the Work object that represents the Workload's deployment to that specific cluster.
172+
173+
#### Work ➡️ OverridePolicy
174+
175+
Same as the proposed solution.
176+
177+
#### Work ➡️ Member Workload
178+
179+
Same as the proposed solution.
180+
181+
### Solution 2: Direct GET via Naming Convention
182+
183+
**Core Idea**: Karmada uses deterministic naming rules for ResourceBinding and Work, allowing direct GET queries by constructing names instead of using List operations.
184+
185+
**Naming Rules**:
186+
- **ResourceBindingName**: `WorkloadName + "-" + WorkloadKind`
187+
- **WorkNamespace**: `"karmada-es-" + ClusterName`
188+
- **WorkName**: `ResourceBindingNamespace + "-" + ResourceBindingName + "-" + hash`
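The first two rules can be sketched as plain string construction; Karmada's `names` package provides the real helpers (`GenerateBindingName`, `GenerateWorkName`), so these stand-ins only illustrate the convention — the lowercasing is an assumption about Karmada's normalization, and the hash suffix of WorkName is omitted:

```go
package main

import (
	"fmt"
	"strings"
)

// bindingName sketches the ResourceBindingName rule above
// (WorkloadName + "-" + WorkloadKind), lowercased as Kubernetes resource
// names must be. Use names.GenerateBindingName in real code.
func bindingName(workloadName, workloadKind string) string {
	return strings.ToLower(workloadName + "-" + workloadKind)
}

// workNamespace sketches the WorkNamespace rule: "karmada-es-" + ClusterName.
func workNamespace(clusterName string) string {
	return "karmada-es-" + clusterName
}

func main() {
	fmt.Println(bindingName("nginx", "Deployment")) // nginx-deployment
	fmt.Println(workNamespace("member1"))           // karmada-es-member1
}
```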
#### Workload ➡️ PropagationPolicy

Same as the proposed solution.

#### Workload ➡️ ResourceBinding

Use the deterministic naming convention to directly construct the ResourceBindingName and perform a GET query, avoiding List operations.

```go
rbName := names.GenerateBindingName("Deployment", name)
rb, err := clients.KarmadaClient.WorkV1alpha2().ResourceBindings(namespace).Get(ctx, rbName, v1.GetOptions{})
```

#### ResourceBinding ➡️ Work

Retrieve all cluster information from the ResourceBinding, construct the deterministic WorkName for each cluster, and perform GET queries.

```go
// Query ClusterName from ResourceBinding
for _, binding := range rb.Spec.Clusters {
	clusterName := binding.Name

	// Construct Work namespace
	workNamespace := "karmada-es-" + clusterName

	// Construct Work name
	workName := names.GenerateWorkName("Deployment", name, namespace)

	// Direct GET Work
	work, err := clients.KarmadaClient.WorkV1alpha1().Works(workNamespace).Get(ctx, workName, v1.GetOptions{})
}
```

#### Work ➡️ OverridePolicy

Same as the proposed solution.

#### Work ➡️ Member Workload

Same as the proposed solution.

### Comparison of Three Solutions

**Lookup Method per Step**:

| Step | Proposed (Informer + Indexer) | Solution 1 (Annotation-Based Tracing) | Solution 2 (Naming Convention GET) |
|------|-------------------------------|---------------------------------------|------------------------------------|
| RT → PP | Annotation direct GET | Annotation direct GET | Annotation direct GET |
| RT → RB | Informer Indexer by `ownerReferences[].uid`, O(1) | List all RBs, filter by `ownerReferences[].uid` | Construct name via `GenerateBindingName`, direct GET |
| RB → Work | Informer Indexer by annotation `resourcebinding.karmada.io/name`, O(1) | List Works in `karmada-es-{cluster}`, filter by annotation | Construct name via `GenerateWorkName`, direct GET |
| Work → OP | Annotation direct GET | Annotation direct GET | Annotation direct GET |
| Work → Member Workload | Member cluster client GET | Member cluster client GET | Member cluster client GET |

**Overall Comparison**:

| Aspect | Proposed (Informer + Indexer) | Solution 1 (Annotation-Based Tracing) | Solution 2 (Naming Convention GET) |
|--------|-------------------------------|---------------------------------------|------------------------------------|
| Performance | Best — O(1) in-memory lookup | Worst — multiple List + filter | Good — direct GET, still requires API calls |
| Startup Cost | High — requires Watch to build local cache | None | None |
| Memory Usage | High — maintains full resource cache | Low | Low |
| Reliability | High — based on Kubernetes native mechanisms | High — works with any naming pattern | Low — breaks if naming rules change |