Clustermesh APIserver: Add CFP to filter ciliumendpoints, identities and endpointslices exported to etcd #74
Conversation
Force-pushed 6c64708 to 3992f0c
cc @cilium/sig-clustermesh
Hi @ysksuzuki. The CFP targets Clustermesh, but it also touches the Cilium Agent and Cilium Operator (specifically the CRD controllers for CiliumIdentity, CiliumEndpoint and CiliumEndpointSlice). Do you know the reviewers for these components?
Hi @krunaljain, thank you for the CFP. I think the Cluster Mesh team is sufficient as the initial reviewer. If they evaluate the proposal and determine that it has a significant impact on specific subsystems of the agent or operator, they'll likely ask the relevant teams for additional review.
Thanks @krunaljain and @vakalapa for this proposal.
I totally agree that having a way to limit the amount of information exchanged via Cluster Mesh is beneficial for all those use-cases in which only a limited subset of workloads needs to communicate cross-cluster.
I did a first pass on the proposal and left a few comments inline. Let me know your thoughts.
Thanks for the proposal, I think it's definitely worth exploring and I would love to see an improvement in this area.
I've added one larger comment with most of my thoughts on the topic.
There are three major parts of it:
- What configurations it would work with
- What would be the user interface
- What other pros/cons would it have, including complexity and maintenance cost
I would also be happy to jump on a call if you think it makes sense to discuss :)
Happy to join as well if useful.
Force-pushed be55711 to 57152bc
Hi @marseel @giorio94 @MrFreezeex, I updated the CFP as per the discussion. Let me know if you have any questions.
Two high-level comments, before I provide more detailed feedback.
@giorio94 Updated with the recommended changes.
Thanks for this new iteration!
I left a few comments inline about a few points that are not clear to me.
Also, could you include in the CFP how this would interact with MCS-API? If the namespace is not global/does not export things, I think it should not export the service info, and the ServiceExport status should carry a condition signaling that (something similar to this one, with an updated message, should work: https://github.com/cilium/cilium/blob/53c7424657a088e6e4585ebbc072caf1b010f46c/pkg/clustermesh/mcsapi/serviceimport_controller.go#L376).
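For illustration, a minimal sketch of what flagging a non-exported service on the ServiceExport status could look like, assuming the sigs.k8s.io/mcs-api v1alpha1 types with standard metav1 conditions; the reason and message strings are hypothetical, not the actual Cilium implementation:

```go
package mcsapi

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	mcsapiv1alpha1 "sigs.k8s.io/mcs-api/pkg/apis/v1alpha1"
)

// markNotExported records on the ServiceExport status that the service is
// not exported because its namespace is local (non-global). The reason and
// message below are illustrative assumptions.
func markNotExported(export *mcsapiv1alpha1.ServiceExport) {
	meta.SetStatusCondition(&export.Status.Conditions, metav1.Condition{
		Type:               string(mcsapiv1alpha1.ServiceExportValid),
		Status:             metav1.ConditionFalse,
		ObservedGeneration: export.Generation,
		Reason:             "NamespaceNotGlobal", // hypothetical reason
		Message:            "service resides in a local namespace and is not exported to the cluster mesh",
	})
}
```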
Thanks for the updates. One more round of high-level comments inline.
Thanks! This LGTM from my perspective, both UX-wise and in terms of the clustermesh logic, although there are still a few threads/suggestions open from @giorio94 about connectivity & policies.
I was also wondering about possibly adding an annotation similar to service.cilium.io/shared: "false"
at the namespace level, but I don't think it would be that necessary/convenient on the MCS-API side, and the global namespace annotation already seems like a great start anyway.
Ah, and also, more of a nit: since we are talking about global/local namespaces, you could probably rephrase the mentions of scoped export mode in terms of local/global namespaces. IMO it would be more understandable for people reading the CFP without the full history of the discussion.
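As a rough sketch of how a namespace-level opt-in could be checked (the annotation key here is a placeholder, not necessarily what the CFP settles on, and whether namespaces default to local was itself part of this discussion):

```go
package scopedexport

import corev1 "k8s.io/api/core/v1"

// Hypothetical annotation marking a namespace as global, i.e. its
// workloads and services are shared across the cluster mesh.
const globalNamespaceAnnotation = "clustermesh.cilium.io/global"

// isGlobalNamespace reports whether a namespace has opted into
// cross-cluster export; in this sketch, namespaces default to local.
func isGlobalNamespace(ns *corev1.Namespace) bool {
	return ns.Annotations[globalNamespaceAnnotation] == "true"
}
```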
Done
@krunaljain I do not see any mention of the …
@vakalapa Yes, that is correct. That annotation is part of the overall Global Service functionality, which will only be honored in global namespaces. This is mentioned explicitly in the Goals section.
I still feel that the network policies section needs clarification.
Force-pushed 3151b80 to 3528774
Thanks for the proposal! I can see how sharing less information could help with the scalability.
I think it would be worthwhile for us to discuss the key question I suggested at the bottom here around defaults. This key question could significantly influence the "upgrade" impact that I also proposed. Users are already relying on the existing behavior of Cilium, so we'll want to carefully consider the options during upgrade and any mitigations we might want to put in place to avoid breaking users as they transition to a Cilium version that communicates less information.
I'm also thinking about how a user might debug this configuration, and I think we'll want to put some thought into that. I suggested an impact for that below as well.
> - Identity ID is exported ✅
> - Labels are NOT exported ❌
> - **Result**: The destination cluster knows _who_ the source is (by ID) but not _what_ it is (labels). The policy applied through the labels is not enforced.
This is assuming the source cluster knows how to route this specific Pod's traffic to the destination cluster (presumably via the specific Node the destination resides on?). Is that a given?
For the case where the destination is in a local namespace in a remote cluster, the traffic wouldn't even get routed to the destination, so the point is somewhat moot for that case.
Throwing some ideas out here, but if the source cluster knows that the source Identity is in a local namespace and the traffic is destined to a Pod in a remote cluster in a global namespace, maybe the source cluster should just drop with a new reason, something like DROPPED (Local Pod unroutable on Cluster Mesh). This would act a bit like a reverse path filter.
@joestringer I think the clustermesh prerequisite ensures node-level connectivity between clusters, which guarantees network connectivity. The optimization to drop traffic from local namespaces could be implemented, although it would require changes in the underlying BPF programs to consume namespace events: whenever a namespace is marked global, that logic would need to be executed. It might be a bit complex to achieve, hence I directly marked the use case as not supported for v1. Can this be a future goal?
> This is assuming the source cluster knows how to route this specific Pod's traffic to the destination cluster (presumably via the specific Node the destination resides on?). Is that a given?
>
> For the case where the destination is in a local namespace in a remote cluster, the traffic wouldn't even get routed to the destination, so the point is somewhat moot for that case.
AFAIK, routing strictly speaking would work in the majority of cases: either the clusters are configured in native routing mode (or ENI), in which case the network knows how to forward the packets, or they operate in tunnel mode, in which case the IPAM is likely node-based and there's a fallback ipcache entry handling the routing to the CIDR associated with that node.
> Throwing some ideas out here, but if the source cluster knows that the source Identity is in a local namespace and the traffic is destined to a Pod in a remote cluster in a global namespace, maybe the source cluster should just drop with a new reason something like DROPPED (Local Pod unroutable on Cluster Mesh).
This should be relatively easy to achieve whenever there's a per-node CIDR, as it would be basically a matter of assigning a different identity to the fallback ipcache entry (currently it is world for backwards compatibility) for the nodes in remote clusters. Ingress policies in tunnel mode would require an extra ipcache lookup, though, as it is not possible to rely on the source identity conveyed via the tunnel metadata (which would be the real ID, not this fallback).
If there's no per-node CIDR, users would need to configure an entry that covers the entire cluster CIDR, which can get tricky in cloud environments if the VPC is reused for e.g., multiple clusters (I don't have much experience there on the typical configurations though).
Overall, I agree with @krunaljain that this is something for a follow-up step though.
For the idea I was thinking of, it would only be on egress. Since the destination Pod would be in a global namespace I'm assuming that its info is propagated to the source. A quick check of the upper range of the Security Identity could tell that it's part of a remote cluster. The fact that the destination is known from ipcache would imply it's in a global namespace in that cluster.
But yes, not a hard requirement; maybe a good extension on top. The problem I'm concerned about is how users will debug and understand what is misconfigured when they encounter problems due to this feature. There may be some other ideas on how to improve that aspect as well.
> For the idea I was thinking of, it would only be on egress. Since the destination Pod would be in a global namespace I'm assuming that its info is propagated to the source. A quick check of the upper range of the Security Identity could tell that it's part of a remote cluster. The fact that the destination is known from ipcache would imply it's in a global namespace in that cluster.
Yeah, that would work, as long as there's a way to let the datapath know that the source is not in a global namespace (e.g., via a load-time config, or through an ipcache flag). The other caveat is that it would not be reliable for special destinations such as node, ingress and the like, as these are currently not scoped by cluster ID.
The reason I find the usage of an "unknown-clustermesh-endpoint" identity attractive is that it would allow writing a policy that allows traffic from/to unknown clustermesh entities, without conflating that with world traffic.
Good point about troubleshooting, I need to think a bit more about that.
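To make the "upper range" check concrete, here is a small sketch assuming the default Cilium identity layout (max-connected-clusters=255), where the cluster ID is encoded in the upper bits of the 24-bit numeric security identity; the exact shift and mask are assumptions based on that default:

```go
package scopedexport

// Assumed layout: with max-connected-clusters=255, the cluster ID
// occupies bits 16-23 of the numeric security identity, so any
// non-zero value there marks an identity from a remote cluster.
const (
	clusterIDShift = 16
	clusterIDMask  = uint32(0xFF) << clusterIDShift
)

// clusterIDOf extracts the cluster ID from a numeric security
// identity; 0 means the identity belongs to the local cluster.
func clusterIDOf(identity uint32) uint32 {
	return (identity & clusterIDMask) >> clusterIDShift
}

// isRemoteClusterIdentity is the "quick check of the upper range"
// referred to above.
func isRemoteClusterIdentity(identity uint32) bool {
	return clusterIDOf(identity) != 0
}
```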
I agree the "unknown-clustermesh-endpoint" identity concept is interesting. Even better if there is one of these per cluster. If for instance we could guarantee in many cases that each cluster has distinct Pod IP range(s), and the pod IP range(s) have a "fallback" identity that consists of just the cluster's name / labels / etc., then you could imagine in a case where you have many large clusters, some Endpoints could have a policy like "allow egress to cluster A" or "allow egress to clusters with label B" which would not require propagating knowledge about every endpoint's location to every other cluster.
Yep, that would be definitely better 👍
Force-pushed 4172470 to 3ca8536
Force-pushed 3ca8536 to e9f34c6
Just a few more minor clarifying questions.
Force-pushed 1c1c613 to 51cf3d2
@joestringer @giorio94 are there any additional questions/action items on this PR?
I'm fine with the proposal as is. Other @cilium/sig-policy members might be interested to look over the policy parts as well, but I don't think we need to block on that. Maybe we can aim to merge this around the end of the week if there's no other feedback?
Force-pushed 51cf3d2 to 61a9612
The proposal looks good to me as well. Thanks!
Force-pushed 61a9612 to 3ac805a
Commit: …and endpointslices exported to etcd (Signed-off-by: krunaljain <[email protected]>)
Force-pushed 3ac805a to 64d9050
Thanks! I'll merge this proposal as "implementable". If there are subsequent learnings from review & implementation, we can always come back and update the proposal through another PR.
This PR adds a CFP to selectively export CiliumEndpoints, CiliumIdentities and CiliumEndpointSlices in the Clustermesh APIserver. Only entities annotated with an internal annotation are exported to etcd when this "scoped-export" mode is enabled in the cilium-config.
Related: cilium/cilium#39876
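A minimal sketch of the export filter the description refers to, assuming scoped-export mode gates writes to etcd on namespace scope; the flag plumbing and the globalNamespaces lookup (fed by a namespace watcher) are illustrative assumptions, not the actual implementation:

```go
package scopedexport

import (
	ciliumv2 "github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2"
)

// shouldExportEndpoint decides whether a CiliumEndpoint is synced to etcd.
// When scoped-export mode is disabled, everything is exported (legacy
// behavior); when enabled, only endpoints in global namespaces are.
func shouldExportEndpoint(cep *ciliumv2.CiliumEndpoint, scopedExport bool, globalNamespaces map[string]bool) bool {
	if !scopedExport {
		return true
	}
	return globalNamespaces[cep.Namespace]
}
```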