Skip to content

[FlowExporter] More efficient IP checks#6960

Merged
antoninbas merged 2 commits intoantrea-io:mainfrom
antoninbas:improve-pod-ip-checks-in-flowexporter
Feb 21, 2025
Merged

[FlowExporter] More efficient IP checks#6960
antoninbas merged 2 commits intoantrea-io:mainfrom
antoninbas:improve-pod-ip-checks-in-flowexporter

Conversation

@antoninbas
Copy link
Copy Markdown
Contributor

@antoninbas antoninbas commented Jan 30, 2025

The FlowExporter in the Agent queries the NodeRouteController to determine whether the source / destination IPs are Pod IPs (NodeIPAM only). Prior to this change, these checks were expensive, involving an indexer lookup and conversions between different IP formats. The new implementation is about 10x faster, and performs no memory allocations.

The new implementation introduces a new set in the NodeRouteController, dedicated to storing all the PodCIDRs in the cluster. While I considered removing the dependency of the FlowExporter on the NodeRouteController altogether, it would have been a much bigger change. Additionally, in the long term, we could consider removing these checks from the FlowExporter altogether, and pushing the logic to the FlowAggregator.

We also make a few additional changes to the FlowExporter:

  • more consistently ignore connections where the source / destination IP is a gateway IP
  • classify Pod-to-Service traffic where the destination IP is not a Pod IP as Pod-to-External
  • log a warning if FlowExporter is enabled alongside AntreaIPAM

@antoninbas antoninbas force-pushed the improve-pod-ip-checks-in-flowexporter branch 2 times, most recently from 1d444b8 to 6b4b8e2 Compare January 30, 2025 06:08
@antoninbas antoninbas added this to the Antrea v2.3 release milestone Feb 7, 2025

func (exp *FlowExporter) exportConn(conn *flowexporter.Connection) error {
conn.FlowType = exp.findFlowType(*conn)
if conn.FlowType == flowTypeUnsupported {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we will export the flowTypeUnknown flow anyway? or it's actually not possible when the FlowExporter is running? It seems the nodeRouteController will be nil only when it's running as a VM agent, Do we support FlowExporter when Antrea is running as a VM agent?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I preserved the existing behavior, we can revisit it later.

@antoninbas antoninbas force-pushed the improve-pod-ip-checks-in-flowexporter branch from dbcf705 to e1dac0b Compare February 12, 2025 21:15
@antoninbas antoninbas marked this pull request as ready for review February 12, 2025 21:16
return flowTypeUnsupported
}
if exp.nodeRouteController.IPInPodSubnets(conn.FlowKey.SourceAddress.AsSlice()) {
if conn.Mark&openflow.ServiceCTMark.GetRange().ToNXRange().ToUint32Mask() == openflow.ServiceCTMark.GetValue() || exp.nodeRouteController.IPInPodSubnets(conn.FlowKey.DestinationAddress.AsSlice()) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heanlan @yuntanghsu could you take a look at this change?
I tried to look through the Git history, but could not find the rationale for this check (conn.Mark&openflow.ServiceCTMark.GetRange().ToNXRange().ToUint32Mask() == openflow.ServiceCTMark.GetValue()) and there was no comment in the code. Why are we checking if the flow is a "Service flow"? it feels to me like the VIP does not matter when classifying the flow. Only the actual destination matters (is it a Pod and if yes, is it a local Pod or not?).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one might be copied and paste from deny_connections where we used this condition to determine whether we need to fill the service information?
I cannot remember it clearly.
But if it is a service flow, will LookupIPInPodSubnets return true for isPod? Because I think if a flow is a service flow then it must be FlowTypeInterNode or FlowTypeIntraNode?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Service traffic, conn.FlowKey.DestinationAddress will always be the resolved Endpoint IP, not the ClusterIP, even at the source Node. This is ensured by using conn.TupleReply.IP.SourceAddress:

FlowKey: flowexporter.Tuple{
SourceAddress: conn.TupleOrig.IP.SourceAddress,
DestinationAddress: conn.TupleReply.IP.SourceAddress,
Protocol: conn.TupleOrig.Proto.Protocol,
SourcePort: conn.TupleOrig.Proto.SourcePort,
DestinationPort: conn.TupleReply.Proto.SourcePort,
},

So yes LookupIPInPodSubnets will return true for Pod-to-Service traffic, both at the source Node and destination Node (incl. if these 2 Nodes are the same).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the tests are passing, so I have to assume that the new logic is valid...

Copy link
Copy Markdown
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, two nits.

The FlowExporter in the Agent queries the NodeRouteController to
determine whether the source / destination IPs are Pod IPs (NodeIPAM
only). Prior to this change, these checks were expensive, involving an
indexer lookup and conversions between different IP formats. The new
implementation is about 10x faster, and peforms no memory allocations.

The new implementation introduces a new set in the NodeRouteController,
dedicated to storing all the PodCIDRs in the cluster. While I considered
removing the dependency of the FlowExporter on the NodeRouteController
altogether, it would have been a much bigger change. Additionally, in
the long term, we could consider removing these checks from the
FlowExporter altogether, and pushing the logic to the FlowAggregator.

We also make a few additional changes to the FlowExporter:
* more consistently ignore connections where the source / destination IP
  is a gateway IP
* classify Pod-to-Service traffic where the destination IP is not a Pod
  IP as Pod-to-External
* log a warning if FlowExporter is enabled alongside AntreaIPAM

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@antoninbas antoninbas force-pushed the improve-pod-ip-checks-in-flowexporter branch from e1dac0b to 970411f Compare February 14, 2025 21:29
@antoninbas
Copy link
Copy Markdown
Contributor Author

/test-all

Copy link
Copy Markdown
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return flowTypeUnsupported
}
if exp.nodeRouteController.IPInPodSubnets(conn.FlowKey.SourceAddress.AsSlice()) {
if conn.Mark&openflow.ServiceCTMark.GetRange().ToNXRange().ToUint32Mask() == openflow.ServiceCTMark.GetValue() || exp.nodeRouteController.IPInPodSubnets(conn.FlowKey.DestinationAddress.AsSlice()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one might be copied and paste from deny_connections where we used this condition to determine whether we need to fill the service information?
I cannot remember it clearly.
But if it is a service flow, will LookupIPInPodSubnets return true for isPod? Because I think if a flow is a service flow then it must be FlowTypeInterNode or FlowTypeIntraNode?

}
return ipfixregistry.FlowTypeIntraNode
if !srcIsPod {
if klog.V(5).Enabled() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we haven't supported external to pod flow, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we have never supported this case, and we still don't.

}
return flowTypeUnsupported
}
if !dstIsPod {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the comment above. I think the destination can be a service ip and it would be an inter/intra Node flow?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I answered the comment above. conn.FlowKey.DestinationAddress will never be a Service VIP.
At the source Node (and only at the source Node), conn.OriginalDestinationAddress will store the Service VIP.

@antoninbas antoninbas merged commit f74eee0 into antrea-io:main Feb 21, 2025
58 of 63 checks passed
@antoninbas antoninbas deleted the improve-pod-ip-checks-in-flowexporter branch February 21, 2025 04:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants