Skip to content

Conversation

@jshahriddhi
Copy link
Contributor

@jshahriddhi jshahriddhi commented Aug 29, 2022

consul-dataplane will allow configuring an xDS server for dynamic envoy configuration. By default, this xDS server will be hosted locally by the consul-dataplane process itself.
This PR adds functionality to host a gRPC server to serve xDS requests within consul-dataplane. The gRPC server mainly acts like a proxy to forward the envoy xDS requests to a Consul server the consul-dataplane process is connected to.

Related JIRA: https://hashicorp.atlassian.net/browse/NET-99

// TODO (NET-148): Ensure the server connection here is the one acquired via the server discovery library
return outCtx, cdp.consulServer.grpcClientConn, nil
}
gRPCServer := grpc.NewServer(grpc.UnknownServiceHandler(proxy.TransparentHandler(director)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library has most of the scaffolding required to proxy grpc requests to a desired target.
The main proxy logic is here.

@boxofrad
Copy link
Contributor

Really like this approach! ⭐

I was a little concerned that we'd end up "manually" proxying the request (i.e. unmarshaling and re-marshaling requests and responses) in a similar way to how we forward RPCs in Consul. This is much better because it means we don't need to consume and upgrade the Envoy protobufs.

I guess one concern is that the library is marked as "proof of concept" but as we control our gRPC version here I don't think it's a particularly big issue.

@jshahriddhi
Copy link
Contributor Author

Elaborating a bit on the library background (will update the PR with comments):
The core library being used is actually this - https://github.com/mwitkow/grpc-proxy. However, we needed this fix and hence I ended up using the forked library.

I guess one concern is that the library is marked as "proof of concept" but as we control our gRPC version here I don't think it's a particularly big issue.

Agreed. But, I did get some confidence in it seeing this - https://github.com/siderolabs/grpc-proxy, that uses mwitkow/grpc-proxy as the core foundation to build on more features.

This is a fork of awesome mwitkow/grpc-proxy package with added support for one to many proxying.

@boxofrad
Copy link
Contributor

boxofrad commented Sep 1, 2022

I realized while reviewing hashicorp/consul-server-connection-manager#3, that when the server terminates an xDS stream to rebalance load, we may need to also close Envoy's connection to get it to reset state (nonces etc.)

I'm not quite sure where that logic would go if using this library. Will do some digging!

Edit: I think it should "just work" and we'll pass the error along to Envoy as-is, which should cause it to retry.

Copy link
Contributor

@boxofrad boxofrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Left a couple of small points, but nothing blocking 👏🏻

@jshahriddhi jshahriddhi marked this pull request as ready for review September 6, 2022 14:18
Copy link
Contributor

@boxofrad boxofrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lookin' good! I've left a handful of (mostly optional) comments 😅

Comment on lines 152 to 156
func (cdp *ConsulDataplane) checkAndEnableLocalXDSServer() {
if checkLocalXDSServer(cdp.cfg.XDSServer.BindAddress) {
cdp.localXDSServer.enabled = true
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd that somebody would want to use consul-dataplane without the xDS proxy, other than the example in your comment above - which I think is unlikely because they could run consul connect envoy against the server instead.

What do you think about just erroring out when a non-local bind address is given?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was split about this too. Will check with Matt about the use case of allowing non local xds-bind-address as mentioned in the RFC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkeeler clarified via slack that the address was not intended to be a non-local address (since there is no use case for it).
I will fix up the PR accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

return errors.New("envoy xDS bind address not specified")
case cfg.XDSServer.BindPort == 0 && !checkLocalXDSServer(cfg.XDSServer.BindAddress):
return errors.New("envoy xDS bind port not specified")
case !strings.HasPrefix(cfg.XDSServer.BindAddress, "unix://") && cfg.XDSServer.BindAddress != "127.0.0.1" && cfg.XDSServer.BindAddress != "localhost":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: We might want to parse the address and use IsLoopback in case the user provides an IPv6 address.

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// For now we just give the server address directly.
AgentAddress: cdp.consulServer.address.String(),
AgentPort: strconv.Itoa(cdp.cfg.Consul.GRPCPort),
AgentAddress: cdp.cfg.XDSServer.BindAddress,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to eventually refactor this to not call things Agent*. Its very non-urgent though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea for now we have copy-pasted a lot of this from the existing consul connect command in the consul repo. We can diverge to a more suitable naming at a later point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants