credentials: add GcpServiceAccountIdentityCallCredentials call credentials type (gRFC A83) #8974

Open
Pranjali-2501 wants to merge 1 commit into grpc:master from Pranjali-2501:call-credential-changes

Pranjali-2501 (Contributor) commented Mar 15, 2026

This PR implements GcpServiceAccountIdentityCallCredentials, a new call credentials type required by gRFC A83: xDS GCP Authentication Filter. This credential fetches and manages GCP Service Account Identity tokens for a given audience, allowing gRPC services running on GCP to authenticate RPCs.

Implementation Details & gRFC Deviations

The gRFC specifies in detail how this credential should fetch tokens from the GCE metadata server, extract the exp field, and compute refresh intervals itself (e.g., refreshing 30 seconds early).

However, this PR uses the Google auth library to fetch the token instead of implementing that logic by hand.

In Go, the standard Google API authentication package (golang.org/x/oauth2/google) only provides access tokens, which are fundamentally different from the identity tokens required here.

To fetch identity tokens, we use the officially supported Google Auth library for Go: cloud.google.com/go/auth/credentials/idtoken.

Using the library introduces the following behavioral differences from the gRFC:

  • Hardcoded early expiry: The idtoken package has its own internal logic and hardcoded values for when to proactively refresh a token (currently, it expires tokens 5 minutes early, not 30 seconds early as suggested by the gRFC).
  • Execution: We rely on idtoken.NewCredentials to handle the HTTP requests to the metadata server, extraction of the JWT, and cache invalidation.

Behavioral Guarantees Implemented:

  • Fetches the token on demand for the provided audience.
  • Ensures that concurrent data-plane RPCs block on a single in-flight token fetch instead of each issuing its own request.
  • Applies standard exponential backoff if the token fetch fails.
  • Injects the authorization: Bearer <token> header into outbound metadata.

RELEASE NOTES: N/A

Pranjali-2501 added this to the 1.81 Release milestone Mar 15, 2026
Pranjali-2501 added the Type: Feature and Area: xDS labels Mar 15, 2026
Pranjali-2501 requested a review from mbissa March 15, 2026 21:43
codecov bot commented Mar 15, 2026

Codecov Report

❌ Patch coverage is 64.00000% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.95%. Comparing base (f967422) to head (d36cb5a).

Files with missing lines                                  Patch %   Lines
...google/gcp_service_account_identity_credentials.go     64.00%    24 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8974      +/-   ##
==========================================
- Coverage   83.13%   82.95%   -0.18%     
==========================================
  Files         411      412       +1     
  Lines       32704    32779      +75     
==========================================
+ Hits        27187    27191       +4     
- Misses       4140     4193      +53     
- Partials     1377     1395      +18     
Files with missing lines Coverage Δ
...google/gcp_service_account_identity_credentials.go 64.00% <64.00%> (ø)

... and 25 files with indirect coverage changes

mbissa commented Mar 23, 2026

/gemini review

gemini-code-assist commented

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

// a fetch recently failed, the cached error is returned until the backoff
// interval expires. Otherwise, it initiates a new token fetch or blocks
// waiting for an already-in-progress fetch to complete.
func (c *gcpServiceAccountIdentityCallCreds) GetRequestMetadata(ctx context.Context, _ ...string) (map[string]string, error) {

The gRFC states: "If the returned HTTP status maps to UNAVAILABLE in HTTP to gRPC Status Code Mapping, then the data plane RPCs will be failed with status UNAVAILABLE; otherwise, they will be failed with status UNAUTHENTICATED. If the request fails without an HTTP status (e.g., an I/O error), all queued data plane RPCs will be failed with UNAVAILABLE status."

Where exactly are these status code mappings handled?
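
For reference, the rule quoted above could be sketched like this. It does not show where (or whether) the PR performs the mapping, and it does not assume the idtoken library exposes the HTTP status; rpcCode and codeForFetchFailure are hypothetical names, with rpcCode standing in for google.golang.org/grpc/codes.Code so the sketch needs only the standard library.

```go
package main

import "net/http"

// rpcCode is a hypothetical local stand-in for grpc's codes.Code.
type rpcCode string

const (
	unavailable     rpcCode = "UNAVAILABLE"
	unauthenticated rpcCode = "UNAUTHENTICATED"
)

// codeForFetchFailure applies the gRFC rule to a failed token fetch.
// httpStatus is the status of the failed response, or 0 if the request failed
// without any HTTP response (for example, an I/O error).
func codeForFetchFailure(httpStatus int) rpcCode {
	switch httpStatus {
	case 0:
		// No HTTP status at all: fail queued RPCs with UNAVAILABLE.
		return unavailable
	case http.StatusTooManyRequests, // 429
		http.StatusBadGateway,         // 502
		http.StatusServiceUnavailable, // 503
		http.StatusGatewayTimeout:     // 504
		// These are the statuses that the HTTP to gRPC Status Code Mapping
		// translates to UNAVAILABLE.
		return unavailable
	default:
		return unauthenticated
	}
}
```

Under this rule a 503 from the metadata server yields UNAVAILABLE, while a 401 or 404 yields UNAUTHENTICATED.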


type gcpServiceAccountIdentityCallCreds struct {
	audience string
	ts       *auth.Credentials

Should this field be named something like "creds" instead of "ts"?

	mu    sync.Mutex
	token *auth.Token

	fetching chan struct{}

Add a trailing comment describing this channel.

	if c.isTokenValid() {
		c.mu.Unlock()
		return map[string]string{
			"authorization": "Bearer " + c.token.Value,

mbissa commented Mar 23, 2026

The token value is read here after the lock has been released, which can lead to a data race. Should the unlock be deferred?
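
One possible way to address this, sketched with hypothetical names (tokenCache, set, bearerHeader) rather than the PR's types: copy the value into a local variable while the mutex is held, then build the header from the copy. Deferring the unlock would work as well; the copy simply keeps the critical section short.

```go
package main

import "sync"

// tokenCache is a hypothetical, reduced version of the credential's state.
type tokenCache struct {
	mu    sync.Mutex
	value string // guarded by mu
}

// set replaces the cached token value.
func (c *tokenCache) set(v string) {
	c.mu.Lock()
	c.value = v
	c.mu.Unlock()
}

// bearerHeader copies the guarded value while holding mu, so nothing guarded
// is read after the lock is released. It reports false if no token is cached.
func (c *tokenCache) bearerHeader() (map[string]string, bool) {
	c.mu.Lock()
	v := c.value
	c.mu.Unlock()
	if v == "" {
		return nil, false
	}
	return map[string]string{"authorization": "Bearer " + v}, true
}
```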


	fetching      chan struct{}
	nextRetryTime time.Time // When we can try next (backoff)
	retryAttempt  int       // consecutive failures

How does the client know how many retry attempts are pending?

	c.fetching = make(chan struct{})
	c.mu.Unlock()

	token, err := c.ts.TokenProvider.Token(context.Background())

mbissa commented Mar 23, 2026

What happens if this call hangs indefinitely? Can we configure a timeout on this context? Otherwise, all other goroutines will simply keep waiting for it to return.

*
*/

package google

Can we add a test that checks rejection over an insecure connection?

mbissa commented Mar 23, 2026

Pull master to fix the checks.

mbissa assigned Pranjali-2501 and unassigned mbissa Mar 23, 2026

mbissa commented Mar 23, 2026

/gemini review

gemini-code-assist bot left a comment

Code Review

This pull request introduces GcpServiceAccountIdentityCallCredentials, a new call credentials type for authenticating with GCP Service Account Identity tokens. The implementation correctly uses the cloud.google.com/go/auth/credentials/idtoken library and includes robust handling of token fetching, caching, concurrent requests, and error backoff. The accompanying tests are thorough and cover happy paths, error conditions, and concurrency scenarios.

I have a couple of suggestions to refactor parts of the GetRequestMetadata method to improve readability and reduce code duplication. These are minor improvements to an already solid implementation.

Comment on lines +110 to +127
	c.mu.Lock()

	if err != nil {
		c.setBackoff(err)
		close(c.fetching)
		c.fetching = nil
		c.mu.Unlock()
		return nil, err
	}

	c.setBackoff(nil)
	c.token = token
	close(c.fetching)
	c.fetching = nil
	c.mu.Unlock()
	return map[string]string{
		"authorization": "Bearer " + c.token.Value,
	}, nil

medium

The logic for handling the token fetch result can be simplified to reduce code duplication and improve readability. By using defer c.mu.Unlock() and consolidating the setBackoff call, you can make the code cleaner.

The setBackoff function already handles both error and non-error cases correctly, so you can call it once with the err from the token fetch.

Suggested change
-	c.mu.Lock()
-	if err != nil {
-		c.setBackoff(err)
-		close(c.fetching)
-		c.fetching = nil
-		c.mu.Unlock()
-		return nil, err
-	}
-	c.setBackoff(nil)
-	c.token = token
-	close(c.fetching)
-	c.fetching = nil
-	c.mu.Unlock()
-	return map[string]string{
-		"authorization": "Bearer " + c.token.Value,
-	}, nil
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	close(c.fetching)
+	c.fetching = nil
+	c.setBackoff(err)
+	if err != nil {
+		return nil, err
+	}
+	c.token = token
+	return map[string]string{
+		"authorization": "Bearer " + c.token.Value,
+	}, nil

Comment on lines +132 to +142
	case <-wait:
		return func() (map[string]string, error) {
			c.mu.Lock()
			defer c.mu.Unlock()
			if c.isTokenValid() {
				return map[string]string{
					"authorization": "Bearer " + c.token.Value,
				}, nil
			}
			return nil, c.lastErr
		}()

medium

The use of an immediately-invoked function expression here makes the code slightly harder to read. You can simplify this by moving the logic out of the anonymous function directly into the case block. This makes the control flow more straightforward.

	case <-wait:
		c.mu.Lock()
		defer c.mu.Unlock()
		if c.isTokenValid() {
			return map[string]string{
				"authorization": "Bearer " + c.token.Value,
			}, nil
		}
		return nil, c.lastErr
