[Issue #9021] Add retries if Login.gov fails when we ask for token details #9023

Open
mdragon wants to merge 3 commits into main from mdragon/9021-retry-on-login-gov-token-request

@mdragon commented Mar 12, 2026

Summary

Fixes #9021

Changes proposed

Retry our call to Login.gov that fetches information about the token holder, up to 3 times.

Context for reviewers

We see occasional failures, roughly once every few weeks: after a callback from Login.gov, our request back to Login.gov for information about the token holder fails with an unspecified 500 error. Today that means the user cannot sign in. With this change the request is retried, giving that user another chance to get logged in.

Validation steps

  • Unit tests pass
  • Login still works in Staging after deploy following merge to main

Comment on lines +11 to 14
        self.retries: dict[str, int] = {}

    def add_token_response(self, code: str, response: OauthTokenResponse) -> None:
        self.responses[code] = response

Rather than setting the retries by using the dict directly, what if we add a param to add_token_response to do it?

Suggested change
-        self.retries: dict[str, int] = {}
-    def add_token_response(self, code: str, response: OauthTokenResponse) -> None:
-        self.responses[code] = response
+        self.retries: dict[str, int] = {}
+    def add_token_response(self, code: str, response: OauthTokenResponse, retry_count: int = 0) -> None:
+        self.responses[code] = response
+        self.retries[code] = retry_count

Then that simplifies the logic below a bit as it'll always be set.
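
A minimal sketch of how the mock could look with that parameter, in case it helps the discussion. Everything beyond `responses`, `retries`, and `add_token_response` is an assumption for illustration: the class name `MockLoginGovOauthClient`, the `get_token` method, and the fields on this stand-in `OauthTokenResponse` are hypothetical and may not match the real code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OauthTokenResponse:
    # Hypothetical stand-in for the real response type; fields are assumptions.
    id_token: str = ""
    error: Optional[str] = None

    def is_error_response(self) -> bool:
        return self.error is not None


class MockLoginGovOauthClient:
    # Hypothetical mock; only responses, retries and add_token_response
    # come from the diff under review.
    def __init__(self) -> None:
        self.responses: dict[str, OauthTokenResponse] = {}
        self.retries: dict[str, int] = {}

    def add_token_response(
        self, code: str, response: OauthTokenResponse, retry_count: int = 0
    ) -> None:
        self.responses[code] = response
        # Always set, so readers of self.retries never need a fallback.
        self.retries[code] = retry_count

    def get_token(self, code: str) -> OauthTokenResponse:
        # Unknown codes get an error response rather than a KeyError.
        if code not in self.responses:
            return OauthTokenResponse(error="invalid_request")
        # Return an error retry_count times before the configured response.
        if self.retries[code] > 0:
            self.retries[code] -= 1
            return OauthTokenResponse(error="server_error")
        return self.responses[code]
```

A test would then call `add_token_response("abc", response, retry_count=2)` and expect two error responses followed by the configured one, without touching `retries` directly.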

Comment on lines +178 to +183
else:
    logger.info(
        "Retrying call to Login.gov after receiving error",
        extra={"tries": tries, "limit": limit},
    )
    continue

I think you meant to do this. As written, it would always call 3 times because the loop never breaks before 3 tries.

Suggested change
- else:
-     logger.info(
-         "Retrying call to Login.gov after receiving error",
-         extra={"tries": tries, "limit": limit},
-     )
-     continue
+ else:
+     logger.info(
+         "Retrying call to Login.gov after receiving error",
+         extra={"tries": tries, "limit": limit},
+     )
+     break

Alternatively, if we wanted to use tenacity, it seems they have a TryAgain exception we could use, although I don't think it's that big a deal.
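
For reference, one way to structure a bounded retry loop so that it stops on the first successful response and only retries on errors. This is a sketch under stated assumptions, not the code in this PR: `fetch_token_with_retries`, the `fetch` callable, and this stand-in `TokenResponse` are hypothetical names; only `is_error_response()`, `error_description`, `tries`, `limit`, and the log message come from the diff.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class TokenResponse:
    # Hypothetical stand-in for the real Login.gov token response type.
    error_description: Optional[str] = None

    def is_error_response(self) -> bool:
        return self.error_description is not None


def fetch_token_with_retries(
    fetch: Callable[[], TokenResponse], limit: int = 3
) -> TokenResponse:
    """Call fetch() until it succeeds or has been tried `limit` times."""
    if limit < 1:
        raise ValueError("limit must be at least 1")

    tries = 0
    while True:
        tries += 1
        response = fetch()

        # Success: stop immediately instead of using up the remaining tries.
        if not response.is_error_response():
            break

        # Out of tries: hand the last error response back to the caller.
        if tries >= limit:
            logger.warning(
                "Giving up on call to Login.gov after repeated errors",
                extra={"tries": tries, "limit": limit},
            )
            break

        logger.info(
            "Retrying call to Login.gov after receiving error",
            extra={"tries": tries, "limit": limit},
        )

    return response
```

With this shape, a first-try success makes exactly one request, and the caller still decides what to do with an error response once the limit is reached.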

# If this request failed, we'll assume we're the issue and 500
if response.is_error_response():
    raise_flask_error(500, response.error_description)
# If this request failed, we'll assume we're the issue and 500

Should we update this comment? With the retries in place, it isn't quite right.

Development

Successfully merging this pull request may close these issues.

Add retries for when Login.gov Token Request fails

2 participants