Add timeouts for spanner queries#1815
Add timeouts for spanner queries#1815n-h-diaz wants to merge 3 commits intodatacommonsorg:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where Spanner queries could run for extended periods, consuming excessive resources even after the initiating API requests had timed out. By introducing explicit timeouts for all Spanner operations, the system can now proactively cancel long-running queries, improving resource efficiency and overall stability. The changes also enhance observability by logging details of timed-out queries, aiding in performance debugging. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request effectively adds timeouts to Spanner queries, which is a crucial improvement for resource management and system stability. The approach of propagating API deadlines to Spanner queries is well-implemented, and the fallback timeout is a good safety measure. The added logging for timed-out queries, including the SQL statement, will be very valuable for performance monitoring and debugging. I have one suggestion to refactor the duplicated timeout-checking logic into a helper function to improve code clarity and maintainability, with a caveat to consider the deprecation status of the Spanner integration.
Currently we were using the default Spanner timeout which is 1hr, so even when our API timed out, we'd still keep trying to process the queries which uses excess resources. (This contributed to the recent Spanner outage.) Instead we'd like to cancel pending requests and free up resources.
This PR adds custom timeouts, including
mixer/esp/endpoints.yaml.tmpl
Lines 37 to 43 in 9e4bb86
Also adds some logging for queries that timeout to help us track inefficient queries for performance improvements