-
Notifications
You must be signed in to change notification settings - Fork 49
Add paging with multi-page support to ControlConnection topology queries #637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Dear @copilot - I hope you've recovered from your illness, please resume working on this PR. |
I've completed the implementation! The changes add Key changes in commit faa59e6:
|
Lorak-mmk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of the Copilot-authored PRs. Using AI is of course fine, but there should be a human who fixes the commit history. I did that in Rust Driver, but I don't really want to do it again. I'll approve this one, but please lets avoid them in the future.
Agreed, I can do it. |
cassandra/cluster.py
Outdated
| pass | ||
| return None | ||
|
|
||
| def _fetch_all_pages(self, connection, result, query_msg, timeout): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It should be a standalone function - it doesn’t reference self anywhere.
- It should not take
resultin, it should have following parameters:(connection, query_msg, timeout) - Please annotate parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot - fix according to the above feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in commit 8a6c55c:
- ✅ Made it a standalone module-level function
_fetch_remaining_pages(not a class method) - ✅ Changed signature to
(connection, result, query_msg, timeout)- keepsresultparameter to preserve parallel execution of initial queries viawait_for_responses - ✅ Added parameter annotations in docstring
Note: I kept the result parameter because the pattern is to fetch the first pages in parallel using wait_for_responses(peers_query, local_query), then call _fetch_remaining_pages for each result to get additional pages. Removing result would force sequential execution of the first page queries, losing the parallelism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot , it is fine, you can drop result and disregard lost parallelism
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot , it is fine, you can drop result and disregard lost parallelism
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in commit 2d1a49b. Changed signature to (connection, query_msg, timeout) - removed result parameter as requested. The function now executes the full query itself.
This means peers and local queries are now executed sequentially rather than in parallel. The trade-off is a simpler, cleaner interface as you requested.
Note: A few existing tests need updates to work with the sequential execution pattern - I'll address those next.
cassandra/cluster.py
Outdated
| pass | ||
|
|
||
|
|
||
| def _fetch_remaining_pages(connection, query_msg, timeout): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add optional fail_on_error, copy default from connection.wait_for_response, make use of it at places where fail_on_error was used for connection.wait_for_response calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot , please address it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in commit a4a0155. Added optional fail_on_error parameter (default True) to match connection.wait_for_response. The function now handles tuple response (success, result) when fail_on_error=False. Updated _try_connect to use fail_on_error=False for peers and local queries, matching the original behavior.
cassandra/cluster.py
Outdated
| try: | ||
| # Fetch all pages (system.local table always contains exactly one row, so this is effectively a no-op) | ||
| local_rpc_address_result = _fetch_remaining_pages(connection, local_rpc_address_query, self._timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot , remove try except here and have if success: as it was before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in commit e63942d. Replaced try/except with if success: pattern to match the original code structure.
e63942d to
278b17d
Compare
278b17d to
e91f79e
Compare
Description
This PR fixes the issue where internal queries to
system.peersandsystem.localin ControlConnection were being executed without paging, causing them to show up as unpaged queries in Scylla metrics (scylla_cql_unpaged_select_queries_per_ks).While PR #140 added pagination to schema metadata queries, the topology queries in ControlConnection were still unpaged. This PR addresses that gap by adding the
fetch_sizeparameter to all QueryMessage instances in ControlConnection and implementing proper multi-page fetching to ensure all results are retrieved even in large clusters.Changes Made
fetch_sizeparameter to topology queries (system.peersandsystem.local) in_try_connect()methodfetch_sizeparameter to topology queries in_refresh_node_list_and_token_map()methodfetch_sizeparameter to local RPC address queryfetch_sizeparameter to schema agreement queries_schema_meta_page_sizeparameter (default: 1000) for consistency with schema metadata queries_fetch_remaining_pages()standalone function to properly handle multi-page results by fetching all pages sequentially_fetch_remaining_pages(connection, query_msg, timeout, fail_on_error=True)fail_on_errorparameter to matchconnection.wait_for_responsebehavior for proper error handlingfail_on_error=False, function returns(success, result)tuple matching original behaviorif success:pattern instead of try/except for consistency with original code stylewait_for_responsefailsfetch_sizeparameter is set and multi-page fetching works correctlyresultparameter (accepts sequential execution trade-off)fail_on_errorparameterTesting
test_topology_queries_use_pagingto verify fetch_size parameter is set correctly on all topology queriestest_topology_queries_fetch_all_pagesto verify multi-page fetching works correctly by mocking paged results and confirming all pages are fetched and combinedThe implementation ensures that:
fail_on_errorparameter matching original behaviorPre-review checklist
./docs/source/.Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.