Smaller local/peers queries #528

mykaul · 2025-08-31T15:30:26Z

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

…specific columns

Specifically, supported_features took up quite a lot of space. Fetch only what is useful for the client. Follow-up PRs will remove (now) dead code that may have relied on some of those columns.

Refs: scylladb/scylla-drivers#11
Signed-off-by: Yaniv Kaul <[email protected]>

Copilot

Pull Request Overview

This PR optimizes database queries by replacing wildcard selects with explicit column selections for system tables. The changes reduce data transfer by only selecting the columns that are actually needed by the application.

Replace SELECT * with explicit column lists for system.peers and system.local queries
Maintain consistency with existing no-tokens query patterns that already use explicit column selection
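
A minimal sketch of the change described above, not the PR's actual diff. The helper `build_select` and both column lists are assumptions made for illustration; the thread does not show which columns the PR selects.

```python
# Illustrative column lists (assumptions, not taken from the PR).
LOCAL_COLUMNS = ("host_id", "cluster_name", "data_center", "rack",
                 "partitioner", "release_version", "schema_version", "rpc_address")
PEERS_COLUMNS = ("host_id", "peer", "data_center", "rack",
                 "rpc_address", "release_version", "schema_version")


def build_select(table, columns, where=None):
    """Build a SELECT statement that names each column explicitly."""
    if not columns:
        raise ValueError("at least one column is required")
    query = "SELECT {} FROM {}".format(", ".join(columns), table)
    if where:
        query += " WHERE " + where
    return query


# Explicit replacements for "SELECT * FROM system.local" / "system.peers".
SELECT_LOCAL = build_select("system.local", LOCAL_COLUMNS, "key='local'")
SELECT_PEERS = build_select("system.peers", PEERS_COLUMNS)
```

Large columns that the client never reads, such as supported_features, are simply left out of the lists and therefore never sent over the wire.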

cassandra/cluster.py

mykaul · 2025-09-01T18:31:25Z

@scylladb/python-driver-maint - if that's an acceptable path, I'll now proceed to remove code that assumed there might be more fields returning from those queries (especially DSE related stuff).

mykaul · 2025-09-03T13:41:35Z

Ah - I cannot remove those additional fields, as long as we do _SELECT_PEERS_V2 = "SELECT * FROM system.peers_v2" - should I fix this too?
If we don't need some unused items, such as dse_version, dse_workload, dse_workloads and others, more code can be dropped.

If we do keep this, I believe this PR is ready as is.

dkropachev · 2025-09-03T15:11:14Z

I think that, to make this future-proof, it would be better to have logic that reads the table's schema and returns the columns that exist there, from which we can build a query containing all the columns we want.
It would fit perfectly into the initiative to prepare all the statements that the driver runs by itself.

@Lorak-mmk , WDYT ?
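
The schema-driven approach suggested in this comment could look roughly like the sketch below. The function name and the `available` argument are hypothetical; `available` stands in for whatever column names the driver would read from the server's schema tables, and `some_newer_column` is a made-up column used only to show the filtering.

```python
def select_existing_columns(table, wanted, available):
    """Return a SELECT naming only the wanted columns that the table has.

    `available` is the set of column names reported by the server's schema;
    how it is fetched is outside the scope of this sketch.
    """
    present = [col for col in wanted if col in available]
    if not present:
        raise ValueError("none of the wanted columns exist in " + table)
    return "SELECT {} FROM {}".format(", ".join(present), table)


# Hypothetical example: an older server that lacks one wanted column.
wanted = ["host_id", "peer", "data_center", "rack", "some_newer_column"]
older_server_columns = {"peer", "host_id", "data_center", "rack", "rpc_address"}
query = select_existing_columns("system.peers", wanted, older_server_columns)
```

The trade-off is one extra schema read before the topology queries, in exchange for a query that adapts to whichever columns a given server version exposes.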

mykaul · 2025-09-03T16:00:06Z

How often do those tables change? It's not a bad idea to store the current schema in CI and compare each version against what we stored, but I would not add it to the driver. If we add something to core for the driver's sake, you'd assume there will be an issue filed for the driver side too.

In any case, orthogonal to this PR.
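
The CI check mentioned here, comparing a stored schema snapshot with the current one, might be sketched as follows. The function name and the snapshot contents are assumptions; only the column-name comparison is shown, not how the snapshot is captured.

```python
def diff_columns(stored, current):
    """Compare a stored column snapshot against the current column list."""
    stored, current = set(stored), set(current)
    return {
        "added": sorted(current - stored),
        "removed": sorted(stored - current),
    }


# Hypothetical snapshots of system.peers column names.
stored_snapshot = ["peer", "host_id", "data_center", "rack"]
current_schema = ["peer", "host_id", "data_center", "rack", "supported_features"]
changes = diff_columns(stored_snapshot, current_schema)
```

A CI job could fail, or open an issue, whenever either list in `changes` is non-empty.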

dkropachev · 2025-09-04T00:08:25Z

I don't think it is orthogonal: SELECT * there solves exactly this problem, since it lets one query handle all the cases.
If we make it specific, we will hit cases we have not considered where it breaks, such as old Scylla or Cassandra.

mykaul · 2025-09-04T05:42:22Z

OK, I don't know how to move forward here. It wasn't broken for years, and there were minimal changes, if any; none of them were critical to this part of the code (dse_workload? dse_workloads? What else material was added?). I thought such a benign change could get in easily.
I've looked at both the source code and a network traffic capture to confirm that all relevant items are fetched, so I don't think we are missing anything. If we do miss something, it's easy to add it.

Let me know what the next step should be, or I'll just close the PR - both options are fine by me.

Lorak-mmk · 2025-09-08T11:46:18Z

Imo we should ditch the DSE-related stuff, and SELECT * - let's select only what we need.
Reading schema and then building the query dynamically is possible, but is it really necessary? I don't think the driver needs anything weird that would require such approach.

mykaul · 2025-09-08T11:49:47Z

This is the path I was going to pursue - remove some more dse stuff that now clearly is not fetched from system.local or system.peers.
Even if the driver uses system.peers_v2 - we don't really care about those fields (dse_workload, dse_workloads, as an example).

And removing the query to system.peers_v2 is also an option, but probably not in this PR.

Lorak-mmk · 2025-09-08T13:18:48Z

Is Scylla never going to support it?

mykaul · 2025-09-08T13:23:42Z

Never say never. But even in that case, we wouldn't like to:

Perform 'SELECT *'
Try to pick up whatever we don't care about, such as 'dse_workload'

But again, this is not related to this PR. I'm not going to remove what peers_v2 might have, that is material (for example, some rpc port numbers, etc.)

mykaul · 2025-09-15T09:25:27Z

@scylladb/python-driver-maint - do let me know how to proceed here (or drop the patch - either works). I have minor follow-ups to remove some useless .get() calls which do not fetch anything, but this is really minor.
Alternatively, we can remove peers_v2 and then there's even more code simplifications that can be done all over.
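
To illustrate why such .get() calls become useless, here is a small sketch that assumes result rows are dict-like and keyed by the selected column names. The column list and values are made up for the example.

```python
# Columns named in a hypothetical explicit SELECT.
SELECTED = ("host_id", "data_center", "rack", "release_version")


def row_from_query(values):
    """Model a result row as a dict keyed by the selected column names."""
    return dict(zip(SELECTED, values))


row = row_from_query(("h1", "dc1", "r1", "3.0.8"))

# dse_version is not in the SELECT list, so this lookup can never find it.
dse_version = row.get("dse_version")
release_version = row.get("release_version")
```

Because `dse_version` is always None here, any branch that depends on it is dead code and can be dropped.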

dkropachev · 2025-09-15T11:51:36Z

History-wise, it would be better to drop the DSE part of the code first and then merge this one.

mykaul · 2025-09-15T11:55:46Z

Should I then start with dropping PEERS_V2 stuff, then on top of it this item?

Lorak-mmk · 2025-09-15T12:01:29Z

Can we avoid dropping PEERS_V2? It is not DSE-specific.

dkropachev · 2025-09-15T12:11:45Z

TBH, the 100% proper way would be to move all these queries into a class with a "get all the peers" API.
Initialize that class when the control connection is created and, depending on whether the server is Scylla or not, use one implementation or the other.
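
A rough sketch of the class-based design proposed in this comment. All class and function names are hypothetical, the column lists are illustrative, and how the driver decides that a server is Scylla is not covered.

```python
class GenericPeersQueries:
    """Fallback for servers whose system tables we do not want to assume."""

    def local_query(self):
        return "SELECT * FROM system.local WHERE key='local'"

    def peers_query(self):
        return "SELECT * FROM system.peers"


class ScyllaPeersQueries(GenericPeersQueries):
    """Names columns explicitly; the column lists here are illustrative."""

    LOCAL_COLUMNS = ("host_id", "cluster_name", "data_center", "rack")
    PEERS_COLUMNS = ("host_id", "peer", "data_center", "rack", "rpc_address")

    def local_query(self):
        return "SELECT {} FROM system.local WHERE key='local'".format(
            ", ".join(self.LOCAL_COLUMNS))

    def peers_query(self):
        return "SELECT {} FROM system.peers".format(
            ", ".join(self.PEERS_COLUMNS))


def make_peers_queries(is_scylla):
    """Pick the implementation once, e.g. when the control connection is set up."""
    return ScyllaPeersQueries() if is_scylla else GenericPeersQueries()
```

The rest of the control-connection code would then call `peers_query()` and `local_query()` without knowing which server it talks to.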

mykaul · 2025-09-15T12:15:14Z

Is it worth the effort, vs, this small contained patch? It is modeled after previous changes done (not fetch the tokens) nearby, and it's a very simple localized change.

dkropachev · 2025-09-15T13:41:45Z

OK, I have checked all the execution paths where it is used; this PR is good.
Let's do the DSE part separately. Since it targets only Scylla tables, we are good to go.

mykaul requested a review from Copilot August 31, 2025 15:30

mykaul marked this pull request as draft August 31, 2025 15:30

mykaul force-pushed the smaller_local_query branch from 17c27f3 to dfaaf5e September 1, 2025 14:22

mykaul force-pushed the smaller_local_query branch from dfaaf5e to 92de3f3 September 1, 2025 15:10

mykaul changed the title ~~Smaller local query~~ Smaller local/peers queries Sep 1, 2025

mykaul requested a review from Copilot September 1, 2025 18:30

Copilot AI reviewed Sep 1, 2025

dkropachev marked this pull request as ready for review September 15, 2025 13:41

dkropachev self-requested a review September 15, 2025 13:41

dkropachev approved these changes Sep 15, 2025

Lorak-mmk approved these changes Sep 15, 2025

dkropachev merged commit dd00221 into scylladb:master Sep 15, 2025
19 checks passed

mykaul mentioned this pull request Jan 2, 2026

Pull version information from systel.local, when version info is not present #643

Merged

8 tasks
