Skip to content

Conversation

@mykaul
Copy link

@mykaul mykaul commented Aug 31, 2025

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@mykaul mykaul requested a review from Copilot August 31, 2025 15:30
@mykaul mykaul marked this pull request as draft August 31, 2025 15:30

This comment was marked as outdated.

@mykaul mykaul force-pushed the smaller_local_query branch from 17c27f3 to dfaaf5e Compare September 1, 2025 14:22
…specific columns

Specifically, supported_features took quite some space.
Fetching only what is useful for the client.

In follow-up PRs, will remove some (now) dead code that may have relied on some of those columns.

Refs: scylladb/scylla-drivers#11

Signed-off-by: Yaniv Kaul <[email protected]>
@mykaul mykaul force-pushed the smaller_local_query branch from dfaaf5e to 92de3f3 Compare September 1, 2025 15:10
@mykaul mykaul changed the title Smaller local query Smaller local/peers queries Sep 1, 2025
@mykaul mykaul requested a review from Copilot September 1, 2025 18:30
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes database queries by replacing wildcard selects with explicit column selections for system tables. The changes reduce data transfer by only selecting the columns that are actually needed by the application.

  • Replace SELECT * with explicit column lists for system.peers and system.local queries
  • Maintain consistency with existing no-tokens query patterns that already use explicit column selection

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@mykaul
Copy link
Author

mykaul commented Sep 1, 2025

@scylladb/python-driver-maint - if that's an acceptable path, I'll now proceed to remove code that assumed there might be more fields returning from those queries (especially DSE related stuff).

@mykaul
Copy link
Author

mykaul commented Sep 3, 2025

@scylladb/python-driver-maint - if that's an acceptable path, I'll now proceed to remove code that assumed there might be more fields returning from those queries (especially DSE related stuff).

Ah - I cannot remove those additional fields, as long as we do _SELECT_PEERS_V2 = "SELECT * FROM system.peers_v2" - should I fix this too?
If we don't need some unused items, such as dse_version, dse_workload, dse_workloads and others, more code can be dropped.

If we do keep this, I believe this PR is ready as is.

@dkropachev
Copy link
Collaborator

dkropachev commented Sep 3, 2025

I am thinking that to make it future proof it would be better to have a logic that reads schema of the table and returns rows that exist there from which we can combine query that will contain all the columns we want.
It will fit perfectly into initiative to prepare all the statements that driver runs of it self.

@Lorak-mmk , WDYT ?

@mykaul
Copy link
Author

mykaul commented Sep 3, 2025

I am thinking that to make it future proof it would be better to have a logic that reads schema of the table and returns rows that exist there from which we can combine query that will contain all the columns we want. It will fit perfectly into initiative to prepare all the statements that driver runs of it self.

@Lorak-mmk , WDYT ?

How many times those tables change? It's not a bad idea to get into CI the current schema and every version compare to what we stored, but I would not add it to the driver. If we add something to core, for the driver sake, you'd assume there will be an issue for the driver side too.

In any case, orthogonal to this PR.

@dkropachev
Copy link
Collaborator

I am thinking that to make it future proof it would be better to have a logic that reads schema of the table and returns rows that exist there from which we can combine query that will contain all the columns we want. It will fit perfectly into initiative to prepare all the statements that driver runs of it self.
@Lorak-mmk , WDYT ?

How many times those tables change? It's not a bad idea to get into CI the current schema and every version compare to what we stored, but I would not add it to the driver. If we add something to core, for the driver sake, you'd assume there will be an issue for the driver side too.

In any case, orthogonal to this PR.

I don't think it is orthogonal, select * there solves exactly this problem, it allows one query handle all the cases.
If we make it specific we will see some cases when it is broken that we have not considered, old scylla, cassandra.

@mykaul
Copy link
Author

mykaul commented Sep 4, 2025

I am thinking that to make it future proof it would be better to have a logic that reads schema of the table and returns rows that exist there from which we can combine query that will contain all the columns we want. It will fit perfectly into initiative to prepare all the statements that driver runs of it self.
@Lorak-mmk , WDYT ?

How many times those tables change? It's not a bad idea to get into CI the current schema and every version compare to what we stored, but I would not add it to the driver. If we add something to core, for the driver sake, you'd assume there will be an issue for the driver side too.
In any case, orthogonal to this PR.

I don't think it is orthogonal, select * there solves exactly this problem, it allows one query handle all the cases. If we make it specific we will see some cases when it is broken that we have not considered, old scylla, cassandra.

OK, I don't know how to move forward here. It wasn't broken for years, the were minimal, if any, changes, none of them were critical to this part of the code (dse_workload? dse_workloads? What else was material that was added?). I thought such a benign change could get in easily.
I've looked at both the source code and a network traffic capture to see that I have all relevant items fetched, so I don't think we are missing anything. If we do miss it, it's easy to add it.

Let me know what the next step should be, or I'll just close the PR - both options are fine by me.

@Lorak-mmk
Copy link

Imo we should ditch the DSE-related stuff, and SELECT * - let's select only what we need.
Reading schema and then building the query dynamically is possible, but is it really necessary? I don't think the driver needs anything weird that would require such approach.

@mykaul
Copy link
Author

mykaul commented Sep 8, 2025

Imo we should ditch the DSE-related stuff, and SELECT * - let's select only what we need. Reading schema and then building the query dynamically is possible, but is it really necessary? I don't think the driver needs anything weird that would require such approach.

This is the path I was going to pursue - remove some more dse stuff that now clearly is not fetched from system.local or system.peers.
Even if the driver uses system.peers_v2 - we don't really care about those fields (dse_workload, dse_workloads, as an example).

And removing the query to system.peers_v2 is also an option, but probably not in this PR.

@Lorak-mmk
Copy link

And removing the query to system.peers_v2 is also an option, but probably not in this PR.

Is Scylla never going to support it?

@mykaul
Copy link
Author

mykaul commented Sep 8, 2025

And removing the query to system.peers_v2 is also an option, but probably not in this PR.

Is Scylla never going to support it?

Never say never. But even in that case, we wouldn't like to:

  1. Perform 'SELECT *'
  2. Try to pick up whatever we don't care about, such as 'dse_workload'

But again, this is not related to this PR. I'm not going to remove what peers_v2 might have, that is material (for example, some rpc port numbers, etc.)

@mykaul
Copy link
Author

mykaul commented Sep 15, 2025

@scylladb/python-driver-maint - do let me know how to proceed here (or drop the patch - either works). I have minor follow-ups to remove some useless .get() calls which do not fetch anything, but this is really minor.
Alternatively, we can remove peers_v2 and then there's even more code simplifications that can be done all over.

@dkropachev
Copy link
Collaborator

@scylladb/python-driver-maint - do let me know how to proceed here (or drop the patch - either works). I have minor follow-ups to remove some useless .get() calls which do not fetch anything, but this is really minor. Alternatively, we can remove peers_v2 and then there's even more code simplifications that can be done all over.

History wise it would be better to drop DSE part of the code and then merge this one.

@mykaul
Copy link
Author

mykaul commented Sep 15, 2025

@scylladb/python-driver-maint - do let me know how to proceed here (or drop the patch - either works). I have minor follow-ups to remove some useless .get() calls which do not fetch anything, but this is really minor. Alternatively, we can remove peers_v2 and then there's even more code simplifications that can be done all over.

History wise it would be better to drop DSE part of the code and then merge this one.

Should I then start with dropping PEERS_V2 stuff, then on top of it this item?

@Lorak-mmk
Copy link

Can we avoid dropping PEERS_V2? It is not DSE-specific.

@dkropachev
Copy link
Collaborator

TBH, 100% proper way would be to evacuate all these queries to some class with API get all the peers.
Init that class when control connection is created depending on whether it is scylla or not use one or another.

@mykaul
Copy link
Author

mykaul commented Sep 15, 2025

TBH, 100% proper way would be to evacuate all these queries to some class with API get all the peers. Init that class when control connection is created depending on whether it is scylla or not use one or another.

Is it worth the effort, vs, this small contained patch? It is modeled after previous changes done (not fetch the tokens) nearby, and it's a very simple localized change.

@dkropachev
Copy link
Collaborator

TBH, 100% proper way would be to evacuate all these queries to some class with API get all the peers. Init that class when control connection is created depending on whether it is scylla or not use one or another.

Is it worth the effort, vs, this small contained patch? It is modeled after previous changes done (not fetch the tokens) nearby, and it's a very simple localized change.

Ok, I have checked all the execution paths where it is used, this PR is good.
Let's do DSE part separately, sync it is targeting only scylla tables we good to go.

@dkropachev dkropachev marked this pull request as ready for review September 15, 2025 13:41
@dkropachev dkropachev self-requested a review September 15, 2025 13:41
@dkropachev dkropachev merged commit dd00221 into scylladb:master Sep 15, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants