-
Notifications
You must be signed in to change notification settings - Fork 22
Limit/Offset Pushdown Support #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is just an initial implementation. It currently only checks that the sort was pushed down, and even then I'm not sure where I might have made mistakes. |
|
I haven't looked at this in deep detail yet, in order to give you some quick feedback on the draft. The work quality seems great, as a starting point. 👍 One point immediately stuck out to me that I think will need a revision to your API design before picking through the code in depth. When a qual is pushed down into a The same is true for sorting. It needs to not only be pushed down, but also the Python driver needs to be able to say "yes, I will (or am) completely implementing this in-driver". My first thought for a proposed design would be to add |
|
Agree that we need to make sure the design is right before we go too much farther. Right now, "can_limit" will only be called if "can_sort" already said it could sort everything (because it appears in the input rel's pathkeys), otherwise there is no point asking the FWD about limits. "can_limit" wouldn't need to be passed the sort information again, because the FWD already said it could sort this query. There is already a test for this - please take a look and see if I made a mistake or missed an additional test case related to sorting. We would need a new method ("can_restrict"?) that confirmed which quals could be pushed, and as with sort, we would only check limits if the quals were all pushed down. For this new API, my thought was the default would return "None" which would retain the current behavior (quals are checked by both postgres and sent to FWD). If implemented, it would return which quals could be pushed down (and sent to execute/explain) and we would tell postgres not to recheck those. Thoughts? |
|
Hm...
eep. I get it... if you made a query with a limit of N, the FDW returned N, you wouldn't want PG to filter out a record. But this puts a burden for preciseness on the FDW author which is especially tricky in cases like NULL comparison operators, where PG's rules are... illogical for a human to code. 🤣 An FDW which supports limit/offset pushdown is going to be much harder to code correctly. As long as this doesn't compromise usability in the simpler cases and it ends up being clearly documented, I don't see a lot of good alternatives. So, let me share one thought as an alternative, and you can tell me how far this is from the use-case you need. I keep dreaming of a world where we just leverage def execute(self, ..., limit, offset):
while True:
page = self.get_page(...)
pg_opinion = (yield page)
# pg_opinion is an object w/ advisory information on how many records were returned,
# how many records met the limit needs, and how many more records are neededThis is a really rough thought, to be fair. But the core question is, could we make something where For clarity: I'm happy to continue on the path you're currently on if that's where you feel you need to go, but wanted to propose something less-perfect but much-easier to see if there's value in the idea. |
|
I agree with you - writing an FDW that supports limit/offset will be much harder than one that does not, and even then there will be a large list of queries where limit/offset cannot be pushed down. I might be missing your point, but I don't think we need to change anything in Passing down limit/offset ("page size hint"?) could be a good short-term win/compromise if we can make it work. I worry about things like Also, it's interesting that the test run failed on github because of the wrong row width. I saw some comments in the code about why width is stored, and I make use of that in my changes (which is why width is 20 and not 12 in the foreign scans), not sure why PG is using 12 for the local operations and why I don't see this when I run my tests. (FWIW, I've been unable to get the PG18 tests to even run for me... PG17/16 seem to work just fine) |
Apologies for any confusion -- my thoughts on
Yeah... that would be unfortunate. The FDW could hypothetically expand its page size if it kept failing to provide the needed data... 🤔 It's not super clear to me how bad this situation is though, because this would happen in the situation where the FDW couldn't pushdown its quals. So, the proposed design with Here's what I think: your proposed design with |
|
ok, let's continue step by step and see how it goes. There is still a lot to do even before |
|
Ok, the PR has been updated to be "complete" in the sense it should always do the right thing. I'd be interested in more test cases you might have to confirm that. I put "complete" in quotes because instead adding "can_restrict", I've updated the code so it won't push down limit/offset if there are any quals - pushed down or not. My proposal, assuming we don't find any other corner cases, is that this would be enough to include in a release. It's "limit/offset support", but only in limited cases... which will always be true, it's just about how "limited". We can make it less limited over time. Thoughts? |
mfenniak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach of just not applying limit when there are any where clauses, and adding that later (if needed).
The test cases here look pretty comprehensive to me. It stretches the limit of my imagination to come up with anything different enough that it should matter?
I think this looks good -- I've reviewed the C code and haven't spotted anything of concern. If you've got some doc polishing in mind, then have at it and we'll get this to the finish line.
@luss This is probably the most significant enhancement in a while and might benefit from another pass through the C side, especially with the PG interactions. It's an OK area for me, but not an expert area.
|
Awesome. I pushed the doc changes last night, so I have nothing more to add at this point unless the docs are not clear, someone comes up with an additional test case, or someone finds a problem in the C code (it's been 25 years since I actively coded in C :) ) |
mfenniak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there's a test variation in PG14 -- typically you can address this by making a copy of the .out test results and having multicorn_test_limit_1.out (assuming that the difference, which seems to be in cost estimation calculations, is stable).
Other than that, my review is good-to-go. Let me know if you push a commit for a test fix and I'll approve the rerun, and then we'll merge it. (If we get a response from @luss related to the previous comment, we can always tweak/adjust or even revert if there are major risks)
|
I pushed the test fix; please rerun. |
|
Oops - I didn't have the fix committed when I pushed. Should be ok now. |
|
Looks like there is also a variation in pg18 popping up now. |
|
I was having problems getting nix to build PG18; so I installed it directly to reproduce this last problem. should be ok now. |
|
@abramsh Great, thanks for your patience on the testing cycle. Merged! 🎉 |
This adds limit/offset pushdown support by implementing the GetForeignUpperPaths callback that first checks that the rest of the query can be pushed down before asking the Python FDW if it supports limit/offset push down.