
Conversation

@abramsh (Contributor) commented Sep 3, 2025

This adds limit/offset pushdown support by implementing the GetForeignUpperPaths callback, which first checks that the rest of the query can be pushed down before asking the Python FDW whether it supports limit/offset pushdown.
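Roughly, the Python side would look something like this (a minimal sketch; the hook name and signature here are illustrative, not final):

```python
from multicorn import ForeignDataWrapper

class PagingFDW(ForeignDataWrapper):
    # Illustrative hook: the planner would call this only after confirming
    # the rest of the query is pushed down, and would hand LIMIT/OFFSET to
    # the wrapper only if it returns True.
    def can_limit(self, limit, offset):
        # Only claim support when the backing store can honor an exact
        # limit/offset; anything fuzzier must stay in PostgreSQL.
        return True
```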

@abramsh (Contributor, Author) commented Sep 3, 2025

This is just an initial implementation. It currently only checks that the sort was pushed down, and even then I'm not sure where I might have made mistakes.

@mfenniak (Collaborator) commented Sep 4, 2025

I haven't looked at this in deep detail yet; this is just some quick feedback on the draft. The work quality seems great as a starting point. 👍 One point immediately stuck out to me that I think will need a revision to your API design before picking through the code in depth.

When a qual is pushed down into a ForeignDataWrapper's execute method, there's no guarantee that the FDW actually implements that filter. It's valid and common for an FDW to pick out the subset of clauses it is able to support and push only those down to the underlying data storage query. I don't think the design is complete if it assumes that just because the C-level multicorn can push a qual down to the Python-level driver, the driver would be able to support a limit.

The same is true for sorting. It needs to not only be pushed down, but also the Python driver needs to be able to say "yes, I will (or am) completely implementing this in-driver".
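To make the qual concern concrete, a typical driver does something like this in execute (a sketch; storage_query and the operator set are placeholders for the driver's real capabilities):

```python
from multicorn import ForeignDataWrapper

class PartialQualFDW(ForeignDataWrapper):
    SUPPORTED_OPS = {"=", "<", ">"}  # operators the backing store understands

    def execute(self, quals, columns):
        # Push down only the quals the backend can evaluate; PostgreSQL
        # re-checks every qual afterwards, including the ones we ignored.
        # Applying a LIMIT before that re-check would return wrong results.
        pushable = [q for q in quals if q.operator in self.SUPPORTED_OPS]
        for row in self.storage_query(pushable, columns):  # placeholder helper
            yield row
```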

My first thought for a proposed design would be to add quals and sortkeys parameters to your new can_limit method... but, as an FDW author myself, I'm thinking of how I would actually implement that... and whatever logic I'm currently using in execute to analyze the quals would need to be duplicated in can_limit. An FDW would effectively have to "plan" the query it is going to execute twice. 🤔 I think it's worth spending some time coming up with a better design here, but I don't have any immediate epiphanies.

@abramsh (Contributor, Author) commented Sep 4, 2025

Agreed that we need to make sure the design is right before we go too much further.

Right now, "can_limit" will only be called if "can_sort" already said it could sort everything (because it appears in the input rel's pathkeys); otherwise there is no point asking the FDW about limits. "can_limit" wouldn't need to be passed the sort information again, because the FDW already said it could sort this query. There is already a test for this - please take a look and see if I made a mistake or missed an additional test case related to sorting.
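For reference, that precondition looks roughly like this from the driver's side (a toy sketch that claims every key, which a real driver shouldn't do blindly):

```python
from multicorn import ForeignDataWrapper

class SortedFDW(ForeignDataWrapper):
    def can_sort(self, sortkeys):
        # Returning the full list means "I will order the result exactly
        # as requested" - the precondition for being asked about limits.
        return sortkeys
```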

We would need a new method ("can_restrict"?) that confirms which quals can be pushed down, and as with sort, we would only check limits if the quals were all pushed down. For this new API, my thought was that the default would return None, which retains the current behavior (quals are both checked by Postgres and sent to the FDW). If implemented, it would return which quals can be pushed down (and sent to execute/explain), and we would tell Postgres not to recheck those.
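A rough sketch of what that could look like (can_restrict is a proposed name, not an implemented API, and SUPPORTED_OPS is a stand-in for whatever the driver can actually evaluate):

```python
from multicorn import ForeignDataWrapper

class RestrictingFDW(ForeignDataWrapper):
    SUPPORTED_OPS = {"=", "<", ">"}  # stand-in for the driver's real capabilities

    def can_restrict(self, quals):
        # None (the default) keeps today's behavior: Postgres rechecks
        # every qual. A returned list names the quals the driver promises
        # to evaluate exactly, so Postgres can skip rechecking those.
        supported = [q for q in quals if q.operator in self.SUPPORTED_OPS]
        return supported or None
```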

Thoughts?

@mfenniak (Collaborator) commented Sep 4, 2025

Hm... can_restrict sounds OK... my objections are...

  • it adds to the Python interface complexity; but this might be unavoidable
  • decisions made in can_restrict, or even can_sort, about how the FDW will process the query probably need to be recomputed in execute, since there's no way to persist this information between method invocations (self is not a unique instance for each query, just for each backend, IIRC); but I think at worst this is a minor code-organization and optimization problem, not worth getting stuck on.

> ... we would tell Postgres not to recheck those.

eep. I get it... if you made a query with a limit of N and the FDW returned N rows, you wouldn't want PG to filter out a record. But this puts a burden of preciseness on the FDW author, which is especially tricky in cases like NULL comparison operators, where PG's rules are... illogical for a human to code. 🤣

An FDW which supports limit/offset pushdown is going to be much harder to code correctly. As long as this doesn't compromise usability in the simpler cases and it ends up being clearly documented, I don't see a lot of good alternatives.

So, let me share one thought as an alternative, and you can tell me how far this is from the use-case you need. I keep dreaming of a world where we just leverage execute returning a generator to optimistically get right-sized pages of data. Perhaps execute gets an offset and limit parameter.

```python
def execute(self, ..., limit, offset):
    while True:
        page = self.get_page(...)
        pg_opinion = (yield page)
        # pg_opinion is an object w/ advisory information on how many records
        # were returned, how many met the limit's needs, and how many more are needed
```

This is a really rough thought, to be fair. But the core question is: could we make something where execute can operate "loosey goosey", PostgreSQL can operate strictly, and the two collaborate to minimize the number of FDW interactions, even if they never reach an exactly optimal number?
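To sketch the handshake end to end, here's a self-contained toy of how the C side could drive such a generator with send() (the feedback dict and its still_needed field are invented for the example):

```python
def execute(limit, offset, pages):
    """Toy stand-in for an FDW execute() that adapts to planner feedback."""
    # limit/offset are unused in this toy; pages simulates pre-fetched data.
    for page in pages:
        feedback = yield page              # advisory info arrives via send()
        if feedback and feedback["still_needed"] == 0:
            return                         # the LIMIT is satisfied; stop early

gen = execute(limit=3, offset=0, pages=[[1, 2], [3, 4], [5, 6]])
first = next(gen)                          # [1, 2]
second = gen.send({"still_needed": 1})     # one more row needed -> [3, 4]
try:
    gen.send({"still_needed": 0})          # satisfied; generator finishes
except StopIteration:
    pass
```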

For clarity: I'm happy to continue on the path you're currently on if that's where you feel you need to go, but I wanted to propose something less perfect but much easier, to see if there's value in the idea.

@abramsh (Contributor, Author) commented Sep 4, 2025

I agree with you - writing an FDW that supports limit/offset will be much harder than one that does not, and even then there will be a large list of queries where limit/offset cannot be pushed down.

I might be missing your point, but I don't think we need to change anything in execute. If can_restrict and can_sort say they can push everything down, we're assuming execute will push everything down, and we would only send execute what it said it can push down. Having a query context passed around to save state might be a nice-to-have, but I don't know that it's required for this to work.

Passing down limit/offset (a "page size hint"?) could be a good short-term win/compromise if we can make it work. I worry about things like `select col1, sum(col2) group by col1 limit 1` and the FDW getting a hint of a "1" page size, causing it to make thousands of requests instead of one. Do you have additional ideas that might make the hint better?

Also, it's interesting that the test run failed on GitHub because of the wrong row width. I saw some comments in the code about why width is stored, and I make use of that in my changes (which is why width is 20 and not 12 in the foreign scans); I'm not sure why PG is using 12 for the local operations, or why I don't see this when I run my tests. (FWIW, I've been unable to get the PG18 tests to even run for me... PG17/16 seem to work just fine.)

@mfenniak (Collaborator) commented Sep 5, 2025

> I might be missing your point, but I don't think we need to change anything in execute. If can_restrict and can_sort say they can push everything down, we're assuming execute will push everything down, and we would only send execute what it said it can push down. Having a query context passed around to save state might be a nice-to-have, but I don't know that it's required for this to work.

Apologies for any confusion -- my thoughts on execute are a completely different implementation concept that would eliminate the need for can_restrict.

> Passing down limit/offset (a "page size hint"?) could be a good short-term win/compromise if we can make it work. I worry about things like `select col1, sum(col2) group by col1 limit 1` and the FDW getting a hint of a "1" page size, causing it to make thousands of requests instead of one. Do you have additional ideas that might make the hint better?

Yeah... that would be unfortunate. The FDW could hypothetically expand its page size if it kept failing to provide the needed data... 🤔 It's not super clear to me how bad this situation is, though, because it would only happen when the FDW couldn't push down its quals. So the proposed design with can_restrict would also have some suboptimal behavior. Which would be better or worse would all depend on the batch size and the backend's capabilities to handle multiple queries and to cursor through them in pages (which might be a big no-no for some backends)... hm.
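That fallback might look something like this (a sketch; fetch_rows and the doubling factor are invented for illustration):

```python
def fetch_until_satisfied(fetch_rows, needed, page_size=100):
    """Grow the page size geometrically while the limit remains unmet."""
    rows, offset = [], 0
    while len(rows) < needed:
        page = fetch_rows(offset, page_size)   # hypothetical backend call
        if not page:
            break                              # backend exhausted
        rows.extend(page)
        offset += len(page)
        page_size *= 2                         # fewer round-trips next time
    return rows[:needed]
```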

Here's what I think: your proposed design with can_restrict is going to be difficult for an FDW author to implement, but it will operate in a way that can be easily understood. In this mode the FDW has more responsibilities, but there's no weird magic. As long as the documentation is clear about those responsibilities, I think that's promising and I'd 👍 moving forward with that approach.

@abramsh (Contributor, Author) commented Sep 5, 2025

OK, let's continue step by step and see how it goes. There is still a lot to do even before can_restrict; let me get that working and see how things look.

@abramsh (Contributor, Author) commented Sep 5, 2025

OK, the PR has been updated to be "complete" in the sense that it should always do the right thing. I'd be interested in any more test cases you might have to confirm that.

I put "complete" in quotes because instead of adding can_restrict, I've updated the code so it won't push down limit/offset if there are any quals - pushed down or not.

My proposal, assuming we don't find any other corner cases, is that this would be enough to include in a release. It's "limit/offset support", but only in limited cases... which will always be true; it's just a matter of how "limited". We can make it less limited over time.

Thoughts?
(I probably need to update the docs to make the limitations clearer, but I wanted to wait for your thoughts.)

@mfenniak (Collaborator) left a review comment

I like the approach of just not applying limit when there are any where clauses, and adding that later (if needed).

The test cases here look pretty comprehensive to me. It stretches the limits of my imagination to come up with anything different enough that it would matter.

I think this looks good -- I've reviewed the C code and haven't spotted anything of concern. If you've got some doc polishing in mind, then have at it and we'll get this to the finish line.

@luss This is probably the most significant enhancement in a while and might benefit from another pass through the C side, especially with the PG interactions. It's an OK area for me, but not an expert area.

@abramsh (Contributor, Author) commented Sep 8, 2025

Awesome.

I pushed the doc changes last night, so I have nothing more to add at this point unless the docs are not clear, someone comes up with an additional test case, or someone finds a problem in the C code (it's been 25 years since I actively coded in C :) )

@abramsh marked this pull request as ready for review on September 9, 2025 at 14:53
@mfenniak (Collaborator) left a review comment

Looks like there's a test variation in PG14 -- typically you can address this by making a copy of the .out test results as multicorn_test_limit_1.out (assuming that the difference, which seems to be in cost-estimation calculations, is stable).

Other than that, my review is good-to-go. Let me know if you push a commit for a test fix and I'll approve the rerun, and then we'll merge it. (If we get a response from @luss related to the previous comment, we can always tweak/adjust or even revert if there are major risks)

@abramsh (Contributor, Author) commented Sep 9, 2025

I pushed the test fix; please rerun.

@abramsh (Contributor, Author) commented Sep 9, 2025

Oops - I didn't have the fix committed when I pushed. Should be OK now.

@mfenniak (Collaborator) commented Sep 9, 2025

Looks like there is also a variation in pg18 popping up now.

@abramsh (Contributor, Author) commented Sep 9, 2025

I was having problems getting nix to build PG18, so I installed it directly to reproduce this last problem. Should be OK now.

@mfenniak merged commit c47647b into pgsql-io:main on Sep 9, 2025 (1 check passed)
@mfenniak (Collaborator) commented Sep 9, 2025

@abramsh Great, thanks for your patience on the testing cycle. Merged! 🎉

@abramsh deleted the limit-pushdown branch on September 9, 2025 at 21:41