Description
What type of enhancement is this?
Refactor, User experience, Performance
What subsystems and features will be improved?
Continuous aggregate, Query planner
What does the enhancement do?
The proposed change accompanies #8967: rewrite queries to use Caggs.
If we rewrite eligible queries to use Caggs, we must guarantee that the query output is the same with or without the Cagg rewrite, since we are only changing the query planner.
To provide this guarantee we need to be able to combine data materialized in Caggs with data that has not been materialized yet, that is: data above the current watermark and data in the invalidation ranges current at the moment of the query request.
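For context, a real-time Cagg is served today by a UNION ALL view roughly like the sketch below (table and function names are illustrative and vary by version): materialized rows below the watermark, live rows aggregated on the fly at or above it.

```sql
-- Illustrative sketch of the current real-time Cagg UNION ALL (names are examples):
SELECT bucket, avg_val
  FROM _timescaledb_internal._materialized_hypertable_2    -- materialized data
 WHERE bucket < _timescaledb_functions.cagg_watermark(2)   -- below the watermark
UNION ALL
SELECT time_bucket('1 hour', ts) AS bucket, avg(val) AS avg_val
  FROM conditions                                          -- live hypertable data
 WHERE ts >= _timescaledb_functions.cagg_watermark(2)      -- at/above the watermark
 GROUP BY 1;
```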
The proposal is to refactor the current Cagg refresh code to separate the logic that collects invalidation and materialization ranges from the DML logic that refreshes entries in the _timescaledb_catalog Cagg tables.
For query rewrites we would then call only the range-collecting logic, and add boundary conditions for those ranges, via OR, to the "live" portion of the real-time Cagg UNION ALL.
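A minimal sketch of what the extended rewrite could look like, assuming two current invalidation ranges [lo1, hi1) and [lo2, hi2): the live branch picks up those ranges via OR, and the materialized branch excludes them so stale rows are not read and each row is produced exactly once.

```sql
-- Sketch only; range bounds and names are hypothetical.
SELECT bucket, avg_val
  FROM _timescaledb_internal._materialized_hypertable_2
 WHERE bucket < _timescaledb_functions.cagg_watermark(2)
   AND NOT (bucket >= lo1 AND bucket < hi1)   -- skip stale materialized rows
   AND NOT (bucket >= lo2 AND bucket < hi2)
UNION ALL
SELECT time_bucket('1 hour', ts) AS bucket, avg(val) AS avg_val
  FROM conditions
 WHERE ts >= _timescaledb_functions.cagg_watermark(2)
    OR (ts >= lo1 AND ts < hi1)               -- live data in invalidation ranges
    OR (ts >= lo2 AND ts < hi2)
 GROUP BY 1;
```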
A GUC limiting the number of materialization ranges per query would let us bail out and skip the Cagg rewrite when there are too many materialization ranges.
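For example, with a hypothetical GUC name (not an existing setting):

```sql
-- Hypothetical GUC: skip the Cagg rewrite when a query would need
-- more than this many materialization ranges.
SET timescaledb.cagg_rewrite_max_ranges = 16;
```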
This enhancement can also be useful independently of Cagg query rewrites.
Some customers have asked for a GUC to toggle real-time Cagg mode on and off, instead of having to alter the Cagg view every time to flip timescaledb.materialized_only between true and false.
Such a GUC could be tuned more finely to include (1) materialized-only data; (2) data from (1) plus live data above the watermark (the current real-time Cagg behavior); or (3) data from (1) and (2) plus live data in current invalidation ranges.
Queries rewritten with Caggs would always use mode (3), since we need to produce the same output for queries with and without the Cagg rewrite.
It would also be useful for customers who want to define one view per Cagg and then toggle how much real-time data is combined with the materialized data.
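Sketch of how the three modes could be selected; the GUC name and its values are assumptions for illustration, not an existing setting:

```sql
-- (1) only materialized data
SET timescaledb.cagg_realtime_mode = 'materialized_only';
-- (2) (1) plus live data above the watermark (current real-time behavior)
SET timescaledb.cagg_realtime_mode = 'realtime';
-- (3) (2) plus live data in current invalidation ranges
SET timescaledb.cagg_realtime_mode = 'realtime_with_invalidations';
```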
Implementation challenges
The main challenge is to cleanly refactor the existing Cagg refresh code so that range gathering is separated from writing new ranges into the catalog.
It should also work with the upcoming granular Cagg refresh changes.