[Enhancement]: Include live data from invalidated ranges into Real Time Caggs, toggle Real Time mode via a GUC #9225

@natalya-aksman

What type of enhancement is this?

Refactor, User experience, Performance

What subsystems and features will be improved?

Continuous aggregate, Query planner

What does the enhancement do?

This proposed change accompanies #8967 (rewrite queries with Caggs).

If we rewrite eligible queries to use Caggs, we must guarantee that the query output is the same with or without the rewrite, because the rewrite is a planner-only change.
To provide this guarantee, we need to combine the data materialized in the Cagg with data that has not been materialized yet, that is, data above the current watermark and data in the invalidation ranges that exist at the time of the query.

The proposal is to refactor the current Cagg refresh code so that the logic collecting invalidation and materialization ranges is separated from the DML logic that refreshes entries in the _timescaledb_catalog Cagg tables.

Query rewrites would then call only the logic that collects materialization ranges, and would add boundary conditions for those ranges, via OR, to the current "live" branch of the real-time Cagg UNION ALL.
A GUC limiting the number of materialization ranges per query would let us bail out and skip the Cagg rewrite when there are too many ranges.
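
To make the idea concrete, here is a minimal Python sketch of how the "live" branch predicate might be assembled. It is only an illustration: `live_predicate`, `max_ranges`, the integer time values, and the half-open range convention are all assumptions made for this example, not existing TimescaleDB code or GUC names.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: half-open [start, end) in the time column's units.
Range = Tuple[int, int]


def live_predicate(watermark: int, ranges: List[Range], max_ranges: int,
                   col: str = "time") -> Optional[str]:
    """Build the WHERE clause for the live branch of the UNION ALL.

    Returns None to signal "bail out, do not rewrite with the Cagg"
    when there are more pending ranges than max_ranges allows.
    """
    pending = []
    for start, end in sorted(ranges):
        # Anything at or above the watermark is already covered by the
        # existing "col >= watermark" condition, so clip the range there.
        end = min(end, watermark)
        if start < end:
            pending.append((start, end))

    if len(pending) > max_ranges:
        return None

    conds = [f"{col} >= {watermark}"]
    conds += [f"({col} >= {s} AND {col} < {e})" for s, e in pending]
    return " OR ".join(conds)


print(live_predicate(100, [(10, 20), (90, 120)], max_ranges=4))
# time >= 100 OR (time >= 10 AND time < 20) OR (time >= 90 AND time < 100)
```

The sketch covers only the live branch. Presumably the materialized branch would need the complementary condition so that rows in invalidated ranges are not counted twice, but that is outside what this example shows.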

This enhancement is also useful independently of Cagg query rewrites.
Some customers have asked for a GUC to toggle real-time Cagg mode on and off, instead of having to alter the Cagg view each time to switch timescaledb.materialized_only between true and false.

The GUC can be more fine-grained than an on/off switch and select between: (1) materialized data only; (2) the data from (1) plus live data above the watermark (the current real-time Cagg behavior); or (3) the data from (1) and (2) plus live data in the current invalidation ranges.

Queries rewritten with Caggs would always use mode (3), because the output must be the same with and without the Cagg rewrite.
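
The three modes and the "rewrites always use mode (3)" rule could be modeled roughly as follows. This is a sketch under assumptions: the enum, its member names, and the helper functions are hypothetical, and the issue does not specify the GUC's name or its allowed values.

```python
from enum import Enum
from typing import List


class RealtimeMode(Enum):
    # Hypothetical names for the three modes described above.
    MATERIALIZED_ONLY = 1
    ABOVE_WATERMARK = 2
    ABOVE_WATERMARK_AND_INVALIDATED = 3


def effective_mode(guc_mode: RealtimeMode, is_cagg_rewrite: bool) -> RealtimeMode:
    """Rewritten queries must match the un-rewritten output, so they
    ignore the session setting and always use the most complete mode."""
    if is_cagg_rewrite:
        return RealtimeMode.ABOVE_WATERMARK_AND_INVALIDATED
    return guc_mode


def data_sources(mode: RealtimeMode) -> List[str]:
    """List which parts of the data a query in the given mode would read."""
    sources = ["materialized"]
    if mode in (RealtimeMode.ABOVE_WATERMARK,
                RealtimeMode.ABOVE_WATERMARK_AND_INVALIDATED):
        sources.append("live_above_watermark")
    if mode is RealtimeMode.ABOVE_WATERMARK_AND_INVALIDATED:
        sources.append("live_invalidated_ranges")
    return sources
```

Each mode is a strict superset of the previous one, which is why a single ordered setting can replace the current boolean.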

It will also be useful for customers who wish to define a single view per Cagg and then use the mode to choose how much real-time data is returned along with the materialized data.

Implementation challenges

The main challenge is to cleanly refactor the existing Cagg refresh code to separate range gathering from writing new ranges into the catalog.
The refactoring should also work with the upcoming granular Cagg refresh changes.
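
As a rough picture of the intended separation, the Python sketch below splits a read-only range-gathering step from the step that performs writes. All names are hypothetical, and aligning ranges to bucket boundaries and merging overlaps is an assumption about what "collecting materialization ranges" involves. The real refresh code is C inside TimescaleDB and is not reproduced here.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: half-open [start, end) in the time column's units.
Range = Tuple[int, int]


def collect_materialization_ranges(invalidations: List[Range],
                                   bucket_width: int) -> List[Range]:
    """Read-only step: expand each invalidation to bucket boundaries
    and merge ranges that overlap or touch. Performs no writes."""
    aligned = sorted(
        (s - s % bucket_width, e + (-e) % bucket_width)
        for s, e in invalidations if s < e
    )
    merged: List[Range] = []
    for s, e in aligned:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def refresh(invalidations: List[Range], bucket_width: int,
            materialize: Callable[[Range], None]) -> List[Range]:
    """Write step: reuses the same gathering logic, then applies DML
    through the supplied callback for each range."""
    ranges = collect_materialization_ranges(invalidations, bucket_width)
    for r in ranges:
        materialize(r)
    return ranges


print(collect_materialization_ranges([(3, 7), (12, 18), (40, 41)], 10))
# [(0, 20), (40, 50)]
```

With this shape, a query rewrite would call only `collect_materialization_ranges`, while the refresh path calls `refresh`, so both see the same ranges without the rewrite path touching the catalog.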
