@rafiss
Collaborator

@rafiss rafiss commented Jan 7, 2026

Backport 4/4 commits from #160138.

/cc @cockroachdb/release


Previously, INSPECT jobs without a historical AS OF SYSTEM TIME (AOST) clause
would not create protected timestamp records, but still used an AOST
clause with the current timestamp. If span processing took a long time
(especially with BulkLowQoS admission control), garbage collection could
occur before the query completed, resulting in "batch timestamp must be
after replica GC threshold" errors.

This change adds per-span protection using protected timestamp (PTS)
records when INSPECT uses "now" as the AOST. The implementation uses a
coordinator-based
approach where:

  1. When a processor starts processing a span and picks "now" as the
    timestamp, it sends a new "span started" progress message containing
    the span and timestamp via InspectProcessorProgress.

  2. The coordinator's progress tracker receives this message and calls
    TryToProtectBeforeGC for the relevant tables in that span. This
    waits until 80% of the GC TTL has elapsed before creating a PTS,
    avoiding unnecessary PTS creation for quick operations.

  3. When span processing completes (existing behavior), the coordinator
    cleans up the PTS for that span. Any remaining PTS records are
    cleaned up when the tracker terminates (e.g., on job cancellation).

This coordinator-based design keeps PTS management centralized rather
than distributed across processors, simplifying cleanup and error
handling. PTS failures are logged but don't fail the job since the
protection is best-effort.
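
A minimal sketch of this centralized, best-effort handling follows. Every type and function in it (`spanEvent`, `protector`, `coordinator`, `fakePTS`) is hypothetical and only models the flow described above; it does not reproduce InspectProcessorProgress or the real progress tracker.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// spanEvent is a hypothetical stand-in for a processor progress message.
type spanEvent struct {
	spanID  string
	started bool  // true for "span started", false for "span completed"
	tsNanos int64 // read timestamp picked by the processor (started only)
}

// protector abstracts PTS creation and removal so the sketch runs
// without a cluster.
type protector interface {
	Protect(spanID string, tsNanos int64) error
	Release(spanID string) error
}

// coordinator owns all PTS state; processors only send events.
type coordinator struct {
	pts    protector
	active map[string]bool
	logf   func(format string, args ...interface{})
}

func newCoordinator(p protector, logf func(string, ...interface{})) *coordinator {
	return &coordinator{pts: p, active: map[string]bool{}, logf: logf}
}

// handle processes one event. PTS failures are logged and never
// returned, because the protection is best-effort.
func (c *coordinator) handle(ev spanEvent) {
	if !ev.started {
		c.release(ev.spanID)
		return
	}
	if err := c.pts.Protect(ev.spanID, ev.tsNanos); err != nil {
		c.logf("could not protect span %s: %v", ev.spanID, err)
		return
	}
	c.active[ev.spanID] = true
}

func (c *coordinator) release(spanID string) {
	if !c.active[spanID] {
		return
	}
	delete(c.active, spanID)
	if err := c.pts.Release(spanID); err != nil {
		c.logf("could not release span %s: %v", spanID, err)
	}
}

// close releases whatever is still protected, e.g. on job cancellation.
func (c *coordinator) close() {
	for id := range c.active {
		c.release(id)
	}
}

// fakePTS is an in-memory protector that can inject one failure.
type fakePTS struct {
	held   map[string]int64
	failOn string
}

func (f *fakePTS) Protect(spanID string, ts int64) error {
	if spanID == f.failOn {
		return errors.New("injected failure")
	}
	f.held[spanID] = ts
	return nil
}

func (f *fakePTS) Release(spanID string) error {
	delete(f.held, spanID)
	return nil
}

func main() {
	pts := &fakePTS{held: map[string]int64{}, failOn: "b"}
	c := newCoordinator(pts, log.Printf)
	c.handle(spanEvent{spanID: "a", started: true, tsNanos: 10})
	c.handle(spanEvent{spanID: "b", started: true, tsNanos: 20}) // logged, job continues
	c.handle(spanEvent{spanID: "a"})                             // span completed
	c.close()
	fmt.Println("records still held:", len(pts.held))
}
```

Because only the coordinator touches the protector, cleanup on termination is a single loop rather than a protocol between processors.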

sql/inspect: use minimum timestamp for PTS protection

Previously, the INSPECT job called TryToProtectBeforeGC per span with
different timestamps. Since the job only stores one PTS record, each
new span's call to Protect would update the existing record's timestamp
via UpdateTimestamp, which removes protection for older spans.

To address this, this patch changes the PTS strategy to track the
minimum (oldest) timestamp across all active spans and protect only at
that timestamp. Since PROTECT_AFTER mode protects all data at or after
the specified timestamp, protecting at the minimum covers all active
spans. When the oldest span completes, the PTS is updated to the new
minimum timestamp, allowing GC of data between the old and new minimum.

Resolves: #159866
Epic: None

Release note: None

Release justification: fix for feature becoming GA in 26.1

rafiss added 4 commits January 6, 2026 21:33
Previously, INSPECT jobs without a historical AS OF SYSTEM TIME (AOST) clause
would not create protected timestamp records, but still used an AOST
clause with the current timestamp. If span processing took a long time
(especially with BulkLowQoS admission control), garbage collection could
occur before the query completed, resulting in "batch timestamp must be
after replica GC threshold" errors.

This change adds per-span protection using protected timestamp (PTS)
records when INSPECT uses "now" as the AOST. The implementation uses a
coordinator-based
approach where:

1. When a processor starts processing a span and picks "now" as the
    timestamp, it sends a new "span started" progress message containing
    the span and timestamp via InspectProcessorProgress.

2. The coordinator's progress tracker receives this message and calls
    TryToProtectBeforeGC for the relevant tables in that span. This
    waits until 80% of the GC TTL has elapsed before creating a PTS,
    avoiding unnecessary PTS creation for quick operations.

3. When span processing completes (existing behavior), the coordinator
    cleans up the PTS for that span. Any remaining PTS records are
    cleaned up when the tracker terminates (e.g., on job cancellation).

This coordinator-based design keeps PTS management centralized rather
than distributed across processors, simplifying cleanup and error
handling. PTS failures are logged but don't fail the job since the
protection is best-effort.

Resolves: cockroachdb#159866

Release note: None
In addition to checkpointing progress in the job, INSPECT now also
periodically writes progress to the text logs to improve observability.

Release note: None
Previously, TryToProtectBeforeGC accepted a catalog.TableDescriptor
parameter but only used it to call GetID() in two places. This was
unnecessarily restrictive and forced callers to load a full table
descriptor just to pass the ID.

This change simplifies the function signature to accept a descpb.ID
directly. The most significant improvement is in inspect/progress.go,
where this eliminates an unnecessary DescsTxn call that was only used
to load the descriptor for its ID.
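
The shape of this refactor can be sketched with stand-in types. `ID`, `tableDescriptor`, `protectByDescriptor`, and `protectByID` are hypothetical; the real TryToProtectBeforeGC takes additional parameters that are not shown in this PR and are omitted here.

```go
package main

import "fmt"

// ID is a hypothetical stand-in for descpb.ID.
type ID uint32

// tableDescriptor stands in for a large descriptor interface.
type tableDescriptor interface {
	GetID() ID
	GetName() string
}

// protectByDescriptor mirrors the old shape: it demands a whole
// descriptor but only ever reads the ID.
func protectByDescriptor(desc tableDescriptor) string {
	return protectByID(desc.GetID())
}

// protectByID mirrors the new shape: callers that already know the ID
// no longer have to load a descriptor first.
func protectByID(id ID) string {
	return fmt.Sprintf("protect table %d", id)
}

// fakeTable is a minimal descriptor used to show both call styles.
type fakeTable struct {
	id   ID
	name string
}

func (t fakeTable) GetID() ID       { return t.id }
func (t fakeTable) GetName() string { return t.name }

func main() {
	// Old style: build or load a descriptor just to hand over its ID.
	fmt.Println(protectByDescriptor(fakeTable{id: 104, name: "users"}))
	// New style: pass the ID directly.
	fmt.Println(protectByID(104))
}
```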

Release note: None
Epic: None
Previously, the INSPECT job called TryToProtectBeforeGC per span with
different timestamps. Since the job only stores one PTS record, each
new span's call to Protect would update the existing record's timestamp
via UpdateTimestamp, which removes protection for older spans.

To address this, this patch changes the PTS strategy to track the
minimum (oldest) timestamp across all active spans and protect only at
that timestamp. Since PROTECT_AFTER mode protects all data at or after
the specified timestamp, protecting at the minimum covers all active
spans. When the oldest span completes, the PTS is updated to the new
minimum timestamp, allowing GC of data between the old and new minimum.

Release note: None
@rafiss rafiss requested review from a team as code owners January 7, 2026 02:34
@rafiss rafiss requested review from kev-cao and removed request for a team January 7, 2026 02:34
@blathers-crl

blathers-crl bot commented Jan 7, 2026

Thanks for opening a backport.

Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.

@blathers-crl blathers-crl bot added the backport and T-sql-foundations labels Jan 7, 2026
@rafiss rafiss requested a review from spilchen January 7, 2026 02:34
@cockroach-teamcity
Member

This change is Reviewable

Contributor

@spilchen spilchen left a comment

:lgtm:

@spilchen reviewed 13 files and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kev-cao).

@rafiss rafiss merged commit 48d36a5 into cockroachdb:release-26.1 Jan 7, 2026
44 of 47 checks passed
@rafiss rafiss deleted the backport26.1-160138 branch January 9, 2026 23:10

Labels

backport, T-sql-foundations, target-release-26.1.0

3 participants