@bharathv bharathv commented Dec 3, 2025

Fixes the reconciliation logic so that replicators that are no longer needed are stopped after a topic is deleted.

Additionally

  • Improves logging for state-mutating requests (bumped to info).
  • Sorts partitions and topics in the status output for consistent reporting.

Fixes

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x
  • v24.3.x

Release Notes

Bug Fixes

  • Avoids runaway replicators after topics are deleted. Additionally improves observability of shadow requests.

Copilot AI review requested due to automatic review settings December 3, 2025 22:24
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a bug where shadow replicators were not properly stopped when topics were deleted, causing runaway replicators. The fix adds logic to detect when topics are no longer being mirrored and stops their replicators during link configuration updates.

Key changes:

  • Added reconciliation logic to stop replicators for deleted topics
  • Improved observability by upgrading state-mutating request logs from trace to info level
  • Enhanced status output consistency by sorting topics and partitions
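
The reconciliation step summarized above can be sketched roughly as follows. This is a minimal Python illustration, not the actual C++ in `src/v/cluster_link/link.cc`; the `Link` class, its `replicators` map, and the `reconcile` method are hypothetical names chosen for the example. The idea is simply to diff the set of running replicators against the set of topics that are still mirrored and stop the leftovers.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Hypothetical stand-in for a shadow link; not the real link.cc type."""

    # topic name -> True while a replicator for that topic is running
    replicators: dict[str, bool] = field(default_factory=dict)

    def reconcile(self, mirrored_topics: set[str]) -> list[str]:
        """Align running replicators with the currently mirrored topics.

        Returns the sorted names of topics whose replicators were stopped.
        """
        # Replicators whose topic is no longer mirrored are stale.
        stale = sorted(set(self.replicators) - mirrored_topics)
        for topic in stale:
            # A real implementation would stop the replicator asynchronously.
            del self.replicators[topic]
        # Start replicators for mirrored topics that do not have one yet.
        for topic in mirrored_topics:
            self.replicators.setdefault(topic, True)
        return stale


link = Link()
link.reconcile({"orders", "payments"})
stopped = link.reconcile({"orders"})  # "payments" was deleted
restarted = link.reconcile({"orders", "payments"})  # topic re-created
```

In this sketch, re-creating the topic simply starts a fresh replicator on the next reconcile pass, which mirrors the delete and re-create scenario the extended test is described as covering.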

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/v/cluster_link/link.cc | Adds tracking and cleanup of replicators for topics that are no longer being mirrored |
| src/v/redpanda/admin/services/shadow_link/shadow_link.cc | Moves logging of state-mutating requests from trace to info level for better observability |
| src/v/redpanda/admin/services/shadow_link/converter.cc | Sorts shadow topics and partition information in status output for consistent reporting |
| tests/rptest/tests/cluster_linking_e2e_test.py | Extends test coverage to verify replicators are properly stopped and can be restarted after topic deletion and re-creation |
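
The sorting change described for the status output can be illustrated with a small sketch. It is written in Python for brevity and is not the converter's actual code; the `sorted_status` helper and the shape of the input (topic name mapped to a list of partition ids) are assumptions made for the example.

```python
def sorted_status(status: dict[str, list[int]]) -> list[tuple[str, list[int]]]:
    """Return topics in name order, each with its partitions in id order.

    Hypothetical helper: the input shape is an assumption for illustration.
    """
    return [(topic, sorted(parts)) for topic, parts in sorted(status.items())]


# Unordered input, as it might come out of a hash map.
raw = {"payments": [2, 0, 1], "orders": [1, 0]}
report = sorted_status(raw)
```

Sorting at the reporting boundary means two status calls over the same state produce identical output regardless of container iteration order, which makes diffs and test assertions stable.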

@bharathv bharathv added this to the v25.3.2 milestone Dec 3, 2025
@vbotbuildovich
Collaborator

vbotbuildovich commented Dec 4, 2025

CI test results

test results on build#77284

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ReplicatedMetastoreTest | TestBasicRemoveTopics | | unit | https://buildkite.com/redpanda/redpanda/builds/77284#019ae651-7169-4d2e-92e0-01e481c65faf | FAIL | 0/1 | | |
| TxAtomicProduceConsumeTest | test_basic_tx_consumer_transform_produce | {"with_failures": true} | integration | https://buildkite.com/redpanda/redpanda/builds/77284#019ae67f-3f5b-4a0b-9d58-adab053691ea | FLAKY | 20/21 | upstream reliability is '98.45758354755783'. current run reliability is '95.23809523809523'. drift is 3.21949 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TxAtomicProduceConsumeTest&test_method=test_basic_tx_consumer_transform_produce |

test results on build#77301

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ShadowLinkingReplicationTests | test_topic_delete | {"source_cluster_spec": {"cluster_type": "redpanda"}} | integration | https://buildkite.com/redpanda/redpanda/builds/77301#019ae6fc-d03e-442c-b0c2-6d8ab87b8124 | FLAKY | 45/50 | upstream reliability is '100.0'. current run reliability is '79.16666666666666'. drift is 20.83333 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_topic_delete |
| ShadowLinkingReplicationTests | test_topic_delete | {"source_cluster_spec": {"cluster_type": "redpanda"}} | integration | https://buildkite.com/redpanda/redpanda/builds/77301#019ae6fc-d044-44fb-a155-c0146d2a1425 | FLAKY | 33/50 | upstream reliability is '100.0'. current run reliability is '46.875'. drift is 53.125 and the allowed drift is set to 50. The test should FAIL | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_topic_delete |

test results on build#77348

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MountUnmountIcebergTest | test_simple_remount | {"cloud_storage_type": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/77348#019aeaa1-1bc5-4889-9e27-91ebd23013b8 | FLAKY | 13/21 | upstream reliability is '79.46537059538275'. current run reliability is '61.904761904761905'. drift is 17.56061 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=MountUnmountIcebergTest&test_method=test_simple_remount |
| PartitionReassignmentsTest | test_reassignments_cancel | null | integration | https://buildkite.com/redpanda/redpanda/builds/77348#019aea9d-2807-4216-ac9d-0f823ff387ac | FLAKY | 15/21 | upstream reliability is '94.22750424448218'. current run reliability is '71.42857142857143'. drift is 22.79893 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PartitionReassignmentsTest&test_method=test_reassignments_cancel |
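
The "reason" strings in these results appear to follow one rule: drift is the upstream reliability minus the current run reliability, and the verdict depends on whether drift exceeds the allowed drift of 50. The sketch below reproduces that arithmetic from the reported numbers. The `drift_verdict` function is a hypothetical helper, not the bot's code, and treating a drift of exactly 50 as PASS is an assumption, since no row here hits that boundary.

```python
def drift_verdict(
    upstream: float, current: float, allowed_drift: float = 50.0
) -> tuple[float, str]:
    """Compute reliability drift and the resulting verdict.

    Hypothetical helper reconstructed from the reported numbers. The handling
    of drift == allowed_drift (treated as PASS here) is an assumption.
    """
    drift = round(upstream - current, 5)
    verdict = "PASS" if drift <= allowed_drift else "FAIL"
    return drift, verdict


# Values copied from the test_topic_delete rows on build#77301.
passing = drift_verdict(100.0, 79.16666666666666)
failing = drift_verdict(100.0, 46.875)
```

With the reported inputs this gives a drift of about 20.83333 for the first run and 53.125 for the second, matching the PASS and FAIL verdicts shown in the reason column.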

@bharathv
Contributor Author

bharathv commented Dec 4, 2025

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
dt-repeat=30
tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingReplicationTests.test_topic_delete

Contributor

@rockwotj rockwotj left a comment

proto changes LGTM

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@bharathv bharathv merged commit e459f9c into redpanda-data:dev Dec 4, 2025
30 checks passed
@vbotbuildovich
Collaborator

/backport v25.3.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v25.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-28825-v25.3.x-828 remotes/upstream/v25.3.x
git cherry-pick -x 30e9f5a99f ce9388edf9 274b9f2ea6 77c17d3afe 9263bed079

Workflow run logs.