Add instance filter to TiCDC Grafana dashboards for better metric isolation #3929

tenfyzhong · 2026-01-05T06:37:07Z

What problem does this PR solve?

Issue Number: close #3925

What is changed and how it works?

This PR enhances the TiCDC Grafana dashboards by adding the instance=~"$ticdc_instance" filter to Prometheus queries across multiple panels. Previously, many dashboard panels were missing the instance label selector, which could lead to inaccurate data visualization and performance issues when monitoring TiCDC clusters with multiple instances. The $ticdc_instance variable allows users to filter metrics for specific TiCDC instances, improving dashboard responsiveness and data accuracy.
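As a rough illustration of the change described above, the following Python sketch appends the `instance=~"$ticdc_instance"` matcher to a PromQL label selector. It is not how the PR was produced (the dashboards were edited directly), and the helper name `add_instance_filter` is hypothetical. It uses a regular expression rather than a PromQL parser, so it assumes label values contain no braces and it leaves bare metric names without a `{...}` selector untouched.

```python
import re

# The matcher this PR adds to dashboard queries.
INSTANCE_MATCHER = 'instance=~"$ticdc_instance"'


def add_instance_filter(expr: str) -> str:
    """Append the instance matcher to each {...} selector that lacks one.

    Hypothetical helper: regex-based, not a real PromQL parser.
    """

    def patch(match: "re.Match[str]") -> str:
        body = match.group(1)
        # Skip selectors that already constrain the instance label.
        if re.search(r"\binstance\s*(=|!)", body):
            return match.group(0)
        body = body.rstrip()
        if not body:
            return "{" + INSTANCE_MATCHER + "}"
        separator = "" if body.endswith(",") else ","
        return "{" + body + separator + " " + INSTANCE_MATCHER + "}"

    return re.sub(r"\{([^{}]*)\}", patch, expr)


before = (
    'max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster="$k8s_cluster", '
    'changefeed=~"$changefeed"}) by (namespace, changefeed)'
)
after = add_instance_filter(before)
print(after)
```

Running the helper a second time on its own output leaves the query unchanged, because the selector already contains an `instance` matcher.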

The changes are applied consistently across three dashboard files:

ticdc_new_arch.json
ticdc_new_arch_next_gen.json
ticdc_new_arch_with_keyspace_name.json

Key improvements include:

Better data isolation: Metrics are now correctly filtered by the selected TiCDC instance, preventing data from different instances from being aggregated together
Consistent filtering: Standardized the use of the instance filter across all relevant dashboard panels
Enhanced monitoring accuracy: Users can now accurately monitor specific TiCDC instances without interference from other instances in the cluster

Check List

Tests

Manual test (add detailed scripts or steps below)
Unit test
Integration test
No code

Manual Test Steps:

Deploy the updated Grafana dashboards to a test environment
Verify that all panels with the instance filter work correctly when selecting different TiCDC instances
Confirm that the $ticdc_instance variable properly filters metrics for the selected instance
Validate that query performance improves with the added filters by comparing query response times
Test that backward compatibility is maintained when no specific instance is selected (using the default wildcard)
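The manual steps above could be complemented by a static check of the dashboard JSON. The sketch below is an assumption about how such a check might look, not part of this PR: it walks a dashboard structure, collects every `expr` string, and reports the ones without the instance matcher. The function names and the sample dashboard are hypothetical, and some queries may intentionally aggregate across instances, so the output is a list to review rather than a list of bugs.

```python
from typing import Any, Dict, Iterator, List

INSTANCE_MATCHER = 'instance=~"$ticdc_instance"'


def iter_exprs(node: Any) -> Iterator[str]:
    """Yield every string stored under an "expr" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from iter_exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_exprs(item)


def missing_instance_filter(dashboard: Dict[str, Any]) -> List[str]:
    """Return the queries that do not contain the instance matcher."""
    return [e for e in iter_exprs(dashboard) if INSTANCE_MATCHER not in e]


# Hypothetical, heavily simplified dashboard with one nested row panel.
sample_dashboard = {
    "panels": [
        {
            "title": "row",
            "panels": [
                {"targets": [{"expr": 'go_goroutines{instance=~"$ticdc_instance"}'}]},
                {
                    "targets": [
                        {
                            "expr": 'max(ticdc_owner_checkpoint_ts_lag'
                            '{changefeed=~"$changefeed"}) by (changefeed)'
                        }
                    ]
                },
            ],
        }
    ]
}

for expr in missing_instance_filter(sample_dashboard):
    print("missing instance filter:", expr)
```

To run it against a real file, the dictionary could be loaded with `json.load` from, for example, `metrics/grafana/ticdc_new_arch.json`.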

Questions

Will it cause performance regression or break compatibility?

No, this change does not cause performance regression or break compatibility. The $ticdc_instance variable defaults to a wildcard (.*) if not specifically set by the user, maintaining backward compatibility. The added filters actually improve performance by reducing the amount of data Prometheus needs to process and return.
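The compatibility argument rests on how the variable is substituted into the query. The Python sketch below simulates that substitution and the anchored regex match that Prometheus applies for `=~`. It is only a model: Grafana performs the real interpolation and Prometheus uses RE2 rather than Python's `re`, and the instance addresses shown are made-up examples. With the wildcard value `.*`, every instance value matches, including an empty one, which is why the filtered query returns the same series as the unfiltered query.

```python
import re
from typing import Dict


def interpolate(expr: str, variables: Dict[str, str]) -> str:
    """Replace $name placeholders, longest names first to avoid prefix clashes."""
    for name in sorted(variables, key=len, reverse=True):
        expr = expr.replace("$" + name, variables[name])
    return expr


def label_matches(label_value: str, pattern: str) -> bool:
    """Model of a PromQL =~ matcher: the pattern must match the whole value."""
    return re.fullmatch(pattern, label_value) is not None


query = 'go_goroutines{job=~".*ticdc.*", instance=~"$ticdc_instance"}'

# Default case: the variable expands to a wildcard.
print(interpolate(query, {"ticdc_instance": ".*"}))

# Hypothetical instance addresses, for illustration only.
instances = ["ticdc-0:8300", "ticdc-1:8300"]
print([i for i in instances if label_matches(i, ".*")])
print([i for i in instances if label_matches(i, "ticdc-0:8300")])
```

When several instances are selected, Grafana typically expands the variable to an alternation such as `(a|b)` for Prometheus data sources, which the same anchored match handles.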

Do you need to update user documentation, design documentation or monitoring documentation?

No documentation updates are required. This is an internal improvement to the monitoring dashboards that enhances existing functionality without changing user-facing behavior or APIs.

Release note

None

gemini-code-assist · 2026-01-05T06:37:25Z

Summary of Changes

Hello @tenfyzhong, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the monitoring experience for TiCDC by integrating an instance-level filter into its Grafana dashboards. This change addresses previous challenges with data accuracy and performance when operating multiple TiCDC instances, allowing users to view metrics specific to a chosen instance, thereby streamlining troubleshooting and operational oversight.

Highlights

Enhanced Metric Isolation: Introduced an 'instance' filter to Prometheus queries across multiple TiCDC Grafana dashboards, ensuring metrics are isolated and displayed accurately for individual TiCDC instances.
Improved Dashboard Performance: By adding the 'instance' label selector, the queries now limit the data scope, which reduces the load on Prometheus and improves the responsiveness of the Grafana dashboards.
Consistent Filtering Pattern: The 'instance=~"$ticdc_instance"' filter has been consistently applied to relevant panels in three key TiCDC dashboards, providing a unified approach to instance-specific monitoring.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request is a great improvement for monitoring multi-instance TiCDC setups. It consistently adds an instance filter to Prometheus queries across three Grafana dashboard files, which will enhance metric isolation and dashboard performance. The changes are correct and align with the PR's goal. I've added a couple of minor suggestions to improve the readability of some query strings by removing unnecessary newline characters.

metrics/grafana/ticdc_new_arch.json

metrics/grafana/ticdc_new_arch_next_gen.json

- Add `instance=~"$ticdc_instance"` selector to metrics queries for better instance-level filtering
- Update maintainer checkpoint/resolved TS lag metrics to include instance filtering
- Add instance filter to KV client pull event count metric
- Include instance selector in sink batch row count sum aggregation
- Update system metrics (CPU, goroutines, memory, file descriptors) to filter by TiCDC instance
- Add instance filter to owner ownership counter metrics
- Include instance selector in subscription client region count metrics
- Update resolve lock task drop count rate calculation with instance filter
- Add instance filter to sink MQ checkpoint TS message count metric
- Include instance selector in redo total rows count rate calculation

Signed-off-by: tenfyzhong <[email protected]>

tenfyzhong · 2026-01-05T07:46:30Z

/gemini Which panels have had the instance filter added by this PR?

gemini-code-assist · 2026-01-05T07:46:48Z

Hello @tenfyzhong, I'd be happy to clarify which panels have had the instance filter added by this PR. The instance=~"$ticdc_instance" filter has been applied to queries for the following metrics across the specified Grafana dashboards:

`metrics/grafana/ticdc_new_arch.json`

ticdc_maintainer_checkpoint_ts_lag
ticdc_maintainer_resolved_ts_lag
ticdc_kvclient_pull_event_count
ticdc_sink_batch_row_count_sum
process_cpu_seconds_total
ticdc_server_go_max_procs
go_goroutines
go_threads
process_resident_memory_bytes
go_memstats_heap_alloc_bytes
process_open_fds
ticdc_owner_ownership_counter
ticdc_subscription_client_requested_region_count
ticdc_subscription_client_subscribed_region_count
ticdc_subscription_client_resolve_lock_task_drop_count
ticdc_sink_mq_checkpoint_ts_message_count
ticdc_redo_total_rows_count

`metrics/grafana/ticdc_new_arch_next_gen.json`

ticdc_maintainer_checkpoint_ts_lag
ticdc_maintainer_resolved_ts_lag
ticdc_kvclient_pull_event_count
ticdc_sink_batch_row_count_sum
process_cpu_seconds_total
ticdc_server_go_max_procs
go_goroutines
go_threads
process_resident_memory_bytes
go_memstats_heap_alloc_bytes
process_open_fds
ticdc_owner_ownership_counter
ticdc_subscription_client_requested_region_count
ticdc_subscription_client_subscribed_region_count
ticdc_subscription_client_resolve_lock_task_drop_count
ticdc_sink_mq_checkpoint_ts_message_count
ticdc_redo_total_rows_count

`metrics/grafana/ticdc_new_arch_with_keyspace_name.json`

ticdc_maintainer_checkpoint_ts_lag
ticdc_maintainer_resolved_ts_lag
ticdc_sink_batch_row_count_sum
ticdc_sink_mq_checkpoint_ts_message_count

This ensures that metrics are correctly filtered by the selected TiCDC instance, improving data accuracy and dashboard performance.

tenfyzhong · 2026-01-05T07:50:00Z

/gemini Which panels have had the instance filter added by this PR? Please respond with the titles of the panels.

gemini-code-assist · 2026-01-05T07:50:23Z

Hello @tenfyzhong, I understand you're looking for the panel titles. Based on the metric names identified in the changes, and without direct access to the title fields in the provided diffs, I've inferred the following panel titles where the instance=~"$ticdc_instance" filter has been added:

`metrics/grafana/ticdc_new_arch.json` and `metrics/grafana/ticdc_new_arch_next_gen.json`

Changefeed Checkpoint TS Lag
Changefeed Resolved TS Lag
KV Client Pull Event Rate
Sink Batch Row Count
Process CPU Usage
Go Max Procs
Go Goroutines
Go Threads
Process Resident Memory
Go Heap Memory Allocation
Process Open File Descriptors
Owner Ownership Status
Subscription Client Requested Regions
Subscription Client Subscribed Regions
Subscription Client Resolve Lock Task Drop Rate
Sink MQ Checkpoint TS Message Count
Redo Log Total Rows

`metrics/grafana/ticdc_new_arch_with_keyspace_name.json`

Changefeed Checkpoint TS Lag
Changefeed Resolved TS Lag
Sink Batch Row Count
Sink MQ Checkpoint TS Message Count

These inferred titles aim to reflect the content of the panels based on the metrics being filtered.

tenfyzhong · 2026-01-05T08:00:05Z

Reviewed by codex:

tenfyzhong · 2026-01-05T08:16:40Z

/check-issue-triage-complete

tenfyzhong · 2026-01-05T08:59:58Z

/gemini review

gemini-code-assist

Code Review

This pull request correctly adds the instance=~"$ticdc_instance" filter to many Prometheus queries across the Grafana dashboards, which will improve metric isolation and performance as intended. The changes are consistent and well-executed.

However, I've noticed a few minor inconsistencies and potential omissions:

Several queries have extra spaces or are missing spaces, which could be fixed for better consistency. I've added specific suggestions for these.
I've also identified a few queries that appear to be missing the new instance filter. Since I cannot comment on lines not in the diff, I'm listing some examples here for your consideration. It would be beneficial to perform a quick check to ensure all relevant queries are updated for complete metric isolation.
- In metrics/grafana/ticdc_new_arch.json: ticdc_owner_checkpoint_ts_lag (line 199), ticdc_dispatchermanager_checkpoint_ts_lag (line 1558), and ticdc_dispatchermanager_resolved_ts_lag (line 1656).
- Similar omissions might exist in the other dashboard files.

Overall, this is a great improvement. Addressing these points will make the dashboards even more robust and consistent.
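The spacing inconsistencies noted in this review (a leading space, a doubled space after a comma, a missing space after a comma in a `by` clause) could also be cleaned up mechanically. The sketch below is one possible approach, not something this PR does; the function name `normalize_expr` is hypothetical. It only rewrites text outside double-quoted strings, so label values and regexes stay untouched. It assumes label values use double quotes, although PromQL also accepts single quotes and backticks.

```python
import re

# Matches a double-quoted string, including escaped characters inside it.
_QUOTED = re.compile(r'("(?:[^"\\]|\\.)*")')


def normalize_expr(expr: str) -> str:
    """Tidy whitespace in a PromQL string without touching quoted values."""
    parts = _QUOTED.split(expr.strip())
    # re.split with a capturing group alternates: even indexes are the
    # text outside quotes, odd indexes are the quoted strings themselves.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r"\s*,\s*", ", ", parts[i])
        parts[i] = re.sub(r" {2,}", " ", parts[i])
    return "".join(parts)


print(normalize_expr(' go_goroutines{job=~".*ticdc.*", instance=~"$ticdc_instance"}'))
print(
    normalize_expr(
        'max(m{namespace=~"$namespace",  changefeed=~"$changefeed"}) '
        "by (namespace,changefeed)"
    )
)
```

Since Prometheus tolerates the original spacing, a pass like this would be purely cosmetic.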

gemini-code-assist · 2026-01-05T09:03:49Z

metrics/grafana/ticdc_new_arch.json

            {
              "exemplar": true,
-              "expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", namespace=~\"$namespace\",  changefeed=~\"$changefeed\"}) by (namespace, changefeed)",
+              "expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", namespace=~\"$namespace\",  changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (namespace, changefeed)",


There's an extra space in the PromQL query before changefeed=~"$changefeed". While Prometheus is tolerant of this, removing it would improve consistency with other queries in this file. A similar issue exists on line 6497.

Suggested change

"expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", namespace=~\"$namespace\",  changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (namespace, changefeed)",

"expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", namespace=~\"$namespace\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (namespace, changefeed)",

gemini-code-assist · 2026-01-05T09:03:49Z

metrics/grafana/ticdc_new_arch.json

            {
              "exemplar": true,
-              "expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", namespace=~\"$namespace\", changefeed=~\"$changefeed\"}) by (namespace,changefeed)",
+              "expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", namespace=~\"$namespace\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (namespace,changefeed)",


For consistency with other queries, consider adding a space after the comma in the by clause: by (namespace, changefeed). A similar issue exists on line 6600.

Suggested change

"expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", namespace=~\"$namespace\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (namespace,changefeed)",

"expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", namespace=~\"$namespace\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (namespace, changefeed)",

gemini-code-assist · 2026-01-05T09:03:49Z

metrics/grafana/ticdc_new_arch.json

          "targets": [
            {
-              "expr": " go_goroutines{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", job=~\".*ticdc.*\"}",
+              "expr": " go_goroutines{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", job=~\".*ticdc.*\", instance=~\"$ticdc_instance\"}",


There's a leading space in the PromQL query before go_goroutines. While Prometheus is tolerant of this, removing it would improve consistency.

Suggested change

"expr": " go_goroutines{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", job=~\".*ticdc.*\", instance=~\"$ticdc_instance\"}",

"expr": "go_goroutines{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", job=~\".*ticdc.*\", instance=~\"$ticdc_instance\"}",

gemini-code-assist · 2026-01-05T09:03:49Z

metrics/grafana/ticdc_new_arch_next_gen.json

            {
              "exemplar": true,
-              "expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\",  changefeed=~\"$changefeed\"}) by (keyspace_name, changefeed)",
+              "expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\",  changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name, changefeed)",


There's an extra space in the PromQL query before changefeed=~"$changefeed". While Prometheus is tolerant of this, removing it would improve consistency with other queries in this file. A similar issue exists on line 6497.

Suggested change

"expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\",  changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name, changefeed)",

"expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name, changefeed)",

gemini-code-assist · 2026-01-05T09:03:49Z

metrics/grafana/ticdc_new_arch_next_gen.json

            {
              "exemplar": true,
-              "expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\"}) by (keyspace_name,changefeed)",
+              "expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name,changefeed)",


For consistency with other queries, consider adding a space after the comma in the by clause: by (keyspace_name, changefeed). A similar issue exists on line 6600.

Suggested change

"expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name,changefeed)",

"expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name, changefeed)",

gemini-code-assist · 2026-01-05T09:03:49Z

metrics/grafana/ticdc_new_arch_next_gen.json

          "targets": [
            {
-              "expr": " go_goroutines{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", job=~\".*ticdc.*\"}",
+              "expr": " go_goroutines{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", job=~\".*ticdc.*\", instance=~\"$ticdc_instance\"}",


There's a leading space in the PromQL query before go_goroutines. While Prometheus is tolerant of this, removing it would improve consistency.

Suggested change

"expr": " go_goroutines{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", job=~\".*ticdc.*\", instance=~\"$ticdc_instance\"}",

"expr": "go_goroutines{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", job=~\".*ticdc.*\", instance=~\"$ticdc_instance\"}",

gemini-code-assist · 2026-01-05T09:03:49Z

metrics/grafana/ticdc_new_arch_with_keyspace_name.json

            {
              "exemplar": true,
-              "expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\",  changefeed=~\"$changefeed\"}) by (keyspace_name, changefeed)",
+              "expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\",  changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name, changefeed)",


There's an extra space in the PromQL query before changefeed=~"$changefeed". While Prometheus is tolerant of this, removing it would improve consistency with other queries in this file. A similar issue exists on line 2950.

Suggested change

"expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\",  changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name, changefeed)",

"expr": "max(ticdc_maintainer_checkpoint_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name, changefeed)",

gemini-code-assist · 2026-01-05T09:03:49Z

metrics/grafana/ticdc_new_arch_with_keyspace_name.json

            {
              "exemplar": true,
-              "expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\"}) by (keyspace_name,changefeed)",
+              "expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name,changefeed)",


For consistency with other queries, consider adding a space after the comma in the by clause: by (keyspace_name, changefeed). A similar issue exists on line 3053.

Suggested change

"expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name,changefeed)",

"expr": "max(ticdc_maintainer_resolved_ts_lag{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\", instance=~\"$ticdc_instance\"}) by (keyspace_name, changefeed)",

ti-chi-bot · 2026-01-05T09:19:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3AceShowHand, flowbehappy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [3AceShowHand,flowbehappy]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-01-05T09:19:56Z

[LGTM Timeline notifier]

Timeline:

2026-01-05 09:06:44.656510249 +0000 UTC m=+606760.474818681: ☑️ agreed by flowbehappy.
2026-01-05 09:19:55.011949648 +0000 UTC m=+607550.830258070: ☑️ agreed by 3AceShowHand.

tenfyzhong · 2026-01-05T09:26:46Z

Manual test:

- Update Prometheus query in ticdc_new_arch.json to include instance filter
- Update Prometheus query in ticdc_new_arch_next_gen.json to include instance filter
- Allows filtering TiCDC process uptime metrics by specific instance in Grafana dashboards

Signed-off-by: tenfyzhong <[email protected]>

tenfyzhong · 2026-01-06T02:09:40Z

/retest

tenfyzhong · 2026-01-06T03:49:43Z

/retest

tenfyzhong · 2026-01-06T07:06:27Z

/retest

tenfyzhong · 2026-01-06T09:22:56Z

/retest

tenfyzhong · 2026-01-06T14:33:45Z

/retest

tenfyzhong · 2026-01-07T01:19:24Z

/retest

tenfyzhong · 2026-01-07T02:31:25Z

/retest

tenfyzhong · 2026-01-07T03:56:34Z

/retest

tenfyzhong · 2026-01-07T07:45:39Z

/retest

tenfyzhong · 2026-01-07T11:02:57Z

/retest

tenfyzhong · 2026-01-07T13:50:52Z

/retest

tenfyzhong · 2026-01-07T15:43:55Z

/retest

tenfyzhong · 2026-01-08T06:17:49Z

/retest

tenfyzhong · 2026-01-08T15:50:34Z

/retest

tenfyzhong · 2026-01-09T06:18:54Z

/retest

tenfyzhong · 2026-01-09T07:58:10Z

/override pull-cdc-kafka-integration-light

ti-chi-bot · 2026-01-09T07:58:15Z

@tenfyzhong: Overrode contexts on behalf of tenfyzhong: pull-cdc-kafka-integration-light

Details

In response to this:

/override pull-cdc-kafka-integration-light

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

tenfyzhong · 2026-01-09T14:08:45Z

/override pull-cdc-kafka-integration-heavy

ti-chi-bot · 2026-01-09T14:08:49Z

@tenfyzhong: Overrode contexts on behalf of tenfyzhong: pull-cdc-kafka-integration-heavy

Details

In response to this:

/override pull-cdc-kafka-integration-heavy

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

tenfyzhong · 2026-01-09T14:09:03Z

/override pull-cdc-kafka-integration-light

ti-chi-bot · 2026-01-09T14:09:08Z

@tenfyzhong: Overrode contexts on behalf of tenfyzhong: pull-cdc-kafka-integration-light

Details

In response to this:

/override pull-cdc-kafka-integration-light

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

tenfyzhong · 2026-01-09T14:09:34Z

/tide

ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 5, 2026

gemini-code-assist bot reviewed Jan 5, 2026

tenfyzhong force-pushed the fix-metrics-instance-filter branch from 11ca7e4 to 53439e1 Compare January 5, 2026 07:40

ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 5, 2026

tenfyzhong force-pushed the fix-metrics-instance-filter branch from 53439e1 to 0f26332 Compare January 5, 2026 07:41

ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/needs-triage-completed and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/needs-linked-issue labels Jan 5, 2026

ti-chi-bot bot removed the do-not-merge/needs-triage-completed label Jan 5, 2026

gemini-code-assist bot reviewed Jan 5, 2026

tenfyzhong requested review from 3AceShowHand and asddongmen January 5, 2026 09:05

flowbehappy approved these changes Jan 5, 2026

ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jan 5, 2026

3AceShowHand approved these changes Jan 5, 2026

ti-chi-bot bot added the lgtm label Jan 5, 2026

ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 5, 2026

ti-chi-bot bot merged commit 45e28a0 into pingcap:master Jan 9, 2026
13 checks passed