[TD] Script to determine which reverts are caused by bad TD #6911

clee2000 · 2025-07-09T21:44:52Z

Honestly a pretty messy script

Determines whether a revert was caused by bad TD. Basically:

1. Find the revert
2. Find the originally merged commit
3. Find the last commit on the PR before the merge
4. Check the merged commit for failures
5. Check whether the failure was excluded by TD on the last commit on the PR
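
The final check in the steps above can be sketched roughly as follows. This is an illustration only, not the script in this PR: `RevertedCommit`, `caused_by_bad_td`, and the two in-memory sets are hypothetical names, and the real script gathers its data from git history and ClickHouse rather than from prefilled sets.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RevertedCommit:
    # Hypothetical container; field names are illustrative only.
    merge_commit_sha: str
    last_pr_sha: Optional[str]
    # Test files that failed on the merged commit.
    failed_tests_on_merge: set = field(default_factory=set)
    # Test files that TD excluded on the last commit of the PR.
    excluded_by_td_on_pr: set = field(default_factory=set)


def caused_by_bad_td(commit: RevertedCommit) -> Optional[bool]:
    """True if any failure on the merged commit was excluded by TD on the PR,
    False if none was, and None when there is no PR commit to check against."""
    if commit.last_pr_sha is None:
        return None
    return bool(commit.failed_tests_on_merge & commit.excluded_by_td_on_pr)
```

Returning `None` for the "unable to check" case keeps it separate from a definite "not caused by TD", which matches how the output below reports the two counts on separate lines.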

Obviously not perfect, but spot checking it does seem to be ok

Counts are pretty volatile if granularity is too small (week granularity is very volatile), so I'm not sure if this can really be displayed. Maybe should make this periodic and still upload to clickhouse? idk

The end of the output looks like this, but there's a lot of extra output before it to help debug:

CAUSED BY BAD TD: 27 / 184 = 14.67%
Unable to check (lack run id) on PR: 1 / 184 = 0.54%
Total caused by bad TD: 27 / 184 = 14.67%
Month 674: 13 bad TD / 77 total = 16.88%
Month 675: 14 bad TD / 107 total = 13.08%
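
The `Month 674` and `Month 675` labels appear to be 30-day buckets counted from the Unix epoch, going by the `timestamp // (30 * 24 * 60 * 60)` expression quoted in the review comments, so they are not calendar months. A small sketch for mapping a bucket index back to a date and reproducing the percentage formatting, where the helper names `bucket_of`, `bucket_start`, and `percent` are hypothetical:

```python
from datetime import datetime, timezone

SECONDS_PER_BUCKET = 30 * 24 * 60 * 60  # a 30-day bucket, not a calendar month


def bucket_of(timestamp: int) -> int:
    """Index of the 30-day bucket that contains a Unix timestamp."""
    return timestamp // SECONDS_PER_BUCKET


def bucket_start(bucket: int) -> datetime:
    """UTC datetime at which a bucket begins."""
    return datetime.fromtimestamp(bucket * SECONDS_PER_BUCKET, tz=timezone.utc)


def percent(part: int, total: int) -> str:
    """Format a count the same way as the summary lines above."""
    return f"{part} / {total} = {part / total * 100:.2f}%"


print(bucket_start(674).date())  # 2025-05-12
print(percent(27, 184))          # 27 / 184 = 14.67%
```

Because bucket boundaries drift relative to calendar months, the current bucket can be only partially filled at the time the script runs, which may contribute to the volatility mentioned above.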

Also makes the clickhouse.py client usable from a thread pool executor (I think the HTTP client didn't like having a single client shared across multiple threads)
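
To illustrate why dropping a cache on the client factory changes behaviour under a thread pool, here is a minimal sketch using a stand-in class. `FakeClient`, `get_cached_client`, `get_fresh_client`, and `distinct_clients` are all hypothetical; none of this is the code in `clickhouse.py`.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


class FakeClient:
    """Stand-in for an HTTP-backed database client (hypothetical)."""


@lru_cache(maxsize=None)
def get_cached_client() -> FakeClient:
    # Once cached, every caller on every thread receives the same instance.
    return FakeClient()


def get_fresh_client() -> FakeClient:
    # Each call builds a new instance, so threads never share one.
    return FakeClient()


def distinct_clients(factory, n_tasks: int = 8) -> int:
    """Run the factory in a thread pool and count the distinct objects returned."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        clients = list(pool.map(lambda _: factory(), range(n_tasks)))
    return len({id(c) for c in clients})


get_cached_client()  # warm the cache so the comparison is deterministic
print(distinct_clients(get_cached_client))  # 1
print(distinct_clients(get_fresh_client))   # 8
```

The trade-off is that an uncached factory creates a new client per call. If that becomes expensive, one alternative (not what this PR does) would be a per-thread client stored in `threading.local()`.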

vercel · 2025-07-09T21:44:57Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| torchci | ⬜️ Ignored (Inspect) | Visit Preview | Aug 5, 2025 10:36pm |

github-advanced-security

lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copilot

Pull Request Overview

This PR adds a script to analyze reverts and determine which were caused by bad TD (Target Determination) exclusions, along with making the ClickHouse client thread-safe.

Key changes:

- Adds a comprehensive script to analyze Git commit history and correlate reverts with TD exclusions
- Removes the `@lru_cache` decorator from `get_clickhouse_client()` to enable thread-safe usage
- Implements parallel processing using `ThreadPoolExecutor` for performance optimization

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tools/torchci/td/get_reverts_caused_by_td.py | New script that analyzes Git history, queries ClickHouse for job failures, and determines if reverts were caused by bad TD exclusions |
| tools/torchci/clickhouse.py | Removes `@lru_cache` decorator from `get_clickhouse_client()` to make it thread-safe for concurrent usage |

Copilot · 2025-08-07T15:52:04Z

tools/torchci/td/get_reverts_caused_by_td.py

+            month_groups[month] = (0, 0)
+        month_groups[month] = (month_groups[month][0] + 1, month_groups[month][1])
+    for commit in commits_reverted:
+        month = commit.timestamp_of_merge // (30 * 24 * 60 * 60)


This line uses timestamp_of_merge but the loop is iterating over commits_reverted, not caused_by_bad_td like the previous loop. This will incorrectly group all reverted commits by their merge timestamp instead of revert timestamp for the month calculation.

Suggested change

-        month = commit.timestamp_of_merge // (30 * 24 * 60 * 60)
+        month = commit.timestamp_of_revert // (30 * 24 * 60 * 60)

Copilot · 2025-08-07T15:52:04Z

tools/torchci/td/get_reverts_caused_by_td.py

+                    failed_test=get_test_file(line),
+                )
+            )
+    del futures


[nitpick] Explicit deletion of futures is unnecessary as it will be garbage collected when it goes out of scope. This adds clutter without benefit.

Suggested change

-    del futures

Copilot · 2025-08-07T15:52:04Z

tools/torchci/td/get_reverts_caused_by_td.py

+                f"found last pr sha != alt, {commit.last_pr_sha} != {alt_last_pr_sha[0]}"
+            )
+            bad += 1
+        if commit.last_pr_sha is None:


If alt_last_pr_sha[0] is empty string (when no matching commit is found), setting commit.last_pr_sha to an empty string could cause issues in subsequent queries. Consider checking if alt_last_pr_sha[0] is not empty before assignment.

Suggested change

-        if commit.last_pr_sha is None:
+        if commit.last_pr_sha is None and alt_last_pr_sha[0]:
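
A minimal sketch of the guarded fallback this suggestion describes, assuming the alternate lookup returns an empty string when nothing matches. `pick_last_pr_sha` is a hypothetical helper, not a function from the script.

```python
from typing import Optional


def pick_last_pr_sha(current: Optional[str], alt: str) -> Optional[str]:
    """Fall back to `alt` only when `current` is missing and `alt` is non-empty."""
    if current is None and alt:
        return alt
    return current


print(pick_last_pr_sha(None, "abc123"))    # abc123
print(pick_last_pr_sha(None, ""))          # None
print(pick_last_pr_sha("def456", "abc123"))  # def456
```

Keeping the value as `None` rather than `""` lets later code distinguish "no SHA found" with a single `is None` check.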

Copilot · 2025-08-07T15:52:04Z

tools/torchci/td/get_reverts_caused_by_td.py

+            x.revert_commit_sha,
+            x.merge_commit_sha,
+            x.merge_commit_sha_prev,
+            x.last_pr_sha,


[nitpick] This commented-out line should either be removed if it's not needed or properly documented with a comment explaining why it's disabled.

Suggested change

-            x.last_pr_sha,
+            x.last_pr_sha,
+            # Disabled: revert_commit_sha_prev is not always available or needed for this analysis

tc

6cd5ea9

pytorch-bot bot added the ci-no-td label Jul 9, 2025

facebook-github-bot added the CLA Signed label Jul 9, 2025

clee2000 marked this pull request as ready for review July 9, 2025 21:45

clee2000 requested a review from a team July 9, 2025 21:45

github-advanced-security bot found potential problems Jul 9, 2025


clee2000 added 3 commits July 9, 2025 14:52

tc

b4268b0

tc

af35d79

tc

dbeace0

clee2000 force-pushed the csl/td_analyzer branch from 3c1b952 to dbeace0 August 5, 2025 22:36

zxiiro requested a review from Copilot August 7, 2025 15:50

Copilot AI reviewed Aug 7, 2025
