Improve benchmarks PCC computation#3306

Open
vkovacevicTT wants to merge 2 commits into main from
vkovacevic/improve-llms-pcc
Conversation

@vkovacevicTT
Contributor

@vkovacevicTT commented Feb 13, 2026

Problem description

In the performance benchmark, PCC for LLMs is calculated based only on the first decode token.
We should take all tokens into consideration.

What's changed

PCC for LLMs is now computed for all tokens; the final PCC is the minimum PCC across all tokens.
Simplified the compute_pcc function.
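The change described above, computing PCC per decode token and reporting the minimum, can be sketched as follows. This is a hedged sketch: `compute_pcc` here is a plain NumPy Pearson correlation and `compute_min_pcc` is an illustrative name, not necessarily the project's actual implementation.

```python
import numpy as np

def compute_pcc(golden, observed):
    # Pearson correlation coefficient between the flattened tensors.
    return float(np.corrcoef(np.asarray(golden, dtype=np.float64).ravel(),
                             np.asarray(observed, dtype=np.float64).ravel())[0, 1])

def compute_min_pcc(golden_tokens, observed_tokens):
    # Compute PCC for every decode token; the final value is the minimum,
    # so a single divergent token lowers the reported PCC.
    return min(compute_pcc(g, o) for g, o in zip(golden_tokens, observed_tokens))
```

Taking the minimum (rather than, say, the mean) means the reported PCC reflects the worst decode step, which is the conservative choice for a correctness gate.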


@codecov-commenter

codecov-commenter commented Feb 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 28.12%. Comparing base (c666faf) to head (3bf1de7).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3306   +/-   ##
=======================================
  Coverage   28.12%   28.12%           
=======================================
  Files          33       33           
  Lines        4132     4132           
=======================================
  Hits         1162     1162           
  Misses       2970     2970           


@vkovacevicTT force-pushed the vkovacevic/improve-llms-pcc branch from fe71c3b to 90f443f on February 18, 2026 at 11:07
Comment on lines 549 to 550
# Use MAX PCC as final PCC value
pcc_value = min_pcc
Contributor

Shouldn't the comment and the PR desc say "use min pcc"?

@vkovacevicTT force-pushed the vkovacevic/improve-llms-pcc branch from 90f443f to efbf9ce on February 19, 2026 at 10:52
Contributor

@vladimirjovanovicTT vladimirjovanovicTT left a comment


We need slight improvements to this approach to reflect the causal nature of the LLM decode process:

  • I propose checking PCC for output tensors sequentially: we check PCC for each decode step and assert on the first output that fails. When one token differs between CPU and TT, we can expect the following tokens to diverge, so PCC information for the following tokens is of limited use.
  • We should enable teacher-forcing behavior as described in tenstorrent/tt-forge#859 (regenerating golden outputs).

The mid-term plan (IMO not needed for this PR), as discussed in https://docs.google.com/document/d/1nmFd002Ycv8wadpyOs1ZMMkV6zV6v0pWCl-LZsap1J8/edit?tab=t.0, section "4. correctness", is to have a better metric than PCC for checking LLM generation correctness.

FYI/thoughts? @odjuricicTT @umalesTT @pglusacTT
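The sequential check proposed above might look like the sketch below. This is illustrative only: `compute_pcc` is a plain NumPy Pearson correlation, and the `assert_pcc_sequentially` name and `threshold` parameter are hypothetical, not part of the existing benchmark code.

```python
import numpy as np

def compute_pcc(golden, observed):
    # Pearson correlation coefficient between the flattened tensors.
    return float(np.corrcoef(np.asarray(golden, dtype=np.float64).ravel(),
                             np.asarray(observed, dtype=np.float64).ravel())[0, 1])

def assert_pcc_sequentially(golden_tokens, observed_tokens, threshold=0.99):
    # Walk the decode steps in order and fail on the first divergence.
    # Once one token differs, later tokens are expected to diverge too,
    # so their PCC values carry little additional information.
    for step, (g, o) in enumerate(zip(golden_tokens, observed_tokens)):
        pcc = compute_pcc(g, o)
        assert pcc >= threshold, (
            f"PCC {pcc:.4f} below threshold {threshold} at decode step {step}")
```

Asserting on the first failing step also makes failures easier to debug, since the reported step is the first point where CPU and TT outputs actually diverged rather than a downstream symptom.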

@umalesTT
Contributor


+1
We should also test tokens all the way up to the full sequence length; we shouldn't make the number of tokens for which we check PCC/other metrics configurable.

@vkovacevicTT force-pushed the vkovacevic/improve-llms-pcc branch from efbf9ce to 3bf1de7 on February 19, 2026 at 16:57