Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##           main    #3306   +/- ##
===================================
  Coverage  28.12%  28.12%
===================================
  Files         33      33
  Lines       4132    4132
===================================
  Hits        1162    1162
  Misses      2970    2970
===================================
# Use MAX PCC as final PCC value
pcc_value = min_pcc
Shouldn't the comment and the PR desc say "use min pcc"?
vladimirjovanovicTT left a comment:
We need slight improvements to this approach to reflect the causal nature of the LLM decode process:
- I propose checking PCC for the output tensors sequentially: compute PCC for each decode step and assert on the first output that fails. Once one token differs between CPU and TT, the following tokens can be expected to diverge, so PCC information for subsequent tokens is of limited use.
- We should enable teacher-forcing behavior as described in tenstorrent/tt-forge#859 (regenerating golden outputs).
The mid-term plan (IMO not needed for this PR), as discussed in the "4. correctness" section of https://docs.google.com/document/d/1nmFd002Ycv8wadpyOs1ZMMkV6zV6v0pWCl-LZsap1J8/edit?tab=t.0, is to adopt a better metric than PCC for checking LLM generation correctness.
FYI/thoughts? @odjuricicTT @umalesTT @pglusacTT
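The sequential check proposed above could look roughly like the sketch below. It is illustrative only: the function names, the 0.99 threshold, and the use of NumPy arrays as per-step outputs are assumptions, not taken from the PR.

```python
import numpy as np


def pcc(golden: np.ndarray, observed: np.ndarray) -> float:
    """Pearson correlation coefficient between two flattened tensors."""
    return float(np.corrcoef(golden.ravel(), observed.ravel())[0, 1])


def check_decode_pcc(golden_steps, observed_steps, threshold=0.99):
    """Check PCC per decode step, in order, and assert on the first failure.

    Hypothetical sketch: once one token differs between CPU and TT, later
    tokens are expected to diverge, so we stop at the first bad step rather
    than aggregating PCC over the remaining steps.
    """
    for step, (golden, observed) in enumerate(zip(golden_steps, observed_steps)):
        value = pcc(golden, observed)
        assert value >= threshold, (
            f"PCC {value:.4f} below threshold {threshold} at decode step {step}"
        )
    return True
```

With teacher forcing enabled, `golden_steps` would be regenerated so each step is conditioned on the same prefix as the device output, making the per-step comparison meaningful beyond the first divergence.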
+1
Problem description
In the performance benchmark, PCC for LLMs is calculated based only on the first decode token. We should take all tokens into consideration.
What's changed
- PCC for LLMs is now computed for all tokens; the final PCC is the minimum PCC across all tokens.
- Simplified the `compute_pcc` function.
Checklist
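The new aggregation described above can be sketched as follows. This is a minimal illustration, assuming per-token output tensors as NumPy arrays; the actual `compute_pcc` in the repository may differ in signature and tensor types.

```python
import numpy as np


def compute_pcc(golden_tokens, observed_tokens):
    """Sketch of min-PCC aggregation over all decode tokens.

    Computes the Pearson correlation coefficient for each token's output
    tensor and returns the minimum, so the final value reflects the worst
    decode step rather than only the first one.
    """
    pccs = [
        float(np.corrcoef(golden.ravel(), observed.ravel())[0, 1])
        for golden, observed in zip(golden_tokens, observed_tokens)
    ]
    return min(pccs)
```

Taking the minimum is a conservative choice: a single badly diverged token drags the reported PCC down, whereas an average could mask it.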