Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##           main    #3306   +/- ##
===================================
  Coverage  28.12%  28.12%
===================================
  Files         33      33
  Lines       4132    4132
===================================
  Hits        1162    1162
  Misses      2970    2970
===================================
# Use MAX PCC as final PCC value
pcc_value = min_pcc
Shouldn't the comment and the PR desc say "use min pcc"?
vladimirjovanovicTT left a comment:
We need slight improvements to this approach to reflect the causal nature of the LLM decode process:
- I propose checking PCC for the output tensors sequentially: compute PCC for each decode step and assert on the first output that fails. Once one token differs between CPU and TT, the following tokens can be expected to diverge, so PCC information for subsequent tokens is of limited use.
- We should enable teacher-forcing behavior as described in tenstorrent/tt-forge#859 (regenerating golden outputs).
The mid-term plan (IMO not needed for this PR), as discussed in the "4. correctness" section of https://docs.google.com/document/d/1nmFd002Ycv8wadpyOs1ZMMkV6zV6v0pWCl-LZsap1J8/edit?tab=t.0, is to adopt a better metric than PCC for checking LLM generation correctness.
FYI/thoughts? @odjuricicTT @umalesTT @pglusacTT
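The sequential check proposed above could look roughly like the sketch below. It is illustrative only: the function names, the 0.99 threshold, and the use of NumPy arrays as per-step outputs are assumptions, not taken from the PR.

```python
import numpy as np


def pcc(golden: np.ndarray, observed: np.ndarray) -> float:
    """Pearson correlation coefficient between two flattened tensors."""
    return float(np.corrcoef(golden.ravel(), observed.ravel())[0, 1])


def check_decode_pcc(golden_steps, observed_steps, threshold=0.99):
    """Check PCC per decode step, in order, and assert on the first failure.

    Hypothetical sketch: once one token differs between CPU and TT, later
    tokens are expected to diverge, so we stop at the first bad step rather
    than aggregating PCC over the remaining steps.
    """
    for step, (golden, observed) in enumerate(zip(golden_steps, observed_steps)):
        value = pcc(golden, observed)
        assert value >= threshold, (
            f"PCC {value:.4f} below threshold {threshold} at decode step {step}"
        )
    return True
```

With teacher forcing enabled, `golden_steps` would be regenerated so each step is conditioned on the same prefix as the device output, making the per-step comparison meaningful beyond the first divergence.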
+1
Problem description
In the performance benchmark, PCC for LLMs is calculated based only on the first decode token. We should take all tokens into consideration.
What's changed
- PCC for LLMs is now computed for all tokens; the final PCC is the minimum PCC across all tokens.
- Simplified the `compute_pcc` function.
Checklist
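The new aggregation described above can be sketched as follows. This is a minimal illustration, assuming per-token output tensors as NumPy arrays; the actual `compute_pcc` in the repository may differ in signature and tensor types.

```python
import numpy as np


def compute_pcc(golden_tokens, observed_tokens):
    """Sketch of min-PCC aggregation over all decode tokens.

    Computes the Pearson correlation coefficient for each token's output
    tensor and returns the minimum, so the final value reflects the worst
    decode step rather than only the first one.
    """
    pccs = [
        float(np.corrcoef(golden.ravel(), observed.ravel())[0, 1])
        for golden, observed in zip(golden_tokens, observed_tokens)
    ]
    return min(pccs)
```

Taking the minimum is a conservative choice: a single badly diverged token drags the reported PCC down, whereas an average could mask it.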