[Sharktank] Fix _TEST_LAST_OP_DISPATCH for wrapped functions #2564

Alex-Vasile · 2025-10-20T22:56:54Z

The transfer_n_pin wrapper is fairly straightforward: we simply have to plumb the underlying wrapped function through.

The trivially_replicable wrapper is messier. We don't know which version of the op it will dispatch to on the shards until after it has run. So we don't update the last dispatched op for the wrapper itself and instead leave _TEST_LAST_OP_DISPATCH pointing at the op dispatched for the last shard.
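A minimal sketch of the plumbing described for the simple wrapper case. The `_unwrapped` attribute name comes from the diff in this PR; everything else (the toy `transfer_n_pin` body, `record_dispatch`, `last_dispatch`, `add_default`) is a hypothetical stand-in and not the sharktank implementation.

```python
import functools

_TEST_LAST_OP_DISPATCH = None  # stand-in for the registry global


def transfer_n_pin(fn):
    """Toy wrapper (hypothetical body) that exposes the function it wraps."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    # Plumb the underlying function through so the dispatcher can find it.
    wrapper._unwrapped = fn
    return wrapper


def record_dispatch(selected_override):
    """Record the underlying op, falling back to the override itself."""
    global _TEST_LAST_OP_DISPATCH
    _TEST_LAST_OP_DISPATCH = getattr(
        selected_override, "_unwrapped", selected_override
    )


def last_dispatch():
    """Return whatever was recorded most recently."""
    return _TEST_LAST_OP_DISPATCH


def add_default(a, b):
    return a + b


wrapped_add = transfer_n_pin(add_default)
record_dispatch(wrapped_add)
assert last_dispatch() is add_default  # the wrapper is not what gets recorded
```

With this shape, a test comparing the recorded op against `add_default` does not need to know whether the override was wrapped.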

codecov-commenter · 2025-10-20T23:02:15Z

Codecov Report

❌ Patch coverage is 98.83721% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@d12d8ab). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
sharktank/tests/ops/dispatch_test.py	98.11%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2564   +/-   ##
=======================================
  Coverage        ?   77.57%           
=======================================
  Files           ?      264           
  Lines           ?    25154           
  Branches        ?        0           
=======================================
  Hits            ?    19512           
  Misses          ?     5642           
  Partials        ?        0

sogartar · 2025-10-21T12:31:41Z

sharktank/sharktank/ops/_registry.py

        selected_override, *results = trampoline(self, *args, **kwargs)
        if _ENABLE_TEST_LAST_OP_DISPATCH:
            global _TEST_LAST_OP_DISPATCH
-            _TEST_LAST_OP_DISPATCH = selected_override
+
+            if hasattr(selected_override, "_trivially_replicable_wrapper"):
+                # For trivially replicable wrappers, don't set _TEST_LAST_OP_DISPATCH;
+                # the inner calls (which have already occurred) set it to the actual op.
+                # NOTE: This assumes that all shards called the same op.
+                pass
+            else:
+                # For wrappers such as `transfer_n_pin`, we set _TEST_LAST_OP_DISPATCH to the original op (not the wrapper).
+                _TEST_LAST_OP_DISPATCH = getattr(
+                    selected_override, "_unwrapped", selected_override
+                )


I guess more often than not you want the underlying op and not to test the trivial replication mechanism.

I am curious how you debug with this. Is it that somewhere in the model code you would enable recording the last dispatch and then inspect the recorded function to see if it is the correct one? Do you call it again with the same arguments?

If it is about tracing the dispatches, maybe we can log what is getting called here.

Actually, to log the outer calls first, we would need to do that in the trampoline before the call.

There are a few existing tests using this; search for _test_enable_last_op_dispatch.

It seems to be used for unit tests when we have complicated overrides.

If we want to check in tests which override gets selected, maybe we should expose this.
E.g.

override = ops.my_op.get_override(*args)

The problem is that the dispatch mechanism is coupled to the actual execution of the selected override, as the op itself may return NotImplemented or continue its execution to fulfil the request.

There may even be multiple ops that could match but will return NotImplemented based on arg combinations. It also won't work for trivially_replicable since we don't know what it calls on the shards until after it's done so.
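To illustrate the coupling described here, the following toy model tries overrides in order and only learns which one was selected after running it. The `dispatch` helper and the two `add_*` overrides are hypothetical and much simpler than the real trampoline in `_registry.py`.

```python
def dispatch(overrides, *args):
    """Try each override in order; return (selected_override, result)."""
    for override in overrides:
        result = override(*args)
        if result is not NotImplemented:
            # Selection is only known once the override has actually run.
            return override, result
    raise NotImplementedError("no override accepted the arguments")


def add_ints(a, b):
    # Matches on signature, but bails out for non-int argument combinations.
    if not (isinstance(a, int) and isinstance(b, int)):
        return NotImplemented
    return a + b


def add_any(a, b):
    return a + b


selected, value = dispatch([add_ints, add_any], 1.5, 2)
assert selected is add_any  # add_ints was tried first and declined
assert value == 3.5
```

In this model a `get_override(*args)` lookup that skips execution could not tell that `add_ints` would decline, which is the objection raised above.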

Any sort of nested dispatching would change what is happening. Here we only have some sort of patch for 2 special cases, transfer_n_pin and trivially_replicable.
Is your goal to make the unsharded tests seamlessly work in the replicated case? Meaning to have the same test code.
These tests rely on logic that is not supposed to be part of the API, which is usually bad practice.

Maybe we can record not just the last dispatch, but append to a list. Then the tests can check if the override is in the traced list.

That would have to make assumptions about how far back in the list to look. A simple wrapper like transfer_n_pin is not a problem since we know exactly which override it's wrapping. The issue is trivially_replicable. I think we should refactor it so that it wraps each individual override rather than the op as a whole.

Signed-off-by: Alex Vasile <[email protected]>

Alex-Vasile requested review from rsuderman and sogartar October 20, 2025 22:56

sogartar approved these changes Oct 21, 2025

Alex-Vasile added 3 commits October 21, 2025 16:34

Handling for transfer_n_pin and trivially_replicable

c409a0d

Signed-off-by: Alex Vasile <[email protected]>

Fix leftover name

bab0d83

Signed-off-by: Alex Vasile <[email protected]>

Add helper function and cleanup tests

1c3c8b2

Signed-off-by: Alex Vasile <[email protected]>

Alex-Vasile force-pushed the op_fix branch from 19b9d67 to 1c3c8b2 Compare October 21, 2025 16:35
