Conversation

AndreyPavlenko
Contributor

@AndreyPavlenko AndreyPavlenko commented Jul 25, 2025

To enable tracking, set the environment variable TRITON_TRACK_DUMP to either 1, true, yes, on, y, or a path to a directory where the tracking reports will be dumped.
To add profiling statistics to the reports, set the TRITON_TRACK_PROFILE environment variable.
To track kernel launches, set the TRITON_TRACK_RUN environment variable.
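As a sketch, the dump variable above could be interpreted like this (illustrative Python only; the helper name and return convention are assumptions, not the utility's actual code):

```python
import os
from pathlib import Path

# Values treated as a boolean "on", mirroring the list in the description above.
_TRUE_VALUES = {"1", "true", "yes", "on", "y"}

def resolve_track_dump(var: str = "TRITON_TRACK_DUMP"):
    """Return None (disabled), True (dump to console), or a directory Path."""
    value = os.environ.get(var, "").strip()
    if not value:
        return None
    if value.lower() in _TRUE_VALUES:
        return True
    return Path(value)
```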

Link #4716

@AndreyPavlenko AndreyPavlenko force-pushed the AndreyPavlenko/track branch 3 times, most recently from 41015d0 to 1216480 on July 25, 2025 20:48
@AndreyPavlenko AndreyPavlenko changed the title Implemented compile time/size tracking and profiling utility A tracking utility for gathering the compile and/or runtime time, size, profiling and other statistics Jul 25, 2025
@AndreyPavlenko AndreyPavlenko force-pushed the AndreyPavlenko/track branch 2 times, most recently from 7843958 to 9752167 on July 29, 2025 13:44
@AndreyPavlenko AndreyPavlenko marked this pull request as ready for review July 29, 2025 18:26
Comment on lines -268 to -269
},
py::call_guard<py::gil_scoped_release>());
Contributor

Why removed?

Contributor Author

It doesn't allow calling the callback function: with the GIL released for the whole call, the Python callback cannot be invoked.

Contributor

Is it possible to make it conditional? For example, still use it if pyCb == std::nullopt.

Contributor Author

Now the GIL is released at the beginning of the lambda and re-acquired on each callback call.

@anmyachev
Contributor

I would also add tests for this utility so that the code does not become outdated unexpectedly.

@Egor-Krivov
Contributor

Egor-Krivov commented Aug 11, 2025

@AndreyPavlenko Will it be possible to distinguish between configurations for a single script, like our microbenchmarks? If I call a kernel with different input parameters, I would probably want a separate compile time for each input size.

@AndreyPavlenko
Contributor Author

> @AndreyPavlenko Will it be possible to distinguish between configurations for a single script, like our microbenchmarks? If I call a kernel with different input parameters, I would probably want a separate compile time for each input size.

There will be separate reports for each compilation.

@Egor-Krivov
Contributor

> > @AndreyPavlenko Will it be possible to distinguish between configurations for a single script, like our microbenchmarks? If I call a kernel with different input parameters, I would probably want a separate compile time for each input size.
>
> There will be separate reports for each compilation.

Can you show how to distinguish between them? Currently I only see a folder with the kernel name, containing many files with similar names, like kernel.run_3842.json. Can I somehow tell which run corresponds to which shape? Maybe I could affect the naming, e.g. by calling something like `profiling.label("m32_n32_k32")`? Or store all results in one large JSON keyed by my provided labels?

@AndreyPavlenko
Contributor Author

> > > @AndreyPavlenko Will it be possible to distinguish between configurations for a single script, like our microbenchmarks? If I call a kernel with different input parameters, I would probably want a separate compile time for each input size.
> >
> > There will be separate reports for each compilation.
>
> Can you show how to distinguish between them? Currently I only see a folder with the kernel name, containing many files with similar names, like kernel.run_3842.json. Can I somehow tell which run corresponds to which shape? Maybe I could affect the naming, e.g. by calling something like `profiling.label("m32_n32_k32")`? Or store all results in one large JSON keyed by my provided labels?

Currently the report has the same name as the kernel, so different configurations are difficult to distinguish. A similar issue is discussed here: #4800 (comment).

kernel.run_3842.json is related to kernel-run tracking, not compilation, and you probably don't need it. Just do not set the TRITON_TRACK_RUN env var.

@AndreyPavlenko
Contributor Author

Now constexprs are added to the kernel names and the grid is added to the kernel runs.

@vlad-penkin vlad-penkin linked an issue Aug 25, 2025 that may be closed by this pull request
@AndreyPavlenko AndreyPavlenko force-pushed the AndreyPavlenko/track branch 3 times, most recently from f16b622 to c465ae3 on August 29, 2025 13:43
@@ -13,6 +13,7 @@
import os
import subprocess
from pathlib import Path
from .track import track
Contributor

To align with the current style, move this import to the top of the file.

Suggested change
from .track import track
from triton.backends.intel.track import track

@@ -68,6 +68,7 @@ env:
VERIFY: ${{ (github.event_name == 'pull_request' || github.event_name == 'schedule' || inputs.verify) && '1' || '0' }}
TAG: ${{ inputs.tag || (github.event_name == 'pull_request' && format('pr-{0}', github.event.number)) || (github.event_name == 'schedule' && 'ci') || 'test' }}
N_RUNS: ${{ inputs.n_runs || '1' }}
TRITON_TRACK_DUMP: "$PWD/reports/track"
Contributor

Let's make it optional, depending on user input. It can cause overhead, which can generally be avoided.

Contributor

I would also enable this profiling for at least some tests in the intel folder.



def _tr_env(name: str, default: str = "", type: Any = str) -> Any:
return type(os.environ.get(name, default).strip())
Contributor

This returns a type, not a value.

Contributor Author

It returns a value of the specified type (str, int, etc.); the type parameter is used as a converter.

Comment on lines +6 to +8
# To enable the tracking, set the environment variable ``TRITON_TRACK_DUMP``
# to either ``1``, ``true``, ``yes``, ``on``, ``y`` or a path to a directory
# where the tracking reports will be dumped.
Contributor

Do we really need all these possible values for TRITON_TRACK_DUMP?

I would leave only the path-to-a-directory and undefined cases.

Contributor Author

It can also print dumps to the console. The many boolean values keep it consistent with other boolean env vars, which support all of these values.

else:
import pathlib
_TR_DUMP = lambda tr, dir=pathlib.Path(_TR_DUMP): tr.dump(dir)
if _TR_DUMP is not None:
Contributor

Let's reduce the amount of code under top-level if/else branches.

We can define several classes and choose the appropriate one at the end of this class.

Contributor

This will also allow us to move almost all Python imports to the top of the file.

Contributor Author

But in that case it would do useless computations and imports. I can split it into multiple files and import the required one, depending on the env vars.

return time, time
return 0., {k: Track._to_value(v) if isinstance(v, dict) else v for k, v in values.items()}

if _tr_env_on("TRITON_TRACK_SORT", True): # Sort results by the total time
Contributor

Undocumented


if _tr_env_on("TRITON_TRACK_SORT", True): # Sort results by the total time

def _to_value(values: Dict):
Contributor

Do we really need to define a duplicate function and redefine it?

We could check TRITON_TRACK_SORT inside the _to_value function.

@staticmethod
def on_exit(self):
self.pr.disable()
st = pstats.Stats(self.pr, stream=TrackAndProfile._DEVNULL)
Contributor

This is probably not necessary unless st.print_results is called?

Suggested change
st = pstats.Stats(self.pr, stream=TrackAndProfile._DEVNULL)
st = pstats.Stats(self.pr)

Contributor Author

It's required because, when st.get_print_list() is called, it prints messages to the stream.
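The point can be checked with a short experiment: pstats.Stats writes its report output to the stream it was constructed with (stdout by default), so routing it to a throwaway stream (os.devnull in the PR, a StringIO below) keeps the console clean while the statistics stay available programmatically. The same holds for the public print_stats(), used here instead of the internal get_print_list().

```python
import cProfile
import io
import pstats

# Profile a bit of work.
pr = cProfile.Profile()
pr.enable()
sum(range(100_000))
pr.disable()

# Anything Stats prints goes to `stream`, not to stdout.
buf = io.StringIO()
st = pstats.Stats(pr, stream=buf)
st.sort_stats("cumulative").print_stats()
```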


return decorator(funcOrName) if callable(funcOrName) else decorator

# This ugly hook is used to decorate the upstream functions and avoid circular imports.
Contributor

Why do circular imports appear?

Contributor Author

Because we decorate functions of the triton.runtime.jit module from the backend, but that module is also the one that loads the backend.
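The hook pattern being discussed can be sketched like this (all names are hypothetical): the runtime module exposes a hook slot instead of importing the backend's decorator, and the backend fills it in later, breaking the import cycle.

```python
import functools

# --- runtime side (stands in for triton.runtime.jit) ---
_track_hook = None  # filled in later by the backend

def set_track_hook(hook):
    global _track_hook
    _track_hook = hook

def jit_compile(fn):
    # Apply the backend's decorator only if one was registered.
    return _track_hook(fn) if _track_hook is not None else fn

# --- backend side (stands in for the track utility) ---
def track(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # ... timing/size statistics would be recorded here ...
        return fn(*args, **kwargs)
    return wrapper

# Installed once both modules are loaded, so neither imports the other.
set_track_hook(track)
```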

Successfully merging this pull request may close these issues.

Compile Time Tracking for Key Workloads