Conversation

AndreyPavlenko
Contributor

@AndreyPavlenko AndreyPavlenko commented Jul 25, 2025

To enable tracking, set the environment variable TRITON_TRACK_DUMP to either 1, true, yes, on, y, or a path to a directory where the tracking reports will be dumped.
To add profiling statistics to the reports, set the TRITON_TRACK_PROFILE environment variable.
To track kernel launches, set the TRITON_TRACK_RUN environment variable.
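As a sketch, the dump variable above could be interpreted like this (illustrative Python only; the helper name and return convention are assumptions, not the utility's actual code):

```python
import os
from pathlib import Path

# Values treated as a boolean "on", mirroring the list in the description above.
_TRUE_VALUES = {"1", "true", "yes", "on", "y"}

def resolve_track_dump(var: str = "TRITON_TRACK_DUMP"):
    """Return None (disabled), True (dump to console), or a directory Path."""
    value = os.environ.get(var, "").strip()
    if not value:
        return None
    if value.lower() in _TRUE_VALUES:
        return True
    return Path(value)
```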

Link #4716

@AndreyPavlenko AndreyPavlenko force-pushed the AndreyPavlenko/track branch 3 times, most recently from 41015d0 to 1216480 on July 25, 2025 20:48
@AndreyPavlenko AndreyPavlenko changed the title Implemented compile time/size tracking and profiling utility A tracking utility for gathering the compile and/or runtime time, size, profiling and other statistics Jul 25, 2025
@AndreyPavlenko AndreyPavlenko force-pushed the AndreyPavlenko/track branch 2 times, most recently from 7843958 to 9752167 on July 29, 2025 13:44
@AndreyPavlenko AndreyPavlenko marked this pull request as ready for review July 29, 2025 18:26
Comment on lines -268 to -269
},
py::call_guard<py::gil_scoped_release>());
Contributor

Why removed?

Contributor Author

It doesn't allow calling the callback function: with the GIL released for the whole call, the Python callback cannot be invoked.

Contributor

Is it possible to make it conditional? For example, still use it if pyCb == std::nullopt.

Contributor Author

Now the GIL is released at the beginning of the lambda and re-acquired on each callback call.

@anmyachev
Contributor

I would also add tests for this utility so that the code does not become outdated unexpectedly.

@Egor-Krivov
Contributor

Egor-Krivov commented Aug 11, 2025

@AndreyPavlenko Will it be possible to distinguish between configurations for a single script, like our microbenchmarks? If I call a kernel with different input parameters, I would probably want a separate compile time for each input size.

@AndreyPavlenko
Contributor Author

> @AndreyPavlenko Will it be possible to distinguish between configurations for a single script, like our microbenchmarks? If I call a kernel with different input parameters, I would probably want a separate compile time for each input size.

There will be separate reports for each compilation.

@Egor-Krivov
Contributor

> > @AndreyPavlenko Will it be possible to distinguish between configurations for a single script, like our microbenchmarks? If I call a kernel with different input parameters, I would probably want a separate compile time for each input size.
>
> There will be separate reports for each compilation.

Can you show how to distinguish between them? Currently I only see a folder with the kernel name, containing many files with similar names, like kernel.run_3842.json. Can I somehow tell which run corresponds to which shape? Maybe I could affect the naming, e.g. by calling something like `profiling.label("m32_n32_k32")`? Or store all results in one large JSON keyed by my provided labels?

@AndreyPavlenko
Contributor Author

> > > @AndreyPavlenko Will it be possible to distinguish between configurations for a single script, like our microbenchmarks? If I call a kernel with different input parameters, I would probably want a separate compile time for each input size.
> >
> > There will be separate reports for each compilation.
>
> Can you show how to distinguish between them? Currently I only see a folder with the kernel name, containing many files with similar names, like kernel.run_3842.json. Can I somehow tell which run corresponds to which shape? Maybe I could affect the naming, e.g. by calling something like `profiling.label("m32_n32_k32")`? Or store all results in one large JSON keyed by my provided labels?

Currently the report has the same name as the kernel, so different configurations are difficult to distinguish. A similar issue is discussed here: #4800 (comment).

kernel.run_3842.json is related to kernel-run tracking, not compilation, and you probably don't need it. Just do not set the TRITON_TRACK_RUN env var.

@AndreyPavlenko
Contributor Author

Now constexprs are added to the kernel names and the grid is added to the kernel runs.

@vlad-penkin vlad-penkin linked an issue Aug 25, 2025 that may be closed by this pull request
@AndreyPavlenko AndreyPavlenko force-pushed the AndreyPavlenko/track branch 3 times, most recently from f16b622 to c465ae3 on August 29, 2025 13:43
@@ -13,6 +13,7 @@
import os
import subprocess
from pathlib import Path
from .track import track
Contributor

To align with the current style, move this import to the top of the file.

Suggested change
from .track import track
from triton.backends.intel.track import track

@@ -68,6 +68,7 @@ env:
VERIFY: ${{ (github.event_name == 'pull_request' || github.event_name == 'schedule' || inputs.verify) && '1' || '0' }}
TAG: ${{ inputs.tag || (github.event_name == 'pull_request' && format('pr-{0}', github.event.number)) || (github.event_name == 'schedule' && 'ci') || 'test' }}
N_RUNS: ${{ inputs.n_runs || '1' }}
TRITON_TRACK_DUMP: "$PWD/reports/track"
Contributor

Let's make it optional, depending on user input. It can cause overhead, which can generally be avoided.

Contributor

I would also enable this profiling for at least some tests in the intel folder.



def _tr_env(name: str, default: str = "", type: Any = str) -> Any:
return type(os.environ.get(name, default).strip())
Contributor

This returns a type, not a value.

Contributor Author

It returns a value of the specified type (str, int, etc.); the type parameter is used as a converter.

Comment on lines +6 to +8
# To enable the tracking, set the environment variable ``TRITON_TRACK_DUMP``
# to either ``1``, ``true``, ``yes``, ``on``, ``y`` or a path to a directory
# where the tracking reports will be dumped.
Contributor

Do we really need all these possible values for TRITON_TRACK_DUMP?

I would leave only the path-to-a-directory and undefined cases.

Contributor Author

It can also print dumps to the console. The many boolean values keep it consistent with other boolean env vars, which support all of these values.

else:
import pathlib
_TR_DUMP = lambda tr, dir=pathlib.Path(_TR_DUMP): tr.dump(dir)
if _TR_DUMP is not None:
Contributor

Let's reduce the amount of code under top-level if/else branches.

We can define several classes and choose the appropriate one at the end of this class.

Contributor

This will also allow us to move almost all Python imports to the top of the file.

Contributor Author

But in that case it would do useless computations and imports. I can split it into multiple files and import the required one, depending on the env vars.

return time, time
return 0., {k: Track._to_value(v) if isinstance(v, dict) else v for k, v in values.items()}

if _tr_env_on("TRITON_TRACK_SORT", True): # Sort results by the total time
Contributor

Undocumented


if _tr_env_on("TRITON_TRACK_SORT", True): # Sort results by the total time

def _to_value(values: Dict):
Contributor

Do we really need to define a duplicate function and redefine it?

We could check TRITON_TRACK_SORT inside the _to_value function.

@staticmethod
def on_exit(self):
self.pr.disable()
st = pstats.Stats(self.pr, stream=TrackAndProfile._DEVNULL)
Contributor

This is probably not necessary unless st.print_results is called?

Suggested change
st = pstats.Stats(self.pr, stream=TrackAndProfile._DEVNULL)
st = pstats.Stats(self.pr)

Contributor Author

It's required because, when st.get_print_list() is called, it prints messages to the stream.
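The point can be checked with a short experiment: pstats.Stats writes its report output to the stream it was constructed with (stdout by default), so routing it to a throwaway stream (os.devnull in the PR, a StringIO below) keeps the console clean while the statistics stay available programmatically. The same holds for the public print_stats(), used here instead of the internal get_print_list().

```python
import cProfile
import io
import pstats

# Profile a bit of work.
pr = cProfile.Profile()
pr.enable()
sum(range(100_000))
pr.disable()

# Anything Stats prints goes to `stream`, not to stdout.
buf = io.StringIO()
st = pstats.Stats(pr, stream=buf)
st.sort_stats("cumulative").print_stats()
```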


return decorator(funcOrName) if callable(funcOrName) else decorator

# This ugly hook is used to decorate the upstream functions and avoid circular imports.
Contributor

Why do circular imports appear?

Contributor Author

Because we decorate functions of the triton.runtime.jit module from the backend, but that module is also the one that loads the backend.
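The hook pattern being discussed can be sketched like this (all names are hypothetical): the runtime module exposes a hook slot instead of importing the backend's decorator, and the backend fills it in later, breaking the import cycle.

```python
import functools

# --- runtime side (stands in for triton.runtime.jit) ---
_track_hook = None  # filled in later by the backend

def set_track_hook(hook):
    global _track_hook
    _track_hook = hook

def jit_compile(fn):
    # Apply the backend's decorator only if one was registered.
    return _track_hook(fn) if _track_hook is not None else fn

# --- backend side (stands in for the track utility) ---
def track(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # ... timing/size statistics would be recorded here ...
        return fn(*args, **kwargs)
    return wrapper

# Installed once both modules are loaded, so neither imports the other.
set_track_hook(track)
```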

Successfully merging this pull request may close these issues.

Compile Time Tracking for Key Workloads