[D2M][ttnn-jit] Nightly perf collection for Superset #7495

Open

sgholamiTT wants to merge 12 commits into main from
Conversation
This reverts commit 81f47df.
Contributor
Pull request overview
Adds a nightly CI job to collect TTNN-JIT vs TTNN op-level performance data (via tracy CSV exports), then summarizes it into per-test JSON files intended for downstream Superset ingestion.
Changes:
- Introduces a reusable GitHub Actions workflow to run JIT perf collection on TT hardware and upload results as artifacts.
- Adds a new nightly job (`jit-perf-test`) to invoke the reusable workflow.
- Adds a small perf CI suite: parametrized pytest ops, a bash orchestrator, and a CSV→JSON summarizer.
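To make the CSV→JSON summarization step concrete, here is a minimal sketch of the aggregation idea. The CSV columns (`op`, `duration_ns`, `jit_enabled`) and helper names are invented for illustration; the real tracy export and the PR's `summarize_perf_results.py` use a different schema.

```python
import csv
import json
from io import StringIO

# Hypothetical profiler export with one row per test run; the real
# tracy CSV has different column names.
CSV_TEXT = """op,duration_ns,jit_enabled
add,1200,true
add,1000,false
"""

def summarize(csv_text):
    """Group rows per op, pair the JIT and non-JIT runs, and compute a ratio."""
    groups = {}
    for row in csv.DictReader(StringIO(csv_text)):
        g = groups.setdefault(row["op"], {})
        key = "jit_duration_ns" if row["jit_enabled"] == "true" else "ttnn_duration_ns"
        g[key] = int(row["duration_ns"])
    for g in groups.values():
        # >1 means the JIT path was slower than the baseline TTNN path.
        g["perf_ratio"] = g["jit_duration_ns"] / g["ttnn_duration_ns"]
    return groups

reports = summarize(CSV_TEXT)
print(json.dumps(reports["add"]))  # one JSON report per test case
```

The real summarizer emits one JSON file per (op, dtype, memory config) case rather than a single dict, but the pairing logic is the same shape.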
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `.github/workflows/schedule-nightly.yml` | Adds a new nightly job to run JIT perf collection. |
| `.github/workflows/call-jit-perf-test.yml` | New reusable workflow to set up the env, run the perf collection, and upload reports. |
| `test/ttnn-jit/perf_ci/perf_tests.py` | New pytest suite defining op/dtype/memory/JIT parameterization for profiling. |
| `test/ttnn-jit/perf_ci/run_perf_collect.sh` | Orchestrates tracy-profiled per-test runs and invokes summarization. |
| `test/ttnn-jit/perf_ci/summarize_perf_results.py` | Aggregates profiler CSVs and emits per-case JSON reports for ingestion. |
```python
function_to_test = (
    ttnn_jit.jit(debug=True, enable_cache=True)(op) if jit_enabled else op
)
```
Comment on lines +99 to +101:

```python
print(f"output_tensor\n: {output_tensor}")
ttnn.close_device(device)
```
Comment on lines +288 to +289:

```python
filename = f"perf_{op}_{dtype}_{mem_cfg}{job_suffix}.json"
filepath = out_dir / filename
```
Comment on lines +243 to +247:

```python
    g["jit_duration_ns"] = r["duration_ns"]
    g["math_fidelity_jit"] = r["math_fidelity"]
else:
    g["ttnn_duration_ns"] = r["duration_ns"]
    g["math_fidelity_ttnn"] = r["math_fidelity"]
```
Comment on lines +52 to +58:

```yaml
jit-perf-test:
  needs: [ build-image, release-build ]
  uses: ./.github/workflows/call-jit-perf-test.yml
  secrets: inherit
  with:
    docker_image: ${{ needs.build-image.outputs.docker-image }}
```
```bash
pip show ttmlir &> /dev/null && pip uninstall -y ttmlir
pip show ttnn-jit &> /dev/null && pip uninstall -y ttnn-jit
pip install ttnn_jit*.whl --find-links . --upgrade
```
Comment on lines +9 to +10:

```python
import pytest
```
Comment on lines +69 to +73:

```python
def test_op_compare(
    h, w, op, dtype, ttnn_dtype, memory_config, memory_config_id, jit_enabled
):
    device = ttnn.open_device(device_id=0)
    torch_tensor_a = torch.rand((h, w), dtype=dtype) * 100
```
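The JIT toggle seen in these snippets reduces to conditionally wrapping the same callable. A minimal self-contained sketch, with plain Python functions standing in for `ttnn` ops and `fake_jit` standing in for a `ttnn_jit.jit`-style decorator factory (both stand-ins are assumptions for illustration):

```python
def fake_jit(**kwargs):
    """Stand-in for a jit(debug=..., enable_cache=...)-style decorator factory."""
    def decorate(fn):
        def wrapped(*args):
            # A real JIT would compile and dispatch here; we just tag the call.
            wrapped.compiled = True
            return fn(*args)
        wrapped.compiled = False
        return wrapped
    return decorate

def add(a, b):
    """Stand-in for a device op under test."""
    return a + b

# Same callable either way; only the dispatch path differs, so the
# numeric results of the JIT and non-JIT runs should match.
for jit_enabled in (True, False):
    function_to_test = fake_jit(debug=True)(add) if jit_enabled else add
    assert function_to_test(2, 3) == 5
```

This is why the suite can parametrize on `jit_enabled` alone and reuse one test body for both measurement paths.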
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main    #7495      +/-   ##
==========================================
+ Coverage   69.25%   69.84%   +0.59%
==========================================
  Files         405      419      +14
  Lines       71883    74115    +2232
==========================================
+ Hits        49781    51765    +1984
- Misses      22102    22350     +248
```

☔ View full report in Codecov by Sentry.
Add JIT performance collection to nightly CI
Runs JIT vs TTNN op-level performance benchmarks in the nightly pipeline and exports structured results to Superset for dashboard visualization.
What
- `call-jit-perf-test.yml`: New reusable workflow that sets up the environment (installs artifacts, the tracy profiler, and ttnn-jit wheels), runs `run_perf_collect.sh`, and uploads results as artifacts.
- `schedule-nightly.yml`: Adds the `jit-perf-test` job after `release-build`.
- `perf_tests.py`: Parametrized pytest suite comparing JIT and non-JIT execution of `abs`, `exp`, `add`, `mul`, and `matmul` across dtypes (`bf16`, `bfp8`) and memory configs (`dram_interleaved`). More configs are yet to be decided.
- `run_perf_collect.sh`: Orchestrates tracy-profiled test runs and invokes the summarizer.
- `summarize_perf_results.py`: Parses per-test CSV profiler output and produces one JSON report per test case (op + dtype + memory config). Each report maps to a separate `benchmark_run` row in Superset with clean, filterable columns (`model` = op, `precision` = dtype, `config` = shape/memory/fidelity) and three measurements: `jit_kernel_duration_ns`, `ttnn_kernel_duration_ns`, `perf_ratio`.

Superset integration
Reports are picked up by the existing `collect_data` action, SFTP'd to the perf ingestion server, and loaded into the `sw_test.benchmark_run` / `sw_test.benchmark_measurement` tables. No changes are needed to the data pipeline.

Dashboard: https://superset.tenstorrent.com/superset/dashboard/p/WpVZ5d8pj90/
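A report of the shape described above might look roughly like the following. All field names and values here are hypothetical illustrations of the `model`/`precision`/`config` columns and three measurements; the actual schema lives in `summarize_perf_results.py`.

```python
import json

# Hypothetical kernel durations for one (op, dtype, memory config) case.
jit_ns, ttnn_ns = 5200, 4800

report = {
    "model": "add",                      # op name
    "precision": "bf16",                 # dtype
    "config": "32x32/dram_interleaved",  # shape + memory config
    "measurements": {
        "jit_kernel_duration_ns": jit_ns,
        "ttnn_kernel_duration_ns": ttnn_ns,
        "perf_ratio": jit_ns / ttnn_ns,  # >1 means JIT slower than TTNN
    },
}

# One such JSON file per test case is what the ingestion step picks up.
print(json.dumps(report, indent=2))
```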
Note
This workflow should not increase the nightly duration, as it runs in parallel with the other test matrix jobs.
Currently, the end-to-end process takes around 10 minutes (evidence).
Test plan
- `collect_data` finds and processes all `perf_*_<job_id>.json` reports
- Results land in the `benchmark_run` / `benchmark_measurement` tables