[D2M][ttnn-jit] Nightly perf collection for Superset #7495

Open
sgholamiTT wants to merge 12 commits into main from sgholami/jit-performance-collection

Conversation

@sgholamiTT sgholamiTT commented Mar 13, 2026

Add JIT performance collection to nightly CI

Runs JIT vs TTNN op-level performance benchmarks in the nightly pipeline and exports structured results to Superset for dashboard visualization.

What

  • call-jit-perf-test.yml: New reusable workflow that sets up the environment (install artifacts, tracy profiler, ttnn-jit wheels), runs run_perf_collect.sh, and uploads results as artifacts.
  • schedule-nightly.yml: Adds the jit-perf-test job after release-build.
  • perf_tests.py: Parametrized pytest suite comparing JIT and non-JIT execution of abs, exp, add, mul, matmul across dtypes (bf16, bfp8) and memory configs (dram_interleaved). More configs are yet to be decided.
  • run_perf_collect.sh: Orchestrates tracy-profiled test runs and invokes the summarizer.
  • summarize_perf_results.py: Parses per-test CSV profiler output and produces one JSON report per test case (op + dtype + memory config). Each report maps to a separate benchmark_run row in Superset with clean, filterable columns (model=op, precision=dtype, config=shape/memory/fidelity) and three measurements: jit_kernel_duration_ns, ttnn_kernel_duration_ns, perf_ratio.
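As a rough illustration of the per-case report emission described above, the sketch below builds one JSON report per (op, dtype, memory config) case with the three measurements named in this description. The `write_report` helper and the exact field layout are illustrative assumptions, not the actual `summarize_perf_results.py` code:

```python
import json
from pathlib import Path

def write_report(op, dtype, mem_cfg, jit_ns, ttnn_ns, out_dir, job_suffix=""):
    """Emit one JSON report per (op, dtype, memory config) test case.

    Field names follow the PR description (model=op, precision=dtype,
    config=memory); the real summarizer's schema may differ.
    """
    report = {
        "model": op,              # e.g. "matmul"
        "precision": dtype,       # e.g. "bf16"
        "config": mem_cfg,        # e.g. "dram_interleaved"
        "measurements": {
            "jit_kernel_duration_ns": jit_ns,
            "ttnn_kernel_duration_ns": ttnn_ns,
            # ratio > 1.0 means JIT is slower than plain TTNN for this case
            "perf_ratio": jit_ns / ttnn_ns,
        },
    }
    out = Path(out_dir) / f"perf_{op}_{dtype}_{mem_cfg}{job_suffix}.json"
    out.write_text(json.dumps(report, indent=2))
    return report

# Example case: JIT 20% slower than TTNN
r = write_report("add", "bf16", "dram_interleaved", 1200, 1000, "/tmp")
print(r["measurements"]["perf_ratio"])  # 1.2
```

Emitting one file per case (rather than one combined report) is what lets each case land as its own filterable `benchmark_run` row downstream.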

Superset integration

Reports are picked up by the existing collect_data action, SFTP'd to the perf ingestion server, and loaded into sw_test.benchmark_run / sw_test.benchmark_measurement tables. No changes needed to the data pipeline.
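Conceptually, each report flattens into one run row plus one measurement row per metric. The column names below are hypothetical stand-ins for the `sw_test.benchmark_run` / `sw_test.benchmark_measurement` schemas, which this sketch does not inspect:

```python
# Hypothetical flattening of one per-case JSON report into the two tables.
# Actual ingestion is done by the existing collect_data pipeline; column
# names here are assumptions for illustration only.
def to_rows(report):
    run = {
        "model": report["model"],
        "precision": report["precision"],
        "config": report["config"],
    }
    measurements = [
        {"name": name, "value": value}
        for name, value in report["measurements"].items()
    ]
    return run, measurements

report = {
    "model": "matmul",
    "precision": "bfp8",
    "config": "dram_interleaved",
    "measurements": {
        "jit_kernel_duration_ns": 5000,
        "ttnn_kernel_duration_ns": 4000,
        "perf_ratio": 1.25,
    },
}
run, ms = to_rows(report)
print(run["model"], len(ms))  # matmul 3
```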

Dashboard: https://superset.tenstorrent.com/superset/dashboard/p/WpVZ5d8pj90/

Note

This workflow should not increase the nightly duration, as it runs in parallel with all other test matrix jobs.
Currently, the end-to-end process takes around 10 minutes (evidence).

Test plan

  • Nightly CI run completes successfully and uploads artifacts (Link)
  • collect_data finds and processes all perf_*_<job_id>.json reports
  • Data appears in Superset benchmark_run / benchmark_measurement tables

Copilot AI review requested due to automatic review settings March 13, 2026 20:24
@sgholamiTT sgholamiTT changed the title Sgholami/jit performance collection [D2M][ttnn-jit] Nightly perf collection for Superset Mar 13, 2026

Copilot AI left a comment


Pull request overview

Adds a nightly CI job to collect TTNN-JIT vs TTNN op-level performance data (via tracy CSV exports), then summarizes it into per-test JSON files intended for downstream Superset ingestion.
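To make the summarization step concrete, here is a minimal sketch of aggregating a tracy-style per-op CSV into a kernel duration. The real profiler CSV has many more columns; the `OP CODE` and `DEVICE KERNEL DURATION [ns]` column names are assumptions for illustration:

```python
import csv
from io import StringIO

# Toy stand-in for a tracy profiler CSV export with two invocations
# of the same op.
SAMPLE = """OP CODE,DEVICE KERNEL DURATION [ns]
ttnn::add,1500
ttnn::add,1450
"""

def kernel_duration_ns(csv_text, op_code):
    """Average the device kernel duration over all rows matching op_code."""
    rows = [
        r for r in csv.DictReader(StringIO(csv_text))
        if r["OP CODE"] == op_code
    ]
    durations = [int(r["DEVICE KERNEL DURATION [ns]"]) for r in rows]
    return sum(durations) / len(durations)

print(kernel_duration_ns(SAMPLE, "ttnn::add"))  # 1475.0
```

Averaging over repeated invocations smooths out run-to-run noise before the JIT/TTNN ratio is computed.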

Changes:

  • Introduces a reusable GitHub Actions workflow to run JIT perf collection on TT hardware and upload results as artifacts.
  • Adds a new nightly job (jit-perf-test) to invoke the reusable workflow.
  • Adds a small perf CI suite: parametrized pytest ops, a bash orchestrator, and a CSV→JSON summarizer.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| .github/workflows/schedule-nightly.yml | Adds a new nightly job to run JIT perf collection. |
| .github/workflows/call-jit-perf-test.yml | New reusable workflow to set up the environment, run the perf collection, and upload reports. |
| test/ttnn-jit/perf_ci/perf_tests.py | New pytest suite defining op/dtype/memory/JIT parameterization for profiling. |
| test/ttnn-jit/perf_ci/run_perf_collect.sh | Orchestrates tracy-profiled per-test runs and invokes summarization. |
| test/ttnn-jit/perf_ci/summarize_perf_results.py | Aggregates profiler CSVs and emits per-case JSON reports for ingestion. |


```python
function_to_test = (
    ttnn_jit.jit(debug=True, enable_cache=True)(op) if jit_enabled else op
)
```
Comment on lines +99 to +101

```python
print(f"output_tensor\n: {output_tensor}")
ttnn.close_device(device)
```
Comment on lines +288 to +289
```python
filename = f"perf_{op}_{dtype}_{mem_cfg}{job_suffix}.json"
filepath = out_dir / filename
```
Comment on lines +243 to +247
```python
    g["jit_duration_ns"] = r["duration_ns"]
    g["math_fidelity_jit"] = r["math_fidelity"]
else:
    g["ttnn_duration_ns"] = r["duration_ns"]
    g["math_fidelity_ttnn"] = r["math_fidelity"]
```
Comment on lines +52 to +58
```yaml
jit-perf-test:
  needs: [ build-image, release-build ]
  uses: ./.github/workflows/call-jit-perf-test.yml
  secrets: inherit
  with:
    docker_image: ${{ needs.build-image.outputs.docker-image }}
```


```bash
pip show ttmlir &> /dev/null && pip uninstall -y ttmlir
pip show ttnn-jit &> /dev/null && pip uninstall -y ttnn-jit
pip install ttnn_jit*.whl --find-links . --upgrade
```
Comment on lines +9 to +10
```python
import pytest
```
Comment on lines +69 to +73
```python
def test_op_compare(
    h, w, op, dtype, ttnn_dtype, memory_config, memory_config_id, jit_enabled
):
    device = ttnn.open_device(device_id=0)
    torch_tensor_a = torch.rand((h, w), dtype=dtype) * 100
```
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.84%. Comparing base (aa43448) to head (081e865).
⚠️ Report is 52 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #7495      +/-   ##
==========================================
+ Coverage   69.25%   69.84%   +0.59%
==========================================
  Files         405      419      +14
  Lines       71883    74115    +2232
==========================================
+ Hits        49781    51765    +1984
- Misses      22102    22350     +248
```

☔ View full report in Codecov by Sentry.

