Nsight Profiling to Phoenix Benchmark Cases #929
Pull Request Overview
Adds Nsight Systems profiling to the Phoenix benchmarking workflows to allow side-by-side comparison of master vs. PR performance.
- Wraps both GPU and CPU benchmark invocations in `nsys profile` within `bench.sh`.
- Processes and prints key profiling metrics (NVTX, CUDA API calls, GPU kernels) in the CI via `bench.yml`.
- Uploads the generated `report.nsys-rep` as an artifact for later inspection.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description | 
|---|---|
| .github/workflows/phoenix/bench.sh | Prepend `nsys profile` to the existing benchmark commands | 
| .github/workflows/bench.yml | Add a “Process Nsight Profiling Report” step and include the report.nsys-rep artifact | 
Comments suppressed due to low confidence (4)
.github/workflows/phoenix/bench.sh:19
- Using a fixed output name `report` may cause collisions or overwrite data when running multiple jobs; consider parameterizing the output file (e.g. `-o "$job_slug"`) to improve traceability.
    nsys profile -o report ./mfc.sh bench --mem 12 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
.github/workflows/phoenix/bench.sh:21
- Same as above: the static `report` filename will be reused for CPU runs; consider including `$job_slug` or the device name in the output filename to avoid overwrites.
    nsys profile -o report ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
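One way to act on the two naming suggestions above is to derive the report name from the job slug and the device. This is only a minimal sketch: `nsys_output_name` and its `device` argument are hypothetical names that do not appear in the PR, and `job_slug` is assumed to be set as in `bench.sh`.

```shell
# Hypothetical helper (not part of the PR): build a per-job report name so
# that GPU and CPU runs do not overwrite each other's report.
nsys_output_name() {
    slug="$1"      # e.g. the value of $job_slug in bench.sh
    device="$2"    # e.g. "gpu" or "cpu" (assumed labels)
    printf '%s-%s' "$slug" "$device"
}

# Possible use in bench.sh (commented out here because it needs nsys and mfc.sh):
# nsys profile -o "$(nsys_output_name "$job_slug" gpu)" ./mfc.sh bench <existing arguments>
```

If the name changes, the path checked in `bench.yml` would have to change with it.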
.github/workflows/bench.yml:97
- The workflow checks for `pr/report.nsys-rep`, but `bench.sh` emits `report.nsys-rep` in the workspace root (or TMPDIR). The path should match where the file is actually written, or the report should be copied into `pr/` beforehand.
          if [ -f "pr/report.nsys-rep" ]; then
.github/workflows/bench.yml:104
- [nitpick] The profiling section hardcodes just three report types—consider looping over the full set of reports listed in the PR description to reduce duplication and ensure all metrics are covered.
            echo "=== CUDA API CALLS ==="
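The nitpick about looping over report types could be sketched roughly as follows. The function name `print_nsys_summaries` and the `NSYS` override are assumptions made for illustration (the override allows a dry run with `NSYS=echo`); the `nsys stats --report ... --format table` invocation and the `head -100` cap mirror the ones already used in the workflow step.

```shell
# Assumption: NSYS may be overridden for a dry run, e.g. NSYS=echo.
NSYS="${NSYS:-nsys}"

# Hypothetical helper (not part of the PR).
# $1: report file name; remaining arguments: report types to print.
print_nsys_summaries() {
    report_file="$1"
    shift
    for report in "$@"; do
        echo "=== $report ==="
        for branch in master pr; do
            echo "$branch"
            # Run inside the branch directory and cap the output at 100 lines.
            (cd "$branch" && "$NSYS" stats --report "$report" --format table "$report_file" | head -100)
        done
    done
}

# Example call with the three report types the workflow prints today:
# print_nsys_summaries report.nsys-rep nvtx_sum cuda_api_sum cuda_gpu_kern_sum
```

Adding another report type from the PR description then only means appending its name to the call.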
| Codecov Report
All modified and coverable lines are covered by tests ✅
@@           Coverage Diff           @@
##           master     #929   +/-   ##
=======================================
  Coverage   43.71%   43.71%           
=======================================
  Files          68       68           
  Lines       18360    18360           
  Branches     2292     2292           
=======================================
  Hits         8026     8026           
  Misses       8945     8945           
  Partials     1389     1389
 | 
| if only the benchmarks matter, comment out or delete the test suite so it doesn't rerun every time. | 
| fails everywhere? | 
| this PR keeps failing benchmarking... what is going on? | 
| For Nsight (
```
      - name: Process Nsight Profiling Report
        run: |
          if [ -f "pr/report.nsys-rep" ]; then
            echo "=== Nsight Profiling Summary ==="
            echo "Master"
            (cd master && nsys stats --report nvtx_sum report.nsys-rep)
            echo "Pr"
            (cd pr && nsys stats --report nvtx_sum report.nsys-rep)
            echo "=== CUDA API CALLS ==="
            echo "Master"
            (cd master && nsys stats --report cuda_api_sum --format table report.nsys-rep | head -100)
            echo "Pr"
            (cd pr && nsys stats --report cuda_api_sum --format table report.nsys-rep | head -100)
            echo "=== GPU KERNELS ==="
            echo "Master"
            (cd master && nsys stats --report cuda_gpu_kern_sum --format table report.nsys-rep | head -100)
            echo "Pr"
            (cd pr && nsys stats --report cuda_gpu_kern_sum --format table report.nsys-rep | head -100)
            
          else
            echo "No Nsight report found, skipping profiling analysis"
          fi
``` | 
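The step above tests only for `pr/report.nsys-rep` but then also reads `master/report.nsys-rep`. A small guard that checks every directory first might look like the sketch below; `reports_present` is a hypothetical helper, not something the PR defines.

```shell
# Hypothetical helper (not part of the PR): succeed only if every listed
# directory contains the report file.
# $1: report file name; remaining arguments: directories to check.
reports_present() {
    report_file="$1"
    shift
    for dir in "$@"; do
        if [ ! -f "$dir/$report_file" ]; then
            echo "Missing $dir/$report_file" >&2
            return 1
        fi
    done
    return 0
}

# Possible use in the workflow step:
# if reports_present report.nsys-rep master pr; then
#     <print the summaries>
# else
#     echo "No Nsight report found, skipping profiling analysis"
# fi
```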
| What does  | 
| working on another thing right now, but will follow up by the morning. | 
| status? did you use nvhpc_acc_time (or whatever it is called)? | 
| Note to Self: Phoenix-GPU Phoenix-CPU
On login node:
git clone https://github.com/microsoft/WSL2-Linux-Kernel WSL --depth 1
cd WSL/tools/perf
module load anaconda3
conda activate
make NO_LIBELF=1 NO_LIBTRACEEVENT=1
On compute node:
module load anaconda3
conda activate
cd ~/WSL/tools/perf
alias perf="$PWD/perf"
cd ../../../ | 
User description
Description
This feature places the nsys reports of master and the PR next to each other. For now, the comparison is left as a visual one.
To compare reports, use `diff` or `csv-diff` after exporting readable files with `nsys analyze -f <format e.g. csv, txt, etc.> -o <output-file>`.
Nsight Docs: https://docs.nvidia.com/nsight-systems/UserGuide/index.html#report-scripts
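As a rough illustration of the `diff` route, the sketch below compares two already-exported text files, one per branch. It assumes the export step has been run beforehand; `compare_exports` and the file names in the example call are hypothetical.

```shell
# Hypothetical helper (not part of the PR): compare two exported report files.
# $1: export from master; $2: export from the PR.
# Returns diff's exit status: 0 = identical, 1 = different, >1 = error.
compare_exports() {
    if diff -u "$1" "$2"; then
        status=0
    else
        status=$?
    fi
    case "$status" in
        0) echo "identical: $1 $2" ;;
        1) echo "different: $1 $2" ;;
        *) echo "diff failed for: $1 $2" >&2 ;;
    esac
    return "$status"
}

# Example call with assumed file names:
# compare_exports master/nvtx_sum.csv pr/nvtx_sum.csv
```

Timing columns will rarely match exactly between two runs, so a plain `diff` mostly helps spot added or missing rows rather than performance changes.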
Variety of Reports to Display:
nvtx_sum, osrt_sum, cuda_api_sum, cuda_gpu_kern_sum, cuda_gpu_mem_time_sum, cuda_gpu_mem_size_sum, openmp_sum, opengl_khr_range_sum, opengl_khr_gpu_range_sum, vulkan_marker_sum, vulkan_gpu_marker_sum, dx11_pix_sum, dx12_gpu_marker_sum, dx12_pix_sum, wddm_queue_sum, um_sum, um_total_sum, um_cpu_page_faults_sum, openacc_sum
Aims to close #392
PR Type
Enhancement
Description
Add Nsight Systems profiling to Phoenix benchmark workflow
Generate profiling reports for master and PR branches
Display NVTX, CUDA API, and GPU kernel summaries
Archive profiling reports as workflow artifacts
Changes walkthrough 📝
| File | Description | 
|---|---|
| bench.sh (.github/workflows/phoenix/bench.sh) | Enable Nsight profiling for benchmark execution: `nsys profile -o report` command | 
| bench.yml (.github/workflows/bench.yml) | Add profiling report processing and archival | 