Commit c52fff2

committed
fixup gpu docs
1 parent 063f554 commit c52fff2

File tree

3 files changed

+161
-161
lines changed

docs/documentation/gpuDebugging.md

Lines changed: 0 additions & 156 deletions
This file was deleted.

docs/documentation/gpuParallelization.md

Lines changed: 157 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -564,3 +564,160 @@ Uses FYPP eval directive using `$:`
564564
</details>
565565

566566
------------------------------------------------------------------------------------------
567+
568+
# Debugging Tools and Tips for GPUs
569+
570+
## Compiler-agnostic tools
571+
572+
## OpenMP tools
573+
```bash
574+
OMP_DISPLAY_ENV=true | false | verbose
575+
```
576+
- Prints out the internal control values and environment variables at the beginning of the program if `true` or `verbose`
577+
- `verbose` will also print out vendor-specific internal control values and environment variables
578+
579+
```bash
580+
OMP_TARGET_OFFLOAD=MANDATORY | DISABLED | DEFAULT
581+
```
582+
- Quick way to turn offloading off (`DISABLED`) or to make the program abort if no GPU is found (`MANDATORY`)
583+
- Great first test: does the problem disappear when you drop back to the CPU?
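
The CPU-versus-GPU comparison described above can be scripted. The following is a minimal sketch, not a tested workflow: `run_with_offload` is a hypothetical helper name, and the `sh -c 'echo ...'` command is only a stand-in that you would replace with your own executable and arguments.

```bash
# Run the same command once per offload mode so the outputs can be compared.
# `run_with_offload` is a hypothetical helper, not part of any toolchain.
run_with_offload() {
    mode="$1"
    shift
    # `env` sets the variable for this one command only.
    env OMP_TARGET_OFFLOAD="$mode" "$@"
}

# Stand-in command that only echoes the mode it sees; replace it with
# your own executable and arguments.
cpu_result=$(run_with_offload DISABLED sh -c 'echo "mode=$OMP_TARGET_OFFLOAD"')
gpu_result=$(run_with_offload MANDATORY sh -c 'echo "mode=$OMP_TARGET_OFFLOAD"')

echo "$cpu_result"
echo "$gpu_result"
```

Setting the variable per command, rather than exporting it, keeps the two runs from leaking state into each other or into the rest of the shell session.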
584+
585+
```bash
586+
OMP_THREAD_LIMIT=<positive_integer>
587+
```
588+
- Sets the maximum number of OpenMP threads to use in a contention group
589+
- Might be useful in checking for issues with contention or race conditions
590+
591+
```bash
592+
OMP_DISPLAY_AFFINITY=TRUE
593+
```
594+
- Will display affinity bindings for each OpenMP thread, containing hostname, process identifier, OS thread identifier, OpenMP thread identifier, and affinity binding.
595+
596+
## Cray Compiler Tools
597+
598+
### Cray General Options
599+
600+
```bash
601+
CRAY_ACC_DEBUG=<0|1|2|3>   # 0 = off, 3 = very noisy
602+
```
603+
604+
- Dumps a time-stamped log line (`ACC: ...`) for every allocation, data transfer, kernel launch, wait, etc. Great first stop when "nothing seems to run on the GPU".
605+
- Writes to stderr by default. Set `CRAY_ACC_DEBUG_FILE` to send the output elsewhere.
606+
- Recognizes `stderr`, `stdout`, and `process`.
607+
- `process` automatically generates a new file based on `pid` (each MPI process will have a different file)
608+
- While this environment variable specifies ACC, it can be used for both OpenACC and OpenMP
609+
610+
```bash
611+
CRAY_ACC_FORCE_EARLY_INIT=1
612+
```
613+
614+
- Force full GPU initialization at program start so you can see start-up hangs immediately
615+
- Without this environment variable, the default behavior is to defer initialization until first use
616+
- Device initialization includes initializing the GPU vendor’s low-level device runtime library (e.g., libcuda for NVIDIA GPUs) and establishing all necessary software contexts for interacting with the device
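
The general Cray variables above can be combined for a first debugging pass. This is a sketch under stated assumptions: the verbosity level `2` is an arbitrary middle setting chosen for illustration, and the final `echo` is only there to confirm what was exported.

```bash
# Moderate verbosity; raise to 3 only if this is not detailed enough.
export CRAY_ACC_DEBUG=2
# One log file per process, named from the pid, so MPI ranks do not interleave.
export CRAY_ACC_DEBUG_FILE=process
# Initialize the GPU at program start so a start-up hang shows up immediately.
export CRAY_ACC_FORCE_EARLY_INIT=1

echo "CRAY_ACC_DEBUG=$CRAY_ACC_DEBUG CRAY_ACC_DEBUG_FILE=$CRAY_ACC_DEBUG_FILE"
```

Launch the application from the same shell (or job script) so that it inherits these settings.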
617+
618+
### Cray OpenACC Options
619+
620+
```bash
621+
CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1
622+
```
623+
- Will cause `acc_present_dump()` to output variable names and file locations in addition to variable mappings
624+
- Add `acc_present_dump()` around hotspots to help find problems with data movements
625+
- Most helpful when combined with the `CRAY_ACC_DEBUG` environment variable
626+
627+
## NVHPC Compiler Options
628+
629+
### NVHPC General Options
630+
631+
```bash
632+
STATIC_RANDOM_SEED=1
633+
```
634+
- Forces the seed returned by `RANDOM_SEED` to be constant, so it generates the same sequence of random numbers
635+
- Useful for testing issues with randomized data
636+
637+
```bash
638+
NVCOMPILER_TERM=option[,option]
639+
```
640+
- `[no]debug`: Enables/disables just-in-time debugging (debugging invoked on error)
641+
- `[no]trace`: Enables/disables stack traceback on error
642+
643+
### NVHPC OpenACC Options
644+
645+
```bash
646+
NVCOMPILER_ACC_NOTIFY=<bitmask>
647+
```
648+
- Set the environment variable to a bitmask to print information to stderr about the following events
649+
- kernel launches: 1
650+
- data transfers: 2
651+
- region entry/exit: 4
652+
- wait operations or synchronizations with the device: 8
653+
- device memory allocations and deallocations: 16
654+
- 1 (kernels only) is the usual first step. 3 (kernels + copies) is great for "why is it so slow?"
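
Because the value is a bitmask, the individual bits listed above can be OR-ed together in shell arithmetic rather than added by hand. A minimal sketch follows; the `NOTIFY_*` names are hypothetical labels introduced here for readability and mean nothing to the runtime.

```bash
# Bit values from the list above; the NOTIFY_* names are illustrative only.
NOTIFY_KERNELS=1
NOTIFY_TRANSFERS=2

# Kernel launches + data transfers.
export NVCOMPILER_ACC_NOTIFY=$(( NOTIFY_KERNELS | NOTIFY_TRANSFERS ))
echo "NVCOMPILER_ACC_NOTIFY=$NVCOMPILER_ACC_NOTIFY"   # prints NVCOMPILER_ACC_NOTIFY=3
```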
655+
656+
```bash
657+
NVCOMPILER_ACC_TIME=1
658+
```
659+
- Lightweight profiler
660+
- prints a tidy end-of-run table with per-region and per-kernel times and bytes moved
661+
- Do not use at the same time as the CUDA profiler
662+
663+
```bash
664+
NVCOMPILER_ACC_DEBUG=1
665+
```
666+
- Spews everything the runtime sees: host/device addresses, mapping events, present-table look-ups, etc.
667+
- Great for "partially present" or "pointer went missing" errors.
668+
- [Doc for NVCOMPILER_ACC_DEBUG](https://docs.nvidia.com/hpc-sdk/archive/20.9/pdf/hpc209openacc_gs.pdf)
669+
- Ctrl+F for `NVCOMPILER_ACC_DEBUG`
670+
671+
### NVHPC OpenMP Options
672+
673+
```bash
674+
LIBOMPTARGET_PROFILE=run.json
675+
```
676+
- Emits a Chrome-trace (JSON) timeline you can open in chrome://tracing or Speedscope
677+
- Great lightweight profiler when Nsight is overkill.
678+
- Granularity in µs via `LIBOMPTARGET_PROFILE_GRANULARITY` (default 500).
679+
680+
```bash
681+
LIBOMPTARGET_INFO=<bitmask>
682+
```
683+
- Prints out different types of runtime information
684+
- Human-readable log of data-mapping inserts/updates, kernel launches, copies, waits.
685+
- Perfect first stop for "why is nothing copied?"
686+
- Flags
687+
- Print all data arguments upon entering an OpenMP device kernel: 0x01
688+
- Indicate when a mapped address already exists in the device mapping table: 0x02
689+
- Dump the contents of the device pointer map at kernel exit: 0x04
690+
- Indicate when an entry is changed in the device mapping table: 0x08
691+
- Print OpenMP kernel information from device plugins: 0x10
692+
- Indicate when data is copied to and from the device: 0x20
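
The flags above can be combined the same way as any other bitmask. The sketch below picks three of them as an arbitrary starting point for the "why is nothing copied?" question; the selection is an assumption for illustration, not a recommended default.

```bash
# 0x01: print data arguments on kernel entry
# 0x10: print kernel information from device plugins
# 0x20: report copies to and from the device
export LIBOMPTARGET_INFO=$(( 0x01 | 0x10 | 0x20 ))
echo "LIBOMPTARGET_INFO=$LIBOMPTARGET_INFO"   # prints LIBOMPTARGET_INFO=49
```

Shell arithmetic expands the expression to a decimal value before the variable is exported.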
693+
694+
```bash
695+
LIBOMPTARGET_DEBUG=1
696+
```
697+
- Developer-level trace (host-side)
698+
- Much noisier than `INFO`
699+
- Only works if the runtime was built with `-DOMPTARGET_DEBUG`.
700+
701+
```bash
702+
LIBOMPTARGET_JIT_OPT_LEVEL={0,1,2,3}
703+
```
704+
- This environment variable can be used to change the optimization pipeline used to optimize the embedded device code as part of the device JIT.
705+
- The value corresponds to the `-O{0,1,2,3}` command line argument passed to clang.
706+
707+
```bash
708+
LIBOMPTARGET_JIT_SKIP_OPT=1
709+
```
710+
- This environment variable can be used to skip the optimization pipeline during JIT compilation.
711+
- If set, the image will only be passed through the backend.
712+
- The backend is invoked with the `LIBOMPTARGET_JIT_OPT_LEVEL` flag.
713+
714+
## Compiler Documentation
715+
716+
- [Cray & OpenMP Docs](https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openmp.7.html#environment-variables)
717+
- [Cray & OpenACC Docs](https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openacc.7.html#environment-variables)
718+
- [NVHPC & OpenACC Docs](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#environment-variables)
719+
- [NVHPC & OpenMP Docs](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#id2)
720+
- [LLVM & OpenMP Docs](https://openmp.llvm.org/design/Runtimes.html)
721+
- NVHPC is built on top of LLVM
722+
- [OpenMP Docs](https://www.openmp.org/spec-html/5.1/openmp.html)
723+
- [OpenACC Docs](https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.7.pdf)

docs/documentation/readme.md

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -3,15 +3,14 @@
33
## User Documentation
44

55
- [Getting Started](getting-started.md)
6-
- [Testing MFC](testing.md)
6+
- [Testing](testing.md)
77
- [Case Files](case.md)
88
- [Example Cases](examples.md)
9-
- [Running MFC](running.md)
9+
- [Running](running.md)
1010
- [Flow Visualization](visualization.md)
1111
- [Performance](expectedPerformance.md)
12-
- [GPU Parallelization](gpuParallelization.md)
13-
- [GPU Debugging](gpuDebugging.md)
14-
- [MFC's Authors](authors.md)
12+
- [GPU Offloading](gpuParallelization.md)
13+
- [Authors](authors.md)
1514
- [References](references.md)
1615

1716
## Code/API Documentation
