- Quick way to turn off off-load (`DISABLED`) or make it abort if a GPU isn't found (`MANDATORY`)
- Great first test: does the problem disappear when you drop back to the CPU?

```bash
OMP_THREAD_LIMIT=<positive_integer>
```
- Sets the maximum number of OpenMP threads to use in a contention group
- Might be useful in checking for issues with contention or race conditions
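
One way to act on this is to rerun the same command under several thread limits and see whether failures track the thread count. The helper below is a minimal sketch: `sweep_thread_limit`, the chosen limits (1, 2, 4), and `./my_app` are illustrative names, not part of OpenMP.

```bash
# Hypothetical helper: run the given command once per thread limit and
# report whether it exited successfully each time.
sweep_thread_limit() {
  for limit in 1 2 4; do
    # The assignment applies only to this one invocation of the command.
    if OMP_THREAD_LIMIT="$limit" "$@"; then
      echo "OMP_THREAD_LIMIT=$limit: ok"
    else
      echo "OMP_THREAD_LIMIT=$limit: FAILED"
    fi
  done
}
```

For example, `sweep_thread_limit ./my_app` (binary name hypothetical). A failure that vanishes at a limit of 1 but appears at higher limits suggests a race condition rather than a deterministic bug.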

```bash
OMP_DISPLAY_AFFINITY=TRUE
```
- Displays the affinity binding of each OpenMP thread, along with its hostname, process identifier, OS thread identifier, and OpenMP thread identifier.

## Cray Compiler Tools

### Cray General Options

```bash
CRAY_ACC_DEBUG=<0|1|2|3>   # 0 (off), 1, 2, 3 (very noisy)
```

- Dumps a time-stamped log line (`ACC: ...`) for every allocation, data transfer, kernel launch, wait, etc. Great first stop when "nothing seems to run on the GPU".
- Writes to STDERR by default. This can be changed by setting `CRAY_ACC_DEBUG_FILE`.
  - Recognizes `stderr`, `stdout`, and `process`.
    - `process` automatically generates a new file based on `pid` (each MPI process will have a different file)
- Although the variable name says ACC, it can be used for both OpenACC and OpenMP
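
As a rough sketch of how the two variables combine, the snippet below picks a mid-level verbosity and sends each process's log to its own file. The launch line is left commented out because `srun` and `./my_app` are assumptions about your system, not something the notes above prescribe.

```bash
# Mid-level verbosity; raise to 3 only when you need every event.
export CRAY_ACC_DEBUG=2
# Write one log file per process instead of sharing stderr.
export CRAY_ACC_DEBUG_FILE=process

# Hypothetical launch line; substitute your own launcher and binary.
# srun -n 4 ./my_app
echo "CRAY_ACC_DEBUG=$CRAY_ACC_DEBUG CRAY_ACC_DEBUG_FILE=$CRAY_ACC_DEBUG_FILE"
```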

```bash
CRAY_ACC_FORCE_EARLY_INIT=1
```

- Forces full GPU initialization at program start so you can see start-up hangs immediately
- Without this variable, the default behavior is to defer initialization until first use
- Device initialization includes initializing the GPU vendor’s low-level device runtime library (e.g., libcuda for NVIDIA GPUs) and establishing all necessary software contexts for interacting with the device

### Cray OpenACC Options

```bash
CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1
```
- Causes `acc_present_dump()` to output variable names and file locations in addition to variable mappings
- Add `acc_present_dump()` calls around hotspots to help find problems with data movement
- Most helpful when combined with the `CRAY_ACC_DEBUG` environment variable

## NVHPC Compiler Options

### NVHPC General Options

```bash
STATIC_RANDOM_SEED=1
```
- Forces the seed returned by `RANDOM_SEED` to be constant, so every run generates the same sequence of random numbers
- Useful for testing issues with randomized data

```bash
NVCOMPILER_TERM=option[,option]
```
- `[no]debug`: Enables/disables just-in-time debugging (debugging invoked on error)
- `[no]trace`: Enables/disables stack traceback on error

### NVHPC OpenACC Options

```bash
NVCOMPILER_ACC_NOTIFY=<bitmask>
```
- Set the variable to a bitmask to print information to stderr about the following events (add the values to combine them):
  - kernel launches: 1
  - data transfers: 2
  - region entry/exit: 4
  - wait operations or synchronizations with the device: 8
  - device memory allocations and deallocations: 16
- `1` (kernels only) is the usual first step. `3` (kernels + copies) is great for "why is it so slow?"
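
The bitmask is simply the bitwise OR (or sum) of the values listed above. A small sketch of building one in the shell follows; the `NOTIFY_*` variable names are made up for readability and are not recognized by the runtime.

```bash
# Bit values as listed above (names are illustrative only).
NOTIFY_KERNELS=1
NOTIFY_TRANSFERS=2
NOTIFY_REGIONS=4
NOTIFY_WAITS=8
NOTIFY_ALLOCS=16

# Kernel launches + data transfers + device allocations.
export NVCOMPILER_ACC_NOTIFY=$(( NOTIFY_KERNELS | NOTIFY_TRANSFERS | NOTIFY_ALLOCS ))
echo "NVCOMPILER_ACC_NOTIFY=$NVCOMPILER_ACC_NOTIFY"   # prints NVCOMPILER_ACC_NOTIFY=19
```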

```bash
NVCOMPILER_ACC_TIME=1
```
- Lightweight profiler
  - Prints a tidy end-of-run table with per-region and per-kernel times and bytes moved
- Do not use at the same time as the CUDA profiler

```bash
NVCOMPILER_ACC_DEBUG=1
```
- Spews everything the runtime sees: host/device addresses, mapping events, present-table look-ups, etc.
- Great for "partially present" or "pointer went missing" errors.
- [Doc for NVCOMPILER_ACC_DEBUG](https://docs.nvidia.com/hpc-sdk/archive/20.9/pdf/hpc209openacc_gs.pdf)
  - Ctrl+F for `NVCOMPILER_ACC_DEBUG`

### NVHPC OpenMP Options

```bash
LIBOMPTARGET_PROFILE=run.json
```
- Emits a Chrome-trace (JSON) timeline you can open in `chrome://tracing` or Speedscope
- Great lightweight profiler when Nsight is overkill.
- Set the granularity in µs with `LIBOMPTARGET_PROFILE_GRANULARITY` (default 500).

```bash
LIBOMPTARGET_INFO=<bitmask>
```
- Prints out different types of runtime information
- Human-readable log of data-mapping inserts/updates, kernel launches, copies, waits.
- Perfect first stop for "why is nothing copied?"
- Flags
  - Print all data arguments upon entering an OpenMP device kernel: 0x01
  - Indicate when a mapped address already exists in the device mapping table: 0x02
  - Dump the contents of the device pointer map at kernel exit: 0x04
  - Indicate when an entry is changed in the device mapping table: 0x08
  - Print OpenMP kernel information from device plugins: 0x10
  - Indicate when data is copied to and from the device: 0x20
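
Because the flags are given in hex, it can help to let the shell do the arithmetic and to double-check what a given mask enables. The sketch below is one way to do that; `describe_info_mask` and its short labels are hypothetical and only paraphrase the flag list above.

```bash
# Hypothetical helper: list which LIBOMPTARGET_INFO flags a mask enables.
describe_info_mask() {
  info_mask=$(( $1 ))   # accepts decimal or 0x-prefixed hex
  if [ $(( info_mask & 0x01 )) -ne 0 ]; then echo "0x01 data arguments on kernel entry"; fi
  if [ $(( info_mask & 0x02 )) -ne 0 ]; then echo "0x02 mapped address already exists"; fi
  if [ $(( info_mask & 0x04 )) -ne 0 ]; then echo "0x04 pointer map dump at kernel exit"; fi
  if [ $(( info_mask & 0x08 )) -ne 0 ]; then echo "0x08 mapping table entry changed"; fi
  if [ $(( info_mask & 0x10 )) -ne 0 ]; then echo "0x10 kernel info from device plugins"; fi
  if [ $(( info_mask & 0x20 )) -ne 0 ]; then echo "0x20 data copied to/from device"; fi
}

# Kernel arguments plus data copies; the shell expands this to 33.
export LIBOMPTARGET_INFO=$(( 0x01 | 0x20 ))
describe_info_mask "$LIBOMPTARGET_INFO"
```

Running the last two lines lists the `0x01` and `0x20` entries, confirming that the combined value requests only those two kinds of output.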

```bash
LIBOMPTARGET_DEBUG=1
```
- Developer-level trace (host-side)
- Much noisier than `INFO`
- Only works if the runtime was built with `-DOMPTARGET_DEBUG`.

```bash
LIBOMPTARGET_JIT_OPT_LEVEL={0,1,2,3}
```
- Changes the optimization pipeline used to optimize the embedded device code as part of the device JIT.
- The value corresponds to the `-O{0,1,2,3}` command line argument passed to clang.

```bash
LIBOMPTARGET_JIT_SKIP_OPT=1
```
- Skips the optimization pipeline during JIT compilation.
- If set, the image will only be passed through the backend.
- The backend is invoked with the `LIBOMPTARGET_JIT_OPT_LEVEL` flag.