Commit 6b02723

obrotowy authored and igcbot committed
IGC_StackOverflowDetection documentation
Add StackOverflowDetection.md documentation.
1 parent 8929bc7 commit 6b02723

3 files changed: +89 -4 lines changed


IGC/Compiler/Optimizer/OpenCLPasses/PrivateMemory/PrivateMemoryResolution.cpp

Lines changed: 13 additions & 3 deletions
@@ -193,9 +193,8 @@ void PrivateMemoryResolution::expandPrivateMemoryForVla(uint32_t &maxPrivateMem)
      "You can change this size by setting environmental variable IGC_ForcePerThreadPrivateMemorySize to a value in "
      "range [1024:20480]. "
      "Greater values can affect performance, and lower ones may lead to incorrect results of your program.\n"
-     "To make sure your program runs correctly you can set environmental variable IGC_StackOverflowDetection=1. "
-     "This flag will print \"Stack overflow detected!\" if insufficient memory value has led to stack overflow. "
-     "It should be used for debugging only as it affects performance.";
+     "To make sure your program runs correctly you can use IGC_StackOverflowDetection feature. See documentation:\n"
+     "https://github.com/intel/intel-graphics-compiler/tree/master/documentation/igc/StackOverflowDetection/StackOverflowDetection.md";

    getAnalysis<CodeGenContextWrapper>().getCodeGenContext()->EmitWarning(fullWarningMessage.c_str());
  }
@@ -369,6 +368,17 @@ bool PrivateMemoryResolution::runOnModule(llvm::Module &M) {
  if (FG->hasStackCall()) {
    // Analyze call depth for stack memory required
    maxPrivateMem = AnalyzeCGPrivateMemUsage(pKernel);
+   std::string maxPrivateMemValue = std::to_string(maxPrivateMem);
+   std::string fullWarningMessage =
+       "Stack call has been detected, the private memory size is set to " + maxPrivateMemValue +
+       "B. "
+       "You can change this size by setting environmental variable IGC_ForcePerThreadPrivateMemorySize to a value in "
+       "range [1024:20480]. "
+       "Greater values can affect performance, and lower ones may lead to incorrect results of your program.\n"
+       "To make sure your program runs correctly you can use StackOverflowDetection feature. See documentation:\n"
+       "https://github.com/intel/intel-graphics-compiler/tree/master/documentation/igc/StackOverflowDetection/StackOverflowDetection.md";
+
+   getAnalysis<CodeGenContextWrapper>().getCodeGenContext()->EmitWarning(fullWarningMessage.c_str());
  }
  if (((FG->hasIndirectCall() && FG->hasPartialCallGraph()) || FG->hasRecursion()) &&
      Ctx.type != ShaderType::RAYTRACING_SHADER) {

IGC/common/igc_flags.h

Lines changed: 1 addition & 1 deletion
@@ -900,7 +900,7 @@ DECLARE_IGC_REGKEY(bool, UseVMaskPredicate, false, "Use VMask as predicate for s
  DECLARE_IGC_REGKEY(bool, UseVMaskPredicateForLoads, true, "Use VMask as predicate for subspan usage (loads only)", true)
  DECLARE_IGC_REGKEY(bool, UseVMaskPredicateForIndirectMove, true,
                     "Use VMask as predicate for subspan usage (indirect mov only)", true)
- DECLARE_IGC_REGKEY(bool, StackOverflowDetection, false, "Inserts checks for stack overflow when stack calls are used.",
+ DECLARE_IGC_REGKEY(bool, StackOverflowDetection, false, "Inserts checks for stack overflow when stack calls or VLAs are used. See documentation: documentation/igc/StackOverflowDetection/StackOverflowDetection.md",
                     true)
  DECLARE_IGC_REGKEY(bool, BufferBoundsChecking, false, "Setting this to 1 (true) enables buffer bounds checking", true)
  DECLARE_IGC_REGKEY(DWORD, MinimumValidAddress, 0,

documentation/igc/StackOverflowDetection/StackOverflowDetection.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
To check for kernel stack overflow, set the [configuration flag](https://github.com/intel/intel-graphics-compiler/blob/master/documentation/configuration_flags.md) `IGC_StackOverflowDetection=1`.
Stack overflow can happen if the kernel uses non-inlined stack calls (for example, recursion) or the Variable Length Array (VLA) feature.

This flag adds a check on each stack pointer modification to catch kernel private memory overflow. Enabling it reduces kernel performance, so use it only for debugging.

If an overflow is detected, the message "Stack overflow detected!" is printed to the console and the kernel raises an assert.

You can change the default stack memory size with the `IGC_ForcePerThreadPrivateMemorySize` flag, using a value between 1024 and 20480, for example `IGC_ForcePerThreadPrivateMemorySize=4096`. Setting a higher value may impact performance.
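
To picture what a check on each stack pointer modification means, here is a minimal host-side sketch in plain C. It is only a model under stated assumptions, not IGC's actual instrumentation: the `private_stack` type and the `stack_alloc` and `stack_free` helpers are hypothetical names introduced for illustration, and `capacity` stands in for the per-thread private memory size.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of per-thread private memory used as a stack. */
typedef struct {
    size_t capacity; /* bytes available (stand-in for the private memory size) */
    size_t used;     /* current stack pointer offset from the base */
} private_stack;

/* Advance the stack pointer by `bytes`, checking the bound first.
   Returns 1 on success, 0 if the request would overflow. */
static int stack_alloc(private_stack *s, size_t bytes) {
    /* used <= capacity always holds, so the subtraction cannot wrap. */
    if (bytes > s->capacity - s->used) {
        fprintf(stderr, "Stack overflow detected!\n");
        return 0;
    }
    s->used += bytes;
    return 1;
}

/* Move the stack pointer back when a frame or VLA goes out of scope. */
static void stack_free(private_stack *s, size_t bytes) {
    s->used -= (bytes <= s->used) ? bytes : s->used;
}
```

In this model, a stack with `capacity` 4096 rejects a 200-byte request made after 4000 bytes are in use, while the same sequence succeeds with `capacity` 8192. The extra comparison on every pointer update is a plausible reason for the performance cost mentioned above.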

## Usage examples
### Recursive stack calls
This OpenCL C kernel overflows the stack for sufficiently large `n`. The overflow is detected if the kernel is compiled with the `IGC_StackOverflowDetection=1` flag set.
```c
// Recursive factorial: every call adds a stack frame, so stack use grows with n.
int fact(int n) {
  return n < 2 ? 1 : n*fact(n-1);
}
kernel void test_recursive(global int* out, int n) {
  out[0] = fact(n);
}
```
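
As a rough way to reason about "sufficiently large `n`", the sketch below estimates how many nested calls fit in a given amount of private memory. It is a back-of-the-envelope model in plain C: `frame_bytes` is an assumed per-call frame size (the real value depends on the compiler and is not stated here), and both helper names are hypothetical.

```c
#include <stddef.h>

/* Upper bound on nested calls that fit in private memory,
   given an ASSUMED per-call frame size. */
static size_t max_recursion_depth(size_t private_mem_bytes, size_t frame_bytes) {
    if (frame_bytes == 0) {
        return 0; /* avoid division by zero */
    }
    return private_mem_bytes / frame_bytes;
}

/* Returns 1 if fact(n) from the kernel above is expected to fit.
   fact(n) nests n calls for n >= 2 and a single call otherwise. */
static int fact_fits(int n, size_t private_mem_bytes, size_t frame_bytes) {
    size_t calls = (n < 2) ? 1 : (size_t)n;
    return calls <= max_recursion_depth(private_mem_bytes, frame_bytes);
}
```

With an assumed 64-byte frame and 4096 bytes of private memory, this model allows a depth of 64, so `fact_fits(64, 4096, 64)` holds and `fact_fits(65, 4096, 64)` does not.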
### VLA feature
This example uses the VLA feature through OpenMP offload in Fortran. Increasing `m` leads to stack overflow.
```fortran
program test_vla
  implicit none
  integer, parameter :: n = 1000, m = 600
  real, allocatable :: a(:), b(:)
  integer :: i, j

  allocate(a(n), b(m))
  ! Each thread gets its own private copy of b
  !$omp target teams distribute parallel do private(b)
  do i = 1, n
    b(1) = i
    do j = 2, m
      b(j) = b(j-1) + 1
    enddo
    a(i) = b(m)
  enddo

  do i = 1, n
    write(*,*) 'Index ', i, ' Computed ', a(i)
  enddo

  deallocate(a, b)
end program test_vla
```

Compiling this code with `ifx -fiopenmp -fopenmp-targets=spir64_gen -Xopenmp-target-backend "-device pvc" ./test.F90 -o dynam.AOT` emits the following warning:
```
warning: VLA has been detected, the private memory size is set to 4240B.
You can change this size by setting environmental variable IGC_ForcePerThreadPrivateMemorySize to a value in range [1024:20480].
Greater values can affect performance, and lower ones may lead to incorrect results of your program.
To make sure your program runs correctly you can set environmental variable IGC_StackOverflowDetection=1.
This flag will print "Stack overflow detected!" if insufficient memory value has led to stack overflow.
It should be used for debugging only as it affects performance.
```

Compiling with `IGC_StackOverflowDetection=1 ifx -fiopenmp -fopenmp-targets=spir64_gen -Xopenmp-target-backend "-device pvc" ./test.F90 -o dynam.AOT` and then running the program produces:
```
Stack overflow detected!
...
Stack overflow detected!
Stack overflow detected!
Stack overflow detected!
AssertHandler::printMessage
forrtl: error (76): Abort trap signal
```

We can increase the kernel private memory size with `IGC_ForcePerThreadPrivateMemorySize` so that the VLA no longer overflows the stack. Compiling with `IGC_StackOverflowDetection=1 IGC_ForcePerThreadPrivateMemorySize=8192 ifx -fiopenmp -fopenmp-targets=spir64_gen -Xopenmp-target-backend "-device pvc" ./test.F90 -o dynam.AOT` results in correct program execution:

```
Index 1 Computed 600.0000
...
Index 1000 Computed 1599.000
```
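
Because `IGC_ForcePerThreadPrivateMemorySize` is documented to accept values from 1024 to 20480, a build script or test harness could check a candidate value before invoking the compiler. The helper below is a sketch in plain C, with `parse_private_mem_size` as a hypothetical name. How IGC itself treats out-of-range values is not described here, so this is only a pre-check.

```c
#include <errno.h>
#include <stdlib.h>

#define PRIVATE_MEM_MIN 1024L
#define PRIVATE_MEM_MAX 20480L

/* Parse a candidate value for IGC_ForcePerThreadPrivateMemorySize.
   Returns 1 and stores the value in *out if `text` is a whole number
   in [1024, 20480]; returns 0 otherwise and leaves *out untouched. */
static int parse_private_mem_size(const char *text, long *out) {
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') {
        return 0; /* variable unset or empty */
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0') {
        return 0; /* out of range for long, or trailing non-digits */
    }
    if (value < PRIVATE_MEM_MIN || value > PRIVATE_MEM_MAX) {
        return 0; /* outside the documented range */
    }
    *out = value;
    return 1;
}
```

A caller might pass `getenv("IGC_ForcePerThreadPrivateMemorySize")` as `text`. With this helper, `"8192"` is accepted, while `"512"` and `"20481"` are rejected.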
