Commit ad6f6a1

llm: Change memory allocation backoff from exponential to incremental
If we create a memory layout that should fit based on reported free VRAM but allocation still fails, we start applying a backoff. This reduces free VRAM by an exponential percentage (1%, 2%, 4%...). However, the points chosen tend to be too dense at the beginning and too sparse at the end. Therefore, this switches to an incremental backoff (10%, 20%, 30%...).
1 parent 6723a40 commit ad6f6a1

1 file changed: +2 -5 lines changed

llm/server.go

Lines changed: 2 additions & 5 deletions
@@ -766,15 +766,12 @@ nextOperation:
 			// Memory allocation failed even though we created a layout that we thought should
 			// fit in available memory. This could happen if either our free memory reports
 			// are incorrect or if available memory is changing between layout and allocation
-			// time. Apply an exponential backoff to try to find the real amount of available
-			// space.
+			// time. Apply a backoff to try to find the real amount of available space.
 			if backoff > 1 {
 				slog.Warn("memory layout cannot be allocated", "memory", resp.Memory)
 				return nil, errors.New("memory layout cannot be allocated")
-			} else if backoff == 0 {
-				backoff = 0.01
 			} else {
-				backoff *= 2
+				backoff += 0.1
 			}
 
 			slog.Info("model layout did not fit, applying backoff", "backoff", fmt.Sprintf("%.2f", backoff))