malte-j commented Jun 26, 2025

Using mmap actually slows down model loading when all layers are offloaded to the GPU and VRAM < RAM. I tested this on an A100 with Gemma3, loading with and without mmap at two RAM sizes. These are the results:

| RAM  | mmap | Load time |
|------|------|-----------|
| 40GB | no   | 14.3s     |
| 40GB | yes  | 33.677s   |
| 8GB  | no   | 14.490s   |
| 8GB  | yes  | 1.07m     |
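For anyone who wants to reproduce the read-vs-mmap difference outside the model loader, here is a minimal Python sketch. It is not the benchmark used above. It assumes the weights live in a single local file and only compares streaming the file through plain buffered `read` calls against copying it out of an `mmap`'d view in the same chunk size. GPU upload and the RAM cap are not modeled, and the 1 MiB chunk size is an arbitrary choice. Page-cache state dominates this kind of measurement, so each variant should be run with a cold cache.

```python
import mmap
import os
import sys
import time

CHUNK = 1 << 20  # 1 MiB per step; arbitrary choice for this sketch


def load_with_read(path: str) -> int:
    """Stream the file with buffered readinto() calls into one reused buffer; return bytes read."""
    total = 0
    buf = bytearray(CHUNK)
    view = memoryview(buf)
    with open(path, "rb") as f:
        while True:
            n = f.readinto(view)
            if not n:  # 0 at EOF
                break
            total += n
    return total


def load_with_mmap(path: str) -> int:
    """Map the file read-only and copy it out chunk by chunk, faulting pages in; return bytes read."""
    size = os.path.getsize(path)
    if size == 0:
        return 0  # mmap cannot map an empty file
    total = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for off in range(0, size, CHUNK):
            chunk = mm[off:off + CHUNK]  # slicing copies bytes out of the mapping
            total += len(chunk)
    return total


def time_load(loader, path: str):
    """Run one loader and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    n = loader(path)
    return n, time.perf_counter() - start


if __name__ == "__main__":
    path = sys.argv[1]  # path to a weights file, supplied by the user
    for name, loader in (("read", load_with_read), ("mmap", load_with_mmap)):
        n, secs = time_load(loader, path)
        print(f"{name}: {n} bytes in {secs:.3f}s")
```

Run it as `python bench.py model.gguf` (file names here are hypothetical). Both loaders touch every byte once. The only difference is whether the kernel serves the data through `read` syscalls or through page faults on the mapping, which is the variable the table above isolates.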
