Overlap CUDA graph building and processing to minimize GPU idle time and improve tokens-per-second performance. #11867

Open
aendk wants to merge 4 commits into ggml-org:master from aendk:akieslinger/reduce_cuda_graph_cpu_overhead

Commits

Commits on Feb 13, 2025

Commits on Feb 14, 2025