"The KV caching mechanism uses forward hooks that are installed on the module objects, so it will cause race conditions when used by multiple threads. The --threads option gives lower-level control over how CPU operations are parallelized, but it is less relevant if you're using a GPU.

Even if you disable KV caching, multi-threaded usage will generally be inefficient because of Python's global interpreter lock (GIL)."
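
If you still need to call one shared model from several threads, one possible workaround is to serialize the calls behind a lock so the hooks are never active in two threads at once. This is only a rough sketch and not something the answer above prescribes: `SerializedModel` and `fake_transcribe` are hypothetical names, and `fake_transcribe` is a stand-in for whatever function actually runs the model.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerializedModel:
    """Hypothetical wrapper: lets only one thread call the model at a time."""

    def __init__(self, transcribe_fn):
        self._transcribe_fn = transcribe_fn
        self._lock = threading.Lock()

    def transcribe(self, audio):
        # Holding the lock for the whole call keeps any per-call state
        # (such as hook-based caches) from being shared between threads.
        with self._lock:
            return self._transcribe_fn(audio)


def fake_transcribe(audio):
    # Assumption: placeholder for the real model call.
    return f"transcript of {audio}"


model = SerializedModel(fake_transcribe)

# Worker threads can share the wrapper; the model calls themselves run one by one.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(model.transcribe, ["a.wav", "b.wav", "c.wav"]))

print(results)
```

The lock makes concurrent calls safe but gives no speedup, since the calls run one after another. For real parallelism, separate processes, each with its own copy of the model, are probably the better fit given the GIL point above.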

Answer selected by jongwook