-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Update the docs on -t --threads #16236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Are we sure this is correct? I was chatting with @doringeman about this recently. I tested this is the past and the default was certainly low, 4 threads (unless something changed in the meantime): Line 228 in 3a59971
It was super apparent in the past when testing on an Ampere system with 100+ cores |
Oh, my bad, must have been an oversight, I will go through the code and confirm. In any case the docs could benefit from more descriptive wording, thanks for the review. |
I can confirm that it now by default uses 100+ threads when you have 100+ cores. As this is likely not desired (but on the other hand the -1 default likely makes sense for the majority of users) the improved documentation is very valuable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This table is auto-generated, changes here will be discarded. Make your changes to arg.cpp
instead.
llama.cpp/tools/server/README.md
Line 25 in eba9734
<!-- Note for contributors: The list below is generated by llama-gen-docs --> |
This reverts commit eba9734.
… all available cores
Addressed and added to correct file. |
common/arg.cpp
Outdated
add_opt(common_arg( | ||
{"-t", "--threads"}, "N", | ||
string_format("number of threads to use during generation (default: %d)", params.cpuparams.n_threads), | ||
string_format("number of CPU threads to use during generation (default: %d, use all available.)", params.cpuparams.n_threads), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is not to use all available, the logic is more complex than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have taken out the "use all available" part, but kept the CPU because it makes it clearer. I will take a look at the logic and put up another PR with more descriptive message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slaren I was thinking the same, saw it being set as 4 on an Ampere ARM machine with a bazillion cores in the past
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some logic to avoid using logical cores (e.g. from SMT), but it may not work well in the Ampere CPU.
It's a documentation update for more clear wording on what -t parameter actually means.