
Commit 92a9392

Updated arg.cpp instead of auto-generated README.md
1 parent 59940ef commit 92a9392

File tree

  common/arg.cpp
  examples/server/README.md

2 files changed: +4 -4 lines changed

common/arg.cpp

Lines changed: 2 additions & 2 deletions
@@ -1174,15 +1174,15 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
     ).set_env("LLAMA_ARG_NO_KV_OFFLOAD"));
     add_opt(common_arg(
         {"-ctk", "--cache-type-k"}, "TYPE",
-        string_format("KV cache data type for K (default: %s)", params.cache_type_k.c_str()),
+        string_format("KV cache data type for K : f16, f32, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1 (default: %s)", params.cache_type_k.c_str()),
         [](common_params & params, const std::string & value) {
             // TODO: get the type right here
             params.cache_type_k = value;
         }
     ).set_env("LLAMA_ARG_CACHE_TYPE_K"));
     add_opt(common_arg(
         {"-ctv", "--cache-type-v"}, "TYPE",
-        string_format("KV cache data type for V (default: %s)", params.cache_type_v.c_str()),
+        string_format("KV cache data type for V : f16, f32, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1 (default: %s)", params.cache_type_v.c_str()),
         [](common_params & params, const std::string & value) {
             // TODO: get the type right here
             params.cache_type_v = value;
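
The updated help strings now enumerate the accepted cache types directly, while the handlers still store the raw string (note the remaining "TODO: get the type right here" comments). As a rough illustration of what resolving that TODO could look like, here is a minimal, self-contained C++ sketch that checks a TYPE string against the values listed in the new help text; it is not the project's implementation, and every name in it is local to this example.

    // Illustrative sketch only (not llama.cpp code): validate a TYPE string
    // against the cache types listed in the updated help text.
    #include <array>
    #include <iostream>
    #include <stdexcept>
    #include <string>

    static const std::array<std::string, 9> k_cache_types = {
        "f16", "f32", "bf16", "q8_0", "q4_0", "q4_1", "iq4_nl", "q5_0", "q5_1"};

    // Returns the value unchanged if it names a supported cache type, else throws.
    static std::string validate_cache_type(const std::string & value) {
        for (const auto & t : k_cache_types) {
            if (value == t) {
                return value;
            }
        }
        throw std::invalid_argument("unsupported KV cache type: " + value);
    }

    int main() {
        std::cout << validate_cache_type("q8_0") << "\n";  // prints "q8_0"
        return 0;
    }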

examples/server/README.md

Lines changed: 2 additions & 2 deletions
@@ -62,8 +62,8 @@ The project is under active development, and we are [looking for feedback and co
 | `--yarn-beta-fast N` | YaRN: low correction dim or beta (default: 32.0)<br/>(env: LLAMA_ARG_YARN_BETA_FAST) |
 | `-dkvc, --dump-kv-cache` | verbose print of the KV cache |
 | `-nkvo, --no-kv-offload` | disable KV offload<br/>(env: LLAMA_ARG_NO_KV_OFFLOAD) |
-| `-ctk, --cache-type-k TYPE` | KV cache data type for K (default: f16, f32, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1)<br/>(env: LLAMA_ARG_CACHE_TYPE_K) |
-| `-ctv, --cache-type-v TYPE` | KV cache data type for V (default: f16, f32, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1)<br/>(env: LLAMA_ARG_CACHE_TYPE_V) |
+| `-ctk, --cache-type-k TYPE` | KV cache data type for K (default: f16)<br/>(env: LLAMA_ARG_CACHE_TYPE_K) |
+| `-ctv, --cache-type-v TYPE` | KV cache data type for V (default: f16)<br/>(env: LLAMA_ARG_CACHE_TYPE_V) |
 | `-dt, --defrag-thold N` | KV cache defragmentation threshold (default: 0.1, < 0 - disabled)<br/>(env: LLAMA_ARG_DEFRAG_THOLD) |
 | `-np, --parallel N` | number of parallel sequences to decode (default: 1)<br/>(env: LLAMA_ARG_N_PARALLEL) |
 | `--mlock` | force system to keep model in RAM rather than swapping or compressing<br/>(env: LLAMA_ARG_MLOCK) |
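
Each of these table rows documents both a command-line flag and an environment-variable fallback (the "(env: ...)" suffix, wired up in arg.cpp via set_env). As a hedged illustration of how such a flag-or-environment fallback can be resolved, here is a self-contained sketch with invented names; it is not llama.cpp's actual argument parser.

    // Illustrative sketch only: prefer the explicit CLI value, then the env var,
    // then the built-in default shown in the table above.
    #include <cstdlib>
    #include <iostream>
    #include <string>

    static std::string resolve_cache_type(const std::string & cli_value,
                                          const char * env_name,
                                          const std::string & fallback) {
        if (!cli_value.empty()) {
            return cli_value;   // explicit -ctk / -ctv value wins
        }
        if (const char * env = std::getenv(env_name)) {
            return env;         // otherwise honor the environment variable
        }
        return fallback;        // finally, the built-in default
    }

    int main() {
        // With no CLI value and LLAMA_ARG_CACHE_TYPE_K unset, this prints "f16".
        std::cout << resolve_cache_type("", "LLAMA_ARG_CACHE_TYPE_K", "f16") << "\n";
        return 0;
    }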
