Worse performance with --amx #3
Replies: 2 comments · 5 replies
-
Tried with FA off, still slower.
Without AMX, FA off:
With AMX, FA off:
-
Can you run with verbose (-v)? Are you running in HBM-only mode, or as level 4 cache?
-
HBM-only mode, don't even have any DIMMs installed. Here's the complete run with --amx -v:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
system_info: n_threads = 28 (n_threads_batch = 28) / 112 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
n_ctx: 8192, add_bos: 0
embd_inp.size(): 10, n_consumed: 0
eval: [ '. ':382 ]
eval: [ '. ':382 ]
eval: [ ' ':4710 ]
eval: [ ' ':4710 ]
eval: [ '. ':382 ]
eval: [ '. ':382 ]
eval: [ '. ':382 ]
eval: [ ' ':4710 ]
llama_perf_sampler_print: sampling time = 62.15 ms / 522 runs ( 0.12 ms per token, 8398.63 tokens per second)
-
Switched to a smaller model so I could use your new command as-is and still fit in GPU:
numactl -N 2 -m 2 /root/llama.cpp/build/bin/llama-cli -m /mnt/vm100/quants/Ling-lite-1.5-2507.i1-Q4_K_M.gguf -ngl 99 --amx --cpu-moe -t 14 -b 4096 -c 4096 -n 512 --numa numactl -p "The quick brown fox jumps over the lazy dog many times. A curious cat watches carefully from the garden wall nearby. Birds sing softly in the morning air, while the sun rises gently above the hills. Children walk slowly to school carrying bright backpacks filled with books, pencils, and small notes. The teacher greets them warmly at the classroom door. Lessons begin with stories about science, history, art, and music. Ideas flow clearly and simply, creating a calm rhythm of learning. Friends share smiles, trade sandwiches, and laugh during the short break. The day continues peacefully until the afternoon bell finally rings." -no-cnv
Without --amx:
llama_perf_context_print: prompt eval time = 961.61 ms / 121 tokens ( 7.95 ms per token, 125.83 tokens per second)
With --amx:
llama_perf_context_print: prompt eval time = 1291.65 ms / 121 tokens ( 10.67 ms per token, 93.68 tokens per second)
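For repeatability, here is a minimal back-to-back version of that comparison (a sketch, not part of the original reply); the model path, thread count, and flags are the ones quoted in the run above, and the short prompt placeholder would be replaced with the full prompt used there:
MODEL=/mnt/vm100/quants/Ling-lite-1.5-2507.i1-Q4_K_M.gguf
PROMPT="The quick brown fox jumps over the lazy dog many times."   # substitute the full prompt from the run above
for AMX in "" "--amx"; do
  echo "=== --amx: ${AMX:-off} ==="
  numactl -N 2 -m 2 /root/llama.cpp/build/bin/llama-cli -m "$MODEL" \
    -ngl 99 --cpu-moe -t 14 -b 4096 -c 4096 -n 512 --numa numactl $AMX \
    -p "$PROMPT" -no-cnv 2>&1 | grep llama_perf   # keep only the perf summary lines
done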
-
Also, in a new terminal instance, run this while the tests are running:
sudo perf stat -a -e exe.amx_busy,cycles -- sleep 30
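A rough way to combine the two in one script (a sketch; the paths, model, and flags are taken from the reply above, and the prompt is a hypothetical stand-in): a near-zero exe.amx_busy count during the --amx run would suggest the AMX tiles are not actually being exercised.
numactl -N 2 -m 2 /root/llama.cpp/build/bin/llama-cli \
  -m /mnt/vm100/quants/Ling-lite-1.5-2507.i1-Q4_K_M.gguf \
  -ngl 99 --amx --cpu-moe -t 14 -b 4096 -c 4096 -n 512 --numa numactl \
  -p "benchmark prompt" -no-cnv &                     # run under test in the background
sleep 5                                               # let prompt processing start
sudo perf stat -a -e exe.amx_busy,cycles -- sleep 30  # system-wide 30 s sample
wait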
-
Hmm, try this:
numactl -N 1 -m 1 ~/src/llama.cpp/build/bin/llama-bench -m /XXXX.gguf -t 16 --amx --numa numactl -ngl 10 -nopo 1 -b 512 -ub 512 -pg 512,512 --repetitions 3
Then run it again, but without "-nopo 1".
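The same two runs can be scripted as one loop (a sketch; /XXXX.gguf stays as the placeholder from the command above):
for NOPO in "-nopo 1" ""; do
  echo "=== extra flags: ${NOPO:-none} ==="
  numactl -N 1 -m 1 ~/src/llama.cpp/build/bin/llama-bench -m /XXXX.gguf \
    -t 16 --amx --numa numactl -ngl 10 $NOPO -b 512 -ub 512 -pg 512,512 --repetitions 3
done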
-
root@wen:~# numactl -N 1 -m 1 llama.cpp-20250915-AMX/build/bin/llama-bench -m quants/Ling-lite-1.5-2507.i1-Q4_K_M.gguf -t 14 --amx --numa numactl -ngl 10 -nopo 1 -b 512 -ub 512 -pg 512,512 --repetitions 3
build: 71cc890 (6461)
build: 71cc890 (6461)
-
Gen 4 (Sapphire Rapids) Xeon CPU MAX 9480 w/ 64GB HBM + RTX 4070 Ti Super
Build script (probably redundant stuff in there):
CUDACXX=/usr/local/cuda/bin/nvcc cmake -B build \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_NATIVE=ON \
-DGGML_AVX512=ON \
-DGGML_AVX512_BF16=ON \
-DGGML_AVX512_VBMI=ON \
-DGGML_AVX512_VNNI=ON \
-DGGML_AMX=ON \
-DGGML_AMX_TILE=ON \
-DGGML_AMX_INT8=ON \
-DGGML_AMX_BF16=ON \
-DGGML_CUDA=ON \
-DGGML_CUDA_ARCH=89 \
-DCMAKE_CXX_FLAGS="-O3 -march=sapphirerapids -mtune=sapphirerapids"
cmake --build build --config Release -j 56
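Before comparing runs, a hypothetical sanity check (not part of the original post) that both the kernel and this binary expose AMX:
lscpu | grep -o 'amx[_a-z0-9]*' | sort -u   # should list amx_bf16, amx_int8, amx_tile
build/bin/llama-cli --version               # record the exact build under test
The system_info banner printed at startup should then report AMX_INT8 = 1, as it does in the log below.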
Launch script (without and with --amx):
echo 3 > /proc/sys/vm/drop_caches
numactl --interleave=0,1,2,3 \
build/bin/llama-cli --jinja \
-m /quants/GLM-4.5-Air-Q4_K_S-00001-of-00002.gguf \
-ngl 999 --n-cpu-moe 40 --amx \
-c 16384 -fa on --numa distribute -t 56 -n 512 -p "Write a complete novel about the AI Takeover." -no-cnv
system_info: n_threads = 56 (n_threads_batch = 56) / 112 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Without AMX:
load_tensors: offloaded 48/48 layers to GPU
load_tensors: CUDA0 model buffer size = 11458.75 MiB
load_tensors: CPU_Mapped model buffer size = 46976.56 MiB
load_tensors: CPU_Mapped model buffer size = 6183.31 MiB
llama_perf_context_print: prompt eval time = 77.53 ms / 4 tokens ( 19.38 ms per token, 51.59 tokens per second)
llama_perf_context_print: eval time = 14752.42 ms / 511 runs ( 28.87 ms per token, 34.64 tokens per second)
With AMX:
load_tensors: offloaded 48/48 layers to GPU
load_tensors: CUDA0 model buffer size = 11458.75 MiB
load_tensors: CPU_REPACK model buffer size = 30888.00 MiB
load_tensors: CPU_Mapped model buffer size = 46976.56 MiB
load_tensors: CPU_Mapped model buffer size = 4523.67 MiB
llama_perf_context_print: prompt eval time = 576.45 ms / 4 tokens ( 144.11 ms per token, 6.94 tokens per second)
llama_perf_context_print: eval time = 21230.37 ms / 511 runs ( 41.55 ms per token, 24.07 tokens per second)
I did a few different runs, and it always seems slower for some reason.
Benchmark command gave this error:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Ti SUPER, compute capability 8.9, VMM: yes