Small issue with server info and UI #15069
chattingwalec asked this question in Q&A (Unanswered)
I created a service so that we could run SmolVLM2 on llama-server. It works very well, except that when it hasn't been used in a while and you load the page, there's no server info; it usually shows up after a few minutes.
This wouldn't be an issue, except that multimodal is restricted until the server info has loaded. Does anyone know if there's some way to load the info faster? I didn't think it wise to open an issue until I'd established whether this is a feature or a bug.
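In case it's useful, the workaround I've been using is to warm the server up and wait for readiness before opening the page. A rough sketch in Python against the stock llama-server `/health` and `/props` endpoints (the base URL, timeouts, and the `modalities` field are assumptions from my setup, so adjust as needed):

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: adjust to wherever your llama-server instance is reachable.
BASE_URL = "http://localhost:8080"


def wait_until_ready(timeout_s: float = 300.0, poll_s: float = 2.0) -> dict:
    """Poll /health until the server is ready, then fetch and return /props.

    llama-server answers /health with HTTP 503 while the model is still
    loading and with 200 once it is ready, so the first request also
    serves to kick off the cold start as early as possible.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{BASE_URL}/health", timeout=5):
                # urlopen raises HTTPError on 503, so reaching this point
                # means the server answered 2xx and has finished loading.
                with urllib.request.urlopen(f"{BASE_URL}/props", timeout=5) as resp:
                    return json.load(resp)
        except (urllib.error.HTTPError, urllib.error.URLError):
            pass  # still loading or not reachable yet; keep polling
        time.sleep(poll_s)
    raise TimeoutError(f"{BASE_URL} did not become ready within {timeout_s}s")


if __name__ == "__main__":
    props = wait_until_ready()
    # "modalities" is the field a recent web UI seems to gate image input on.
    print("server ready, modalities:", props.get("modalities"))
```

The point is just to trigger the cold start early and only open the UI once `/health` reports ready.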
Image to make clear what I'm talking about (screenshot omitted from this text):
Model data and server log, in case it's helpful:
```
system_info: n_threads = 8 (n_threads_batch = 8) / 8 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
main: binding port with default address family
main: HTTP server is listening, hostname: 0.0.0.0, port: 8080, http threads: 7
main: loading model
srv load_model: loading model '/app/models/SmolVLM2-2.2B-Instruct-Q8_0.gguf'
llama_model_loader: loaded meta data with 74 key-value pairs and 219 tensors from /app/models/SmolVLM2-2.2B-Instruct-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = SmolVLM2 2.2B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = SmolVLM2
llama_model_loader: - kv 5: general.size_label str = 2.2B
llama_model_loader: - kv 6: general.license str = apache-2.0
llama_model_loader: - kv 7: general.base_model.count u32 = 1
llama_model_loader: - kv 8: general.base_model.0.name str = SmolVLM Instruct
llama_model_loader: - kv 9: general.base_model.0.organization str = HuggingFaceTB
llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/HuggingFaceTB/...
llama_model_loader: - kv 11: general.dataset.count u32 = 12
llama_model_loader: - kv 12: general.dataset.0.name str = The_Cauldron
llama_model_loader: - kv 13: general.dataset.0.organization str = HuggingFaceM4
llama_model_loader: - kv 14: general.dataset.0.repo_url str = https://huggingface.co/HuggingFaceM4/...
llama_model_loader: - kv 15: general.dataset.1.name str = Docmatix
llama_model_loader: - kv 16: general.dataset.1.organization str = HuggingFaceM4
llama_model_loader: - kv 17: general.dataset.1.repo_url str = https://huggingface.co/HuggingFaceM4/...
llama_model_loader: - kv 18: general.dataset.2.name str = LLaVA OneVision Data
llama_model_loader: - kv 19: general.dataset.2.organization str = Lmms Lab
llama_model_loader: - kv 20: general.dataset.2.repo_url str = https://huggingface.co/lmms-lab/LLaVA...
llama_model_loader: - kv 21: general.dataset.3.name str = M4 Instruct Data
llama_model_loader: - kv 22: general.dataset.3.organization str = Lmms Lab
llama_model_loader: - kv 23: general.dataset.3.repo_url str = https://huggingface.co/lmms-lab/M4-In...
llama_model_loader: - kv 24: general.dataset.4.name str = Finevideo
llama_model_loader: - kv 25: general.dataset.4.organization str = HuggingFaceFV
llama_model_loader: - kv 26: general.dataset.4.repo_url str = https://huggingface.co/HuggingFaceFV/...
llama_model_loader: - kv 27: general.dataset.5.name str = MAmmoTH VL Instruct 12M
llama_model_loader: - kv 28: general.dataset.5.organization str = MAmmoTH VL
llama_model_loader: - kv 29: general.dataset.5.repo_url str = https://huggingface.co/MAmmoTH-VL/MAm...
llama_model_loader: - kv 30: general.dataset.6.name str = LLaVA Video 178K
llama_model_loader: - kv 31: general.dataset.6.organization str = Lmms Lab
llama_model_loader: - kv 32: general.dataset.6.repo_url str = https://huggingface.co/lmms-lab/LLaVA...
llama_model_loader: - kv 33: general.dataset.7.name str = Video STaR
llama_model_loader: - kv 34: general.dataset.7.organization str = Orrzohar
llama_model_loader: - kv 35: general.dataset.7.repo_url str = https://huggingface.co/orrzohar/Video...
llama_model_loader: - kv 36: general.dataset.8.name str = Vript
llama_model_loader: - kv 37: general.dataset.8.organization str = Mutonix
llama_model_loader: - kv 38: general.dataset.8.repo_url str = https://huggingface.co/Mutonix/Vript
llama_model_loader: - kv 39: general.dataset.9.name str = VISTA 400K
llama_model_loader: - kv 40: general.dataset.9.organization str = TIGER Lab
llama_model_loader: - kv 41: general.dataset.9.repo_url str = https://huggingface.co/TIGER-Lab/VIST...
llama_model_loader: - kv 42: general.dataset.10.name str = MovieChat 1K_train
llama_model_loader: - kv 43: general.dataset.10.organization str = Enxin
llama_model_loader: - kv 44: general.dataset.10.repo_url str = https://huggingface.co/Enxin/MovieCha...
llama_model_loader: - kv 45: general.dataset.11.name str = ShareGPT4Video
llama_model_loader: - kv 46: general.dataset.11.organization str = ShareGPT4Video
llama_model_loader: - kv 47: general.dataset.11.repo_url str = https://huggingface.co/ShareGPT4Video...
llama_model_loader: - kv 48: general.tags arr[str,2] = ["video-text-to-text", "image-text-to...
llama_model_loader: - kv 49: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 50: llama.block_count u32 = 24
llama_model_loader: - kv 51: llama.context_length u32 = 8192
llama_model_loader: - kv 52: llama.embedding_length u32 = 2048
llama_model_loader: - kv 53: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 54: llama.attention.head_count u32 = 32
llama_model_loader: - kv 55: llama.rope.freq_base f32 = 130000.000000
llama_model_loader: - kv 56: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 57: llama.attention.key_length u32 = 64
llama_model_loader: - kv 58: llama.attention.value_length u32 = 64
llama_model_loader: - kv 59: llama.vocab_size u32 = 49280
llama_model_loader: - kv 60: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 61: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 62: tokenizer.ggml.pre str = smollm
llama_model_loader: - kv 63: tokenizer.ggml.tokens arr[str,49280] = ["<|endoftext|>", "<|im_start|>", "<|...
llama_model_loader: - kv 64: tokenizer.ggml.token_type arr[i32,49280] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 65: tokenizer.ggml.merges arr[str,48900] = ["Ġ t", "Ġ a", "i n", "h e", "Ġ Ġ...
llama_model_loader: - kv 66: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 67: tokenizer.ggml.eos_token_id u32 = 49279
llama_model_loader: - kv 68: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 69: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 70: tokenizer.chat_template str = <|im_start|>{% for message in message...
llama_model_loader: - kv 71: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 72: general.quantization_version u32 = 2
llama_model_loader: - kv 73: general.file_type u32 = 7
llama_model_loader: - type f32: 49 tensors
llama_model_loader: - type q8_0: 170 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 1.79 GiB (8.50 BPW)
load: special tokens cache size = 145
load: token to piece cache size = 0.3199 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 8192
print_info: n_embd = 2048
print_info: n_layer = 24
print_info: n_head = 32
print_info: n_head_kv = 32
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 2048
print_info: n_embd_v_gqa = 2048
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 8192
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 130000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 8192
print_info: rope_finetuned = unknown
print_info: model type = ?B
print_info: model params = 1.81 B
print_info: general.name = SmolVLM2 2.2B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 49280
print_info: n_merges = 48900
print_info: BOS token = 1 '<|im_start|>'
print_info: EOS token = 49279 '<end_of_utterance>'
print_info: EOT token = 49279 '<end_of_utterance>'
print_info: UNK token = 0 '<|endoftext|>'
print_info: PAD token = 2 '<|im_end|>'
print_info: LF token = 198 'Ċ'
print_info: FIM REP token = 4 ''
print_info: EOG token = 0 '<|endoftext|>'
print_info: EOG token = 2 '<|im_end|>'
print_info: EOG token = 4 ''
print_info: EOG token = 49279 '<end_of_utterance>'
print_info: max token length = 162
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: CPU_Mapped model buffer size = 1836.91 MiB
...........................................................................................
llama_context: constructing llama_context
llama_context: non-unified KV cache requires ggml_set_rows() - forcing unified KV cache
llama_context: n_seq_max = 1
llama_context: n_ctx = 8192
llama_context: n_ctx_per_seq = 8192
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: kv_unified = true
llama_context: freq_base = 130000.0
llama_context: freq_scale = 1
llama_context: CPU output buffer size = 0.19 MiB
llama_kv_cache_unified: CPU KV buffer size = 1536.00 MiB
llama_kv_cache_unified: size = 1536.00 MiB ( 8192 cells, 24 layers, 1/ 1 seqs), K (f16): 768.00 MiB, V (f16): 768.00 MiB
llama_kv_cache_unified: LLAMA_SET_ROWS=0, using old ggml_cpy() method for backwards compatibility
llama_context: CPU compute buffer size = 544.01 MiB
llama_context: graph nodes = 894
llama_context: graph splits = 1
common_init_from_params: added <|endoftext|> logit bias = -inf
common_init_from_params: added <|im_end|> logit bias = -inf
common_init_from_params: added logit bias = -inf
common_init_from_params: added <end_of_utterance> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
Failed to generate tool call example: Value is not callable: null at row 1, column 72:
<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '
' }}{% endif %}{% endfor %}<end_of_utterance>
^
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
at row 1, column 42:
<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '
' }}{% endif %}{% endfor %}<end_of_utterance>
^
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
at row 1, column 42:
<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '
' }}{% endif %}{% endfor %}<end_of_utterance>
^
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
at row 1, column 13:
<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '
' }}{% endif %}{% endfor %}<end_of_utterance>
^
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
at row 1, column 1:
<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '
' }}{% endif %}{% endfor %}<end_of_utterance>
^
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
clip_model_loader: model name: SmolVLM2 2.2B Instruct
clip_model_loader: description:
clip_model_loader: GGUF version: 3
clip_model_loader: alignment: 32
clip_model_loader: n_tensors: 438
clip_model_loader: n_kv: 66
clip_model_loader: has vision encoder
clip_ctx: CLIP using CPU backend
load_hparams: projector: idefics3
load_hparams: n_embd: 1152
load_hparams: n_head: 16
load_hparams: n_ff: 4304
load_hparams: n_layer: 27
load_hparams: ffn_op: gelu
load_hparams: projection_dim: 2048
--- vision hparams ---
load_hparams: image_size: 384
load_hparams: patch_size: 14
load_hparams: has_llava_proj: 0
load_hparams: minicpmv_version: 0
load_hparams: proj_scale_factor: 3
load_hparams: n_wa_pattern: 0
load_hparams: model size: 565.05 MiB
load_hparams: metadata size: 0.15 MiB
alloc_compute_meta: CPU compute buffer size = 45.25 MiB
srv load_model: loaded multimodal model, '/app/models/mmproj-SmolVLM2-2.2B-Instruct-Q8_0.gguf'
srv load_model: ctx_shift is not supported by multimodal, it will be disabled
srv init: initializing slots, n_slots = 1
slot init: id 0 | task -1 | new slot n_ctx_slot = 8192
main: model loaded
main: chat template, chat_template: <|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '
' }}{% endif %}{% endfor %}<end_of_utterance>
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}, example_format: '<|im_start|>You are a helpful assistant
User: Hello<end_of_utterance>
Assistant: Hi there<end_of_utterance>
User: How are you?<end_of_utterance>
Assistant:'
main: server is listening on http://0.0.0.0:8080 - starting the main loop
srv update_slots: all slots are idle
srv log_server_r: request: GET / 10.2.0.5 200
srv log_server_r: request: GET /props 10.2.0.5 200
srv params_from_: Chat format: Content-only
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 8192, n_keep = 0, n_prompt_tokens = 19
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 19, n_tokens = 19, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 19, n_tokens = 19
slot release: id 0 | task 0 | stop processing: n_past = 416, truncated = 0
slot print_timing: id 0 | task 0 |
prompt eval time = 273.45 ms / 19 tokens ( 14.39 ms per token, 69.48 tokens per second)
eval time = 19742.14 ms / 398 tokens ( 49.60 ms per token, 20.16 tokens per second)
total time = 20015.58 ms / 417 tokens
```