
Bug: llama_print_timings seems to accumulate load_time/total_time in llama-bench #9286

Description

@akx

What happened?

When running llama-bench with multiple parameter sets, the load time (and, consequently, the total time) reported by llama_print_timings appears to accumulate across the parameter sets.

This doesn't affect the actual benchmark data output.
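For illustration only (this is a standalone sketch, not llama.cpp or llama-bench source, and all names and durations in it are made up): the pattern in the log below can arise when a "load time" is measured against a single start timestamp that is never reset between parameter sets, so each subsequent set reports all previous wall-clock time plus its own load.

#include <chrono>
#include <cstdio>
#include <thread>

using clock_type = std::chrono::steady_clock;

int main() {
    // Set once for the whole process; never reset between "parameter sets".
    const auto t_start = clock_type::now();

    for (int run = 0; run < 3; ++run) {
        // Simulate per-set "load" and "eval" work.
        std::this_thread::sleep_for(std::chrono::milliseconds(100)); // load
        std::this_thread::sleep_for(std::chrono::milliseconds(300)); // eval

        // Suspected bug pattern: measured from the fixed t_start,
        // so the reported "load time" grows with every run.
        const double load_ms = std::chrono::duration<double, std::milli>(
            clock_type::now() - t_start).count();
        std::printf("run %d: load time = %8.2f ms\n", run, load_ms);

        // A per-run measurement would instead take a fresh reference
        // timestamp (or reset the timings) at the top of each iteration.
    }
}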

Name and Version

$ ./llama-cli --version
version: 3651 (8f1d81a0)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0

What operating system are you seeing the problem on?

Mac

Relevant log output

$ ./llama-bench -m ~/Documents/Llama/models/llama-3-1-8b-instruct-f16.gguf -n 4,8,16,32,64,128,256 -ngl 99 -v 2>&1 | grep llama_print_timings
llama_print_timings:        load time =    1130.77 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =    4528.24 ms /  3072 tokens (    1.47 ms per token,   678.41 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =    4587.03 ms /  3073 tokens
llama_print_timings:        load time =    4669.62 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (     nan ms per token,      nan tokens per second)
llama_print_timings:        eval time =     975.30 ms /    21 runs   (   46.44 ms per token,    21.53 tokens per second)
llama_print_timings:       total time =    5577.03 ms /    21 tokens
llama_print_timings:        load time =    5681.06 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (     nan ms per token,      nan tokens per second)
llama_print_timings:        eval time =    1880.25 ms /    41 runs   (   45.86 ms per token,    21.81 tokens per second)
llama_print_timings:       total time =    7486.03 ms /    41 tokens
llama_print_timings:        load time =    7566.78 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (     nan ms per token,      nan tokens per second)
llama_print_timings:        eval time =    3661.28 ms /    81 runs   (   45.20 ms per token,    22.12 tokens per second)
llama_print_timings:       total time =   11161.03 ms /    81 tokens
llama_print_timings:        load time =   11243.36 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (     nan ms per token,      nan tokens per second)
llama_print_timings:        eval time =    7263.77 ms /   161 runs   (   45.12 ms per token,    22.16 tokens per second)
llama_print_timings:       total time =   18438.03 ms /   161 tokens
llama_print_timings:        load time =   18520.05 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (     nan ms per token,      nan tokens per second)
llama_print_timings:        eval time =   14510.21 ms /   321 runs   (   45.20 ms per token,    22.12 tokens per second)
llama_print_timings:       total time =   32963.03 ms /   321 tokens
llama_print_timings:        load time =   33049.59 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (     nan ms per token,      nan tokens per second)
llama_print_timings:        eval time =   29012.77 ms /   641 runs   (   45.26 ms per token,    22.09 tokens per second)
llama_print_timings:       total time =   61991.03 ms /   641 tokens
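Reading the numbers above, each reported load time equals the previous run's total time plus a roughly constant per-set load of about 80-105 ms (e.g. 4587.03 + 82.59 = 4669.62 ms, 5577.03 + 104.03 = 5681.06 ms), which is consistent with the load time being accumulated across parameter sets rather than measured per set.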

Labels

bug (Something isn't working), low severity (Used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions