Parse GGUF header with gguf-py to avoid full loading #4

excosy · 2025-08-15T18:26:46Z

Replace the optional dependency llama-cpp-python with the required dependency gguf, which can dump metadata without loading the full model.
Remove the full-loading methods analyze_gguf_with_llamacpp_tools and parse_gguf_header_simple.
Add metadata fields: context_length, expert_count, expert_used_count.
Sync the maximum of ctx_slider with the value read from the GGUF metadata.
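
For reviewers, the approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the helper names read_gguf_metadata and summarize_metadata and the 4096 fallback are hypothetical. It assumes gguf-py's GGUFReader, which memory-maps the file, and the standard GGUF keys general.architecture and <arch>.context_length / <arch>.expert_count / <arch>.expert_used_count.

```python
from typing import Any, Dict


def read_gguf_metadata(path: str) -> Dict[str, Any]:
    """Collect scalar and string key/value metadata from a GGUF file.

    Hypothetical helper. GGUFReader memory-maps the file, so tensor
    data is not read into memory here.
    """
    # Imported lazily so the rest of this sketch works without gguf installed.
    from gguf import GGUFReader, GGUFValueType

    reader = GGUFReader(path)
    meta: Dict[str, Any] = {}
    for key, field in reader.fields.items():
        if not field.types:
            continue
        value_type = field.types[0]
        if value_type == GGUFValueType.STRING:
            raw = bytes(field.parts[field.data[0]])
            meta[key] = raw.decode("utf-8", errors="replace")
        elif value_type != GGUFValueType.ARRAY:
            # Scalar values are stored as one-element numpy arrays.
            meta[key] = field.parts[field.data[0]][0].item()
        # Arrays (e.g. tokenizer vocab) are skipped: not needed for the summary.
    return meta


def summarize_metadata(meta: Dict[str, Any], default_ctx: int = 4096) -> Dict[str, Any]:
    """Pick out the fields this PR adds and derive a slider maximum.

    default_ctx is an assumed fallback for files lacking context_length.
    """
    arch = meta.get("general.architecture")

    def arch_value(name: str) -> Any:
        return meta.get(f"{arch}.{name}") if arch else None

    ctx = arch_value("context_length")
    has_ctx = isinstance(ctx, int) and ctx > 0
    return {
        "architecture": arch,
        "context_length": ctx,
        "expert_count": arch_value("expert_count"),
        "expert_used_count": arch_value("expert_used_count"),
        "ctx_slider_max": ctx if has_ctx else default_ctx,
    }
```

Usage would look like summarize_metadata(read_gguf_metadata("model.gguf")), with the resulting ctx_slider_max applied as the upper bound of ctx_slider. Keeping the summary step separate from the file reader makes it testable with a plain dict.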

parsing gguf head with gguf-py to avoid full loading

a4ba2d3
