jukofyork (Collaborator) commented Jul 14, 2025

This just adds the equivalent option for LoRAs that already exists for control vectors, e.g.:

Using --lora-layer-range 0 59 on a LoRA with 64 pairs of A/B tensors:

llama_adapter_lora_init_impl: loading lora adapter from 'qwq-32b-writer-lora-F32.gguf' ...
llama_adapter_lora_init_impl: Metal_Mapped LoRA buffer size =   120.00 MiB
llama_adapter_lora_init_impl: loaded 120 tensors from lora file

It does change the function signature of llama_adapter_lora_init:

    // Load a LoRA adapter from file
    // il_start and il_end are the layer range the lora should apply to (both inclusive)
    LLAMA_API struct llama_adapter_lora * llama_adapter_lora_init(
            struct llama_model * model,
                    const char * path_lora,
                       int32_t   il_start,
                       int32_t   il_end);

but it is only called from common.cpp here:

    // load and optionally apply lora adapters
    if (!params.lora_adapters.empty()) {
        if (params.lora_layer_start < 0) params.lora_layer_start = 0;
        if (params.lora_layer_end   < 0) params.lora_layer_end   = llama_model_n_layer(model);

        for (auto & la : params.lora_adapters) {
            llama_adapter_lora_ptr lora;
            lora.reset(llama_adapter_lora_init(model, la.path.c_str(), params.lora_layer_start, params.lora_layer_end));
            if (lora == nullptr) {
                LOG_ERR("%s: failed to apply lora adapter '%s'\n", __func__, la.path.c_str());
                llama_free(lctx);
                llama_model_free(model);
                return iparams;
            }

            la.ptr = lora.get();
            iparams.lora.emplace_back(std::move(lora)); // copy to list of loaded adapters
        }
    }

(just something to be aware of if any external tool calls this as an API, etc.)
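For external tools that wrap this API (bindings, plugins), the old two-argument behavior can be preserved with a thin wrapper that fills in the full layer range by default, mirroring the clamping logic in common.cpp above. A minimal sketch, where `lib` is a hypothetical FFI handle exposing the two functions from llama.h (names as in the source, the wrapper itself is an assumption):

```python
def adapter_lora_init(lib, model, path, il_start=None, il_end=None):
    """Wrap the new four-argument llama_adapter_lora_init, defaulting to the
    full layer range [0, n_layer] so existing two-argument callers keep the
    old apply-to-all-layers behavior."""
    if il_start is None:
        il_start = 0
    if il_end is None:
        il_end = lib.llama_model_n_layer(model)
    return lib.llama_adapter_lora_init(model, path, il_start, il_end)
```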

ngxson (Collaborator) commented Jul 14, 2025

I think both the LoRA and control vector parts of llama.cpp see little usage, so we should not make them too complicated.

Even without this --lora-layer-range, users can easily slice the unwanted layers out of the LoRA GGUF. So it may not be worth adding a lot of code to the project just for that. And if we add it, most users will not use it anyway, as it is not as intuitive as controlling the scale of each adapter.

The ability to specify a layer range is needed for control vectors because, in most cases, the model breaks when one is applied to all layers. Obviously this is also why no one actually uses them in production. By contrast, LoRA adapters generally work fine when applied to all layers by default.
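The slicing ngxson describes amounts to filtering a LoRA GGUF's tensors by their block index. A minimal sketch of the selection logic, assuming the usual `blk.<n>.` naming convention for per-layer tensors (actually rewriting the file, e.g. with the `gguf` Python package's reader/writer, is omitted here):

```python
import re

# Per-layer tensors in llama.cpp GGUFs are conventionally named "blk.<n>.<...>".
_BLK_RE = re.compile(r"\bblk\.(\d+)\.")

def keep_tensor(name: str, il_start: int, il_end: int) -> bool:
    """Return True if a tensor should survive trimming to [il_start, il_end]
    (both inclusive, matching the PR's semantics). Tensors without a block
    index (e.g. token_embd) are always kept."""
    m = _BLK_RE.search(name)
    if m is None:
        return True
    return il_start <= int(m.group(1)) <= il_end

def trim_layer_range(names, il_start, il_end):
    """Filter a list of tensor names down to the requested layer range."""
    return [n for n in names if keep_tensor(n, il_start, il_end)]
```

With this filter, trimming a 64-layer LoRA to layers 0-59 would keep 60 of the 64 A/B pairs, matching the effect of `--lora-layer-range 0 59` in the example above.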

jukofyork (Collaborator, Author) replied:

> I think both the LoRA and control vector parts of llama.cpp see little usage, so we should not make them too complicated.
>
> Even without this --lora-layer-range, users can easily slice the unwanted layers out of the LoRA GGUF. So it may not be worth adding a lot of code to the project just for that. And if we add it, most users will not use it anyway, as it is not as intuitive as controlling the scale of each adapter.
>
> The ability to specify a layer range is needed for control vectors because, in most cases, the model breaks when one is applied to all layers. Obviously this is also why no one actually uses them in production. By contrast, LoRA adapters generally work fine when applied to all layers by default.

No problem, and I agree it's not hard to trim the layers from the GGUF directly if needed.

I'll close this now.

jukofyork closed this Jul 14, 2025