-
Nice breakdown; this kind of tuning makes a difference if the underlying logic path is already stable. But from experience with RAG systems, a lot of the issues show up after the LLM receives the chunk, not before, and the problems I've run into when debugging similar setups tend to sit upstream of the model call.
So parameter tuning is useful, but sometimes the real culprit is upstream: the retrieval format, memory boundaries, or how the prompt bridges semantic layers. Happy to compare notes if you're exploring chunk attention or logic fallback strategies.
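To make the retrieval-format / prompt-bridging point concrete, here is a minimal sketch of the formatting step I mean; the chunk fields and the character budget are hypothetical, not something taken from this thread:

```python
# Minimal sketch of a retrieval-to-prompt bridge. The chunk fields
# ("text", "source") and the character budget are hypothetical.

def build_prompt(question: str, chunks: list[dict], max_chars: int = 6000) -> str:
    """Stitch retrieved chunks into one prompt, trimming to a budget so the
    context never silently overruns the model's window."""
    parts, used = [], 0
    for chunk in chunks:
        block = f"[{chunk.get('source', 'unknown')}]\n{chunk['text']}"
        if used + len(block) > max_chars:  # crude memory-boundary guard
            break
        parts.append(block)
        used += len(block)

    context = "\n\n---\n\n".join(parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The separator, the per-chunk labels, and the budget each change model behaviour independently of any sampling parameters, which is why this layer is worth checking first.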
-
Hello,
I am building a RAG system with llama-cpp-python and LangChain's LlamaCpp wrapper over a few hundred PDFs of scientific information, running on a few GPUs.
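For reference, the ingestion side of the pipeline looks roughly like the sketch below; the directory path, splitter settings, and embedding model shown here are illustrative placeholders rather than the exact setup (import paths also shift a bit between LangChain versions):

```python
# Ingestion sketch for a directory of PDFs. The path, splitter settings,
# and embedding model are illustrative placeholders, not the exact setup.
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = PyPDFDirectoryLoader("pdfs/").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=150
).split_documents(docs)

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```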
I have tried to optimize the LLM's parameters to the best of my knowledge, based on information I found online.
Would these parameters seem appropriate for the intended purpose of interrogating a large set of data?
I load the model through LangChain's LlamaCpp wrapper with the parameters I arrived at.
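A representative sketch of that kind of LlamaCpp initialization is below; the model path and every numeric value are placeholders to sanity-check against, not the exact settings referred to above, since the right n_ctx, n_batch, and n_gpu_layers depend on the model size and the VRAM available:

```python
# Illustrative LlamaCpp setup for GPU offload; the path and every value
# here are placeholders, not a recommended or exact configuration.
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="models/model.gguf",  # placeholder path to a GGUF model
    n_ctx=4096,        # context window: must hold the question plus retrieved chunks
    n_gpu_layers=-1,   # offload all layers to GPU (needs a CUDA-enabled llama.cpp build)
    n_batch=512,       # prompt-processing batch size
    temperature=0.1,   # low temperature for grounded, factual answers
    max_tokens=512,    # cap on tokens generated per answer
    top_p=0.9,
    verbose=False,
)
```

For splitting a single model across several GPUs, llama.cpp itself supports a tensor_split option; depending on the LangChain version it may be exposed directly on the wrapper or need to go through its extra model kwargs pass-through.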