Replies: 2 comments
-
The solution is to set the parameter in the chain itself: `chain = prompt | llm.bind(pipeline_kwargs={'max_new_tokens': 500}) | parser`. What remains open is the question: why is this necessary at all, given that this parameter is already set in the definition of the LLM?
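For anyone landing here, a minimal end-to-end sketch of that workaround (the model id, prompt template, and question are placeholders; it assumes the `langchain_huggingface` integration package):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFacePipeline

# pipeline_kwargs is already set here, in the LLM definition ...
llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",  # placeholder; any text-generation model id works
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 500},
)

prompt = PromptTemplate.from_template("Question: {question}\nAnswer:")
parser = StrOutputParser()

# ... but it has to be bound again so it actually reaches the streaming call:
chain = prompt | llm.bind(pipeline_kwargs={"max_new_tokens": 500}) | parser

for chunk in chain.stream({"question": "What is LangChain?"}):
    print(chunk, end="", flush=True)
```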
-
To continue my monologue: it looks like the intermediate steps of the chain drop the `pipeline_kwargs` setting. Is this by design? Am I missing something?
-
Description
When using streaming with chains, the output stops after 10 chunks. When streaming directly from the LLM, the same happens (the output stops after 10 chunks). But if the additional argument `pipeline_kwargs` is set at call time (even though it was also set when calling `HuggingFacePipeline.from_model_id`), the streaming output is produced as expected.

How can I set this additional argument when streaming the output of a chain? I tried different ways to pass `pipeline_kwargs` (even using `RunnablePassthrough`), but it either throws an error about unexpected arguments or the argument seems to be ignored.
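For reference, a hedged sketch of the direct-LLM case described above: re-passing `pipeline_kwargs` at call time restores the full streaming output. The model id is a placeholder, and this assumes extra keyword arguments given to `stream()` are forwarded to the underlying pipeline call:

```python
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",  # placeholder model id
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 500},  # set here, yet ignored when streaming
)

for chunk in llm.stream(
    "Question: What is LangChain?\nAnswer:",
    pipeline_kwargs={"max_new_tokens": 500},  # set again at call time
):
    print(chunk, end="", flush=True)
```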