Chat Model API/RetrievalQA chain - Possible to use Self-Hosted HuggingFace LLMs? - Hints? Examples? #9385
-
From https://api.python.langchain.com/en/latest/api_reference.html#module-langchain.chat_models
I was following the training at:
I looked at https://python.langchain.com/docs/integrations/chat/
My use case:
Am I just out of luck?
Replies: 4 comments 2 replies
-
🤖 Hello,

Yes, it is indeed possible to use self-hosted HuggingFace language models with the LangChain framework for developing a chat agent, including for RetrievalQA chains. The LangChain framework provides a class named `SelfHostedHuggingFaceLLM` for exactly this purpose. Here is an example of how to use it:

```python
from langchain.llms import SelfHostedHuggingFaceLLM
import runhouse as rh

# Provision GPU hardware through Runhouse
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")

hf = SelfHostedHuggingFaceLLM(
    model_id="google/flan-t5-large",
    task="text2text-generation",
    hardware=gpu,
)
```

In this example, the `SelfHostedHuggingFaceLLM` instance runs the `google/flan-t5-large` model on the GPU cluster provisioned through Runhouse. The resulting `hf` object behaves like any other LangChain LLM. Here is a simple example of how you can use it with a chat agent:

```python
from langchain.llms import SelfHostedHuggingFaceLLM
from langchain.agents.chat.base import ChatAgent
import runhouse as rh

# Specify the hardware to use for the model
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")

# Create an instance of the SelfHostedHuggingFaceLLM class
hf = SelfHostedHuggingFaceLLM(
    model_id="gpt2",
    task="text-generation",
    hardware=gpu,
)

# Create a ChatAgent using the SelfHostedHuggingFaceLLM instance
chat_agent = ChatAgent.from_llm_and_tools(llm=hf, tools=[])

# Use the chat agent to generate a response to a user input
response = chat_agent.llm_chain.llm._call(prompt="Hello, how are you?")
print(response)
```

In this example, we're using the GPT-2 model for text generation. I hope this helps! If you have any other questions, feel free to ask.

Best,
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
@dosu-bot, this sounds promising, but it skips over my first question above.

I found this more complete description: https://blog.langchain.dev/chat-models/

It would be nice if it was cross-referenced in: https://api.python.langchain.com/en/latest/api_reference.html#module-langchain.chat_models

For truly self-hosted/locally hosted HuggingFace LLaMA2 models I ended up with this code. It's not perfect, but it does allow testing:
Then `llm` can be used in LangChain chains... --OC
-
Yes, you can use self-hosted HuggingFace language models with the LangChain framework, including Retrieval chains. Since you haven't mentioned your use case in detail: keep in mind that HuggingFace's hosted offering is restricted on privacy and per-day user limits, which is often the reason to self-host. If you're looking for alternatives, I hope this guide will be helpful for you: https://www.contus.com/blog/best-self-hosted-chat-platforms/