Example Code

Failed to resolve model_id: Could not find model id for inference server: meta-llama/Meta-Llama-3.1-8B-Instruct. Make sure that your Hugging Face token has access to the endpoint.

Description

I'm trying to use LangGraph to build out a demo product. To keep costs as low as possible, I'm looking for a way to wrap HuggingFace's "serverless inference" option around an agent (not Inference Endpoints just yet!). I have tried the code suggested in the LangChain docs, but I think the API call is still looking for an inference endpoint server, as it returns an error with the model id (meta-llama/Meta-Llama-3.1-8B) as the endpoint_url. Thank you in advance to anyone who has any ideas or solutions! Hope you are having a great day!

System Info

n/a
Replies: 1 comment
-
Hello @kw1018! I'm here to help you with any bugs, questions, or contributions. Let's work together to solve this!

To integrate HuggingFace's "serverless inference" option with an agent in LangGraph while keeping costs low, you can use the free serverless API provided by HuggingFace. Here is a step-by-step guide:

1. Install the necessary packages.
2. Get your HuggingFace API token.
3. Set the API token as an environment variable.
4. Prepare the HuggingFaceEndpoint and LLMChain.

Additionally, you can explore DeepInfra, another serverless inference service that provides access to a variety of LLMs. DeepInfra offers 1 hour of free serverless GPU compute for testing different models, which can also help keep costs low for your demo product.

This setup uses the free serverless API from HuggingFace and the free compute time from DeepInfra, both of which are suitable for demo products. Be aware, however, that these free options may be rate-limited for heavy use cases. For more demanding applications, consider using dedicated endpoints [1][2][3][4].
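The steps above can be sketched roughly as follows, assuming the `langchain-huggingface` integration package. The model id, the `hf_xxx` token placeholder, and the generation parameters are illustrative assumptions, not values confirmed by the thread:

```python
# A minimal sketch, assuming langchain-huggingface is available:
#   pip install langchain-huggingface langgraph
import os

# Steps 2-3: create a token at https://huggingface.co/settings/tokens
# (it needs access to the gated Llama repo) and expose it as an env var.
os.environ.setdefault("HUGGINGFACEHUB_API_TOKEN", "hf_xxx")  # placeholder token

try:
    from langchain_huggingface import HuggingFaceEndpoint

    # Step 4: pass repo_id, NOT endpoint_url. repo_id routes the request to
    # the serverless Inference API, while endpoint_url makes the client look
    # for a dedicated Inference Endpoint.
    llm = HuggingFaceEndpoint(
        repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
        task="text-generation",
        max_new_tokens=256,
    )
    # llm.invoke("What is LangGraph?")  # uncomment once a real token is set
except Exception:
    llm = None  # package not installed or token not valid in this environment
```

The `repo_id` vs. `endpoint_url` distinction appears to match the error in the question: supplying the model id where an endpoint URL is expected is what produces "Could not find model id for inference server".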
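The DeepInfra alternative mentioned above can be wired in much the same way, assuming the `langchain-community` DeepInfra wrapper; the model id, token placeholder, and model_kwargs here are illustrative assumptions:

```python
# A sketch of the DeepInfra route, assuming langchain-community is available:
#   pip install langchain-community
import os

# DeepInfra tokens come from the DeepInfra dashboard.
os.environ.setdefault("DEEPINFRA_API_TOKEN", "di_xxx")  # placeholder token

try:
    from langchain_community.llms import DeepInfra

    di_llm = DeepInfra(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct")
    di_llm.model_kwargs = {"temperature": 0.7, "max_new_tokens": 256}
    # di_llm.invoke("What is LangGraph?")  # uncomment once a real token is set
except Exception:
    di_llm = None  # wrapper unavailable or token not valid in this environment
```

Since both wrappers expose the standard LangChain LLM interface, either one can be dropped into the same LangGraph agent without changing the surrounding graph code.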