Are the tokens in ChatBufferMemory also charged? #10247
Replies: 1 comment
🤖 Yes, the tokens in the ChatBufferMemory are also charged. When you call the OpenAI API, you are billed for the total number of tokens in the API call, both input and output. In your case, when you store the first input and output in a ChatTokenBufferMemory and then make subsequent calls to the OpenAI API, the tokens from the chat history (ChatBufferMemory) are included in the total token count of each call, because the chat history is fed into the API call every time. Token usage is tracked by the LangChain framework itself; the actual number of tokens used in a request is accumulated in the `update_token_usage` function. Here is the relevant code from the LangChain repository that handles token usage:

```python
from typing import Any, Dict, Set


def update_token_usage(
    keys: Set[str], response: Dict[str, Any], token_usage: Dict[str, Any]
) -> None:
    """Update token usage."""
    # Only track the usage keys that are present in the API response.
    _keys_to_use = keys.intersection(response["usage"])
    for _key in _keys_to_use:
        if _key not in token_usage:
            token_usage[_key] = response["usage"][_key]
        else:
            # Accumulate usage across successive API calls.
            token_usage[_key] += response["usage"][_key]
```

So, while your intention to save money by reducing the number of tokens in your prompt is a good one, keep in mind that the tokens from the chat history are also included in the total token count for each API call.
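To see how this accumulates across calls, here is a minimal sketch reusing `update_token_usage` from above, with made-up numbers (the `usage_keys` set and both response dictionaries are hypothetical, not taken from a real API response): because the memory replays the first exchange, the second call's `prompt_tokens` can be larger than the first call's even though the new message itself is shorter.

```python
from typing import Any, Dict

usage_keys = {"prompt_tokens", "completion_tokens", "total_tokens"}
token_usage: Dict[str, Any] = {}

# Call 1: full instruction prompt plus the first text.
resp1 = {"usage": {"prompt_tokens": 900, "completion_tokens": 150, "total_tokens": 1050}}
# Call 2: short "do the same" follow-up, but the whole first exchange is
# re-sent from memory, so prompt_tokens grows instead of shrinking.
resp2 = {"usage": {"prompt_tokens": 1200, "completion_tokens": 150, "total_tokens": 1350}}

update_token_usage(usage_keys, resp1, token_usage)
update_token_usage(usage_keys, resp2, token_usage)
print(token_usage)
# e.g. {'prompt_tokens': 2100, 'completion_tokens': 300, 'total_tokens': 2400}
```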
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
I'm trying to extract info from a bunch of input texts with LangChain and GPT-3.5. Instead of using the same prompt template with the instructions and substituting the input text each time, I'm thinking of storing the first input and output in a ChatTokenBufferMemory, so that in later calls to the OpenAI API I can just say "can you do the same to this text". But do the tokens in ChatBufferMemory also cost money? My original intention is to save money, because "can you do the same to this text" has far fewer tokens than my whole prompt template with the instructions, but I'm wondering whether the chat history is also fed into the API call each time.
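Here is a rough sketch of the setup I have in mind, assuming the LangChain 0.0.x API (the token-buffer memory class I found there is ConversationTokenBufferMemory; the model name, token limit, and texts are placeholders):

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationTokenBufferMemory

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Token-bounded chat history shared across calls.
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=2000)
chain = ConversationChain(llm=llm, memory=memory)

# Call 1: the full instruction template plus the first text.
chain.predict(input="<long instruction template> Text: <text 1>")

# Call 2: a short follow-up, but the buffered first exchange is
# prepended to the prompt, so those tokens are sent (and billed) again.
chain.predict(input="Can you do the same to this text? Text: <text 2>")
```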