Conversation

@wirthual

Work for #3430

Took the method 1:1 from the OpenAI cookbook. However, the example only covers certain models. How should other models be handled? A quick test with gpt-5 showed a token count that differs from what this method returns.

```
gpt-5
Warning: gpt-5 may update over time. Returning num tokens assuming gpt-5-2025-08-07.
110 prompt tokens counted by num_tokens_from_messages().
109 prompt tokens counted by the OpenAI API.
```
Test script:

```python
from openai import OpenAI
import os
import tiktoken

def num_tokens_from_messages(messages, model="gpt-4o-mini-2024-07-18"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using o200k_base encoding.")
        encoding = tiktoken.get_encoding("o200k_base")
    if model in {
        "gpt-3.5-turbo-0125",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        "gpt-4o-mini-2024-07-18",
        "gpt-4o-2024-08-06",
        "gpt-4.1-2025-04-14",
        "gpt-5-2025-08-07",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0125.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0125")
    elif "gpt-4o-mini" in model:
        print("Warning: gpt-4o-mini may update over time. Returning num tokens assuming gpt-4o-mini-2024-07-18.")
        return num_tokens_from_messages(messages, model="gpt-4o-mini-2024-07-18")
    elif "gpt-4o" in model:
        print("Warning: gpt-4o and gpt-4o-mini may update over time. Returning num tokens assuming gpt-4o-2024-08-06.")
        return num_tokens_from_messages(messages, model="gpt-4o-2024-08-06")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    elif "gpt-5" in model:
        print("Warning: gpt-5 may update over time. Returning num tokens assuming gpt-5-2025-08-07.")
        return num_tokens_from_messages(messages, model="gpt-5-2025-08-07")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]

for model in [
    "gpt-3.5-turbo",
    "gpt-4",
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-5"
    ]:
    print(model)
    # example token count from the function defined above
    print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().")
    # example token count from the OpenAI API
    response = client.chat.completions.create(model=model, messages=example_messages)
    print(f'{response.usage.prompt_tokens} prompt tokens counted by the OpenAI API.')
    print()
```

```python
    return event_loop


def num_tokens_from_messages(
```
Collaborator

This is OpenAI-specific, so it should live in `models/openai.py`.


```python
def num_tokens_from_messages(
    messages: list[ChatCompletionMessageParam] | list[ResponseInputItemParam],
    model: OpenAIModelName = 'gpt-4o-mini-2024-07-18',
```
Collaborator

We don't need a default value

```python
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}."""
        )  # TODO: How to handle other models?
```
Collaborator

Are you able to reverse engineer the right formula for gpt-5?

As long as we document that this is a best-effort calculation and may not be accurate down to the exact token, we can have one branch of logic for "everything before gpt-5" and one for everything newer. If future models have different rules, we can update the logic then.
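
A minimal sketch of what that reverse engineering could look like (hypothetical helper; it reuses the cookbook counting scheme with the constants exposed, and sweeps candidate primer values against the API's reported prompt_tokens):

```python
import tiktoken

def count_with_constants(messages, tokens_per_message=3, tokens_per_name=1, final_primer=3):
    # Same counting scheme as the cookbook function, with the constants as parameters.
    encoding = tiktoken.get_encoding('o200k_base')
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == 'name':
                num_tokens += tokens_per_name
    return num_tokens + final_primer

# Compare against response.usage.prompt_tokens from the test script above;
# the primer value whose count matches the API's is the one to use.
for primer in (1, 2, 3, 4):
    print(primer, count_with_constants(example_messages, final_primer=primer))
```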

Author

I think that with a decreased final primer, the calculation for gpt-5 is more accurate.

Should the method from the cookbook be the default for all other models?
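
For reference, the numbers at the top of the thread bear this out: with the cookbook's final primer of 3, num_tokens_from_messages() returned 110 for gpt-5 while the API reported 109, and 110 - 3 + 2 = 109, so dropping the primer to 2 closes the gap exactly.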

```python
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print('Warning: model not found. Using o200k_base encoding.')  # TODO: How to handle warnings?
```
Collaborator

No warnings please; let's just make a best effort.
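
For example, the fallback without the warning could just be (sketch):

```python
try:
    encoding = tiktoken.encoding_for_model(model)
except KeyError:
    # Best effort: unknown model names fall back to the newest encoding.
    encoding = tiktoken.get_encoding('o200k_base')
```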

```python
        if self.system != 'openai':
            raise NotImplementedError('Token counting is only supported for OpenAI system.')

        openai_messages = await self._map_messages(messages, model_request_parameters)
```
Collaborator

We should call self.prepare_request before this call, like we do in the other model classes' count_tokens methods.
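
Roughly like this (a sketch only; the surrounding signature and the exact prepare_request call are assumed to mirror the other model classes, not verified here):

```python
async def count_tokens(
    self,
    messages: list[ModelMessage],
    model_settings: ModelSettings | None,
    model_request_parameters: ModelRequestParameters,
) -> RequestUsage:
    # Assumption: prepare_request customizes settings/parameters, as in the other models.
    model_settings, model_request_parameters = self.prepare_request(
        model_settings, model_request_parameters
    )
    if self.system != 'openai':
        raise NotImplementedError('Token counting is only supported for OpenAI system.')
    openai_messages = await self._map_messages(messages, model_request_parameters)
    ...
```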

```python
)


def num_tokens_from_messages(
```
Collaborator

Please make this a private function

```python
    elif 'gpt-5' in model:
        return num_tokens_from_messages(messages, model='gpt-5-2025-08-07')
    else:
        raise NotImplementedError(f"""num_tokens_from_messages() is not implemented for model {model}.""")
```
Collaborator

Let's simplify all of this to `if 'gpt-5' in model: <do the new thing> else: <do the old thing>`.
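
i.e. something along these lines (a sketch using the constants from this discussion, not the final code):

```python
if 'gpt-5' in model:
    tokens_per_message = 3
    final_primer = 2  # reverse engineered, see above
else:
    tokens_per_message = 3
    final_primer = 3  # per the OpenAI cookbook
```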

```python
        'gpt-4o-2024-08-06',
    }:
        tokens_per_message = 3
        final_primer = 3  # every reply is primed with <|start|>assistant<|message|>
```
Collaborator

Let's include a link to the doc we took this from
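
For example (assuming the source is the cookbook's token-counting notebook, which appears to live at https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken):

```python
        # Constants from the OpenAI cookbook:
        # https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken
        tokens_per_message = 3
        final_primer = 3  # every reply is primed with <|start|>assistant<|message|>
```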

```python
        'gpt-5-2025-08-07',
    }:
        tokens_per_message = 3
        final_primer = 2
```
Collaborator

Let's make it explicit that this one was "reverse engineered"
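
e.g. something like:

```python
        tokens_per_message = 3
        # Reverse engineered: the cookbook's primer of 3 over-counts gpt-5 by one
        # token (110 vs. the API's 109), so gpt-5 uses 2.
        final_primer = 2
```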

```python
            assert_never(item)
        return responses.EasyInputMessageParam(role='user', content=content)

    async def count_tokens(
```
Collaborator

While we're at it, let's update the docstring for UsageLimits.count_tokens_before_request to make it explicit which models support it (i.e. which implement the count_tokens method)
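
Something along these lines, perhaps (wording is only a suggestion, and the supported-model list should be checked against the actual count_tokens implementations):

```python
count_tokens_before_request: bool = False
"""Whether to count tokens (via the model's `count_tokens` method) before each request,
so the token limits can be enforced up front.

Only supported by models that implement `count_tokens`; enabling this with other models
will raise `NotImplementedError`.
"""
```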
