WIP: Add count_tokens for openAI models #3447

wirthual · 2025-11-16T23:48:29Z

Work for #3430

Took method 1:1 from OpenAI cookbook. However the example only shows for certain models. How should other models be handled? Quick test with gpt-5 showed a token number that differed from this method.

gpt-5
Warning: gpt-5 may update over time. Returning num tokens assuming gpt-5-2025-08-07.
110 prompt tokens counted by num_tokens_from_messages().
109 prompt tokens counted by the OpenAI API.

Test script

from openai import OpenAI
import os
import tiktoken

def num_tokens_from_messages(messages, model="gpt-4o-mini-2024-07-18"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using o200k_base encoding.")
        encoding = tiktoken.get_encoding("o200k_base")
    if model in {
        "gpt-3.5-turbo-0125",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        "gpt-4o-mini-2024-07-18",
        "gpt-4o-2024-08-06",
        "gpt-4.1-2025-04-14",
        "gpt-5-2025-08-07",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0125.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0125")
    elif "gpt-4o-mini" in model:
        print("Warning: gpt-4o-mini may update over time. Returning num tokens assuming gpt-4o-mini-2024-07-18.")
        return num_tokens_from_messages(messages, model="gpt-4o-mini-2024-07-18")
    elif "gpt-4o" in model:
        print("Warning: gpt-4o and gpt-4o-mini may update over time. Returning num tokens assuming gpt-4o-2024-08-06.")
        return num_tokens_from_messages(messages, model="gpt-4o-2024-08-06")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    elif "gpt-5" in model:
        print("Warning: gpt-5 may update over time. Returning num tokens assuming gpt-5-2025-08-07.")
        return num_tokens_from_messages(messages, model="gpt-5-2025-08-07")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]

for model in [
    "gpt-3.5-turbo",
    "gpt-4",
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-5"
    ]:
    print(model)
    # example token count from the function defined above
    print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().")
    # example token count from the OpenAI API
    response = client.chat.completions.create(model=model,
    messages=example_messages)
    
    print(f'{response.usage.prompt_tokens} prompt tokens counted by the OpenAI API.')
    print()

DouweM · 2025-11-18T18:30:04Z

pydantic_ai_slim/pydantic_ai/_utils.py

    return event_loop
+
+
+def num_tokens_from_messages(


This is OpenAI specific so it should live in models/openai.py

DouweM · 2025-11-18T18:30:29Z

pydantic_ai_slim/pydantic_ai/_utils.py

+
+def num_tokens_from_messages(
+    messages: list[ChatCompletionMessageParam] | list[ResponseInputItemParam],
+    model: OpenAIModelName = 'gpt-4o-mini-2024-07-18',


We don't need a default value

DouweM · 2025-11-18T18:32:35Z

pydantic_ai_slim/pydantic_ai/_utils.py

+    else:
+        raise NotImplementedError(
+            f"""num_tokens_from_messages() is not implemented for model {model}."""
+        )  # TODO: How to handle other models?


Are you able to reverse engineer the right formula for gpt-5?

As long as we document that this is a best effort calculation and may not be accurate down to the exact token, we can have one branch of logic for "everything before gpt-5" and one for every newer. If future models have different rules, we can update the logic then.

I think with a decreased primer the calculation for gpt5 is more accurate.

Should the method from the cookbook be the default for all other models?

DouweM · 2025-11-18T18:32:55Z

pydantic_ai_slim/pydantic_ai/_utils.py

+    try:
+        encoding = tiktoken.encoding_for_model(model)
+    except KeyError:
+        print('Warning: model not found. Using o200k_base encoding.')  # TODO: How to handle warnings?


No warnings please, let's just make a best effort