WIP: Add count_tokens for openAI models #3447
base: main
Conversation
```python
    return event_loop


def num_tokens_from_messages(
```
This is OpenAI specific so it should live in models/openai.py
```python
def num_tokens_from_messages(
    messages: list[ChatCompletionMessageParam] | list[ResponseInputItemParam],
    model: OpenAIModelName = 'gpt-4o-mini-2024-07-18',
```
We don't need a default value
```python
else:
    raise NotImplementedError(
        f"""num_tokens_from_messages() is not implemented for model {model}."""
    )  # TODO: How to handle other models?
```
Are you able to reverse engineer the right formula for gpt-5?
As long as we document that this is a best-effort calculation and may not be accurate down to the exact token, we can have one branch of logic for everything before gpt-5 and one for everything newer. If future models have different rules, we can update the logic then.
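A minimal sketch of that two-branch idea, assuming the cookbook's per-message overhead for older models and the reverse-engineered gpt-5 primer discussed later in this thread. The `encode` callable is injected here so the sketch doesn't hard-depend on tiktoken; the names are illustrative, not the PR's actual API.

```python
# Illustrative sketch only: one "best effort" branch for gpt-5 and newer,
# one for everything older. Constants: tokens_per_message=3 (OpenAI cookbook),
# final_primer=3 for older models, 2 for gpt-5 (reverse engineered).
from typing import Callable


def estimate_tokens(
    messages: list[dict[str, str]],
    model: str,
    encode: Callable[[str], list],
) -> int:
    tokens_per_message = 3  # per-message overhead (cookbook value)
    final_primer = 2 if 'gpt-5' in model else 3  # assistant reply primer
    num_tokens = final_primer
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(encode(value))
    return num_tokens
```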
I think with a decreased primer the calculation for gpt-5 is more accurate.
Should the method from the cookbook be the default for all other models?
```python
try:
    encoding = tiktoken.encoding_for_model(model)
except KeyError:
    print('Warning: model not found. Using o200k_base encoding.')  # TODO: How to handle warnings?
```
No warnings please, let's just make a best effort
```python
if self.system != 'openai':
    raise NotImplementedError('Token counting is only supported for OpenAI system.')

openai_messages = await self._map_messages(messages, model_request_parameters)
```
We should call self.prepare_request before this call, like we do in the other model classes' count_tokens methods
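To illustrate the ordering being asked for, here is a stand-in class (not the real pydantic-ai model class; `prepare_request` and `_map_messages` are minimal fakes that only record the call order):

```python
# Minimal stand-in showing prepare_request running before _map_messages,
# mirroring the reviewer's suggestion. Not the real pydantic-ai classes.
class FakeModel:
    def __init__(self) -> None:
        self.calls: list[str] = []

    def prepare_request(self, model_settings, model_request_parameters):
        self.calls.append('prepare_request')
        return model_settings, model_request_parameters

    async def _map_messages(self, messages, model_request_parameters):
        self.calls.append('_map_messages')
        return messages

    async def count_tokens(self, messages, model_settings, model_request_parameters):
        # prepare_request first, like the other model classes' count_tokens methods
        model_settings, model_request_parameters = self.prepare_request(
            model_settings, model_request_parameters
        )
        return await self._map_messages(messages, model_request_parameters)
```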
```python
)


def num_tokens_from_messages(
```
Please make this a private function
```python
elif 'gpt-5' in model:
    return num_tokens_from_messages(messages, model='gpt-5-2025-08-07')
else:
    raise NotImplementedError(f"""num_tokens_from_messages() is not implemented for model {model}.""")
```
Let's simplify all of this to `if 'gpt-5' in model: <do the new thing> else: <do the old thing>`.
```python
    'gpt-4o-2024-08-06',
}:
    tokens_per_message = 3
    final_primer = 3  # every reply is primed with <|start|>assistant<|message|>
```
Let's include a link to the doc we took this from
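For example, the annotation could look like this (the URL is an assumption about where the cookbook notebook referenced in the PR description lives):

```python
# Values from the OpenAI cookbook (assumed path):
# https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
tokens_per_message = 3
final_primer = 3  # every reply is primed with <|start|>assistant<|message|>
```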
```python
    'gpt-5-2025-08-07',
}:
    tokens_per_message = 3
    final_primer = 2
```
Let's make it explicit that this one was "reverse engineered"
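For instance, a comment making the provenance explicit:

```python
tokens_per_message = 3
# Reverse engineered, not from OpenAI docs: for gpt-5 the reply primer
# empirically counts as 2 tokens (the documented value for older models is 3).
final_primer = 2
```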
```python
            assert_never(item)
        return responses.EasyInputMessageParam(role='user', content=content)

    async def count_tokens(
```
While we're at it, let's update the docstring for UsageLimits.count_tokens_before_request to make it explicit which models support it (i.e. which implement the count_tokens method)
Work for #3430
I took the method 1:1 from the OpenAI cookbook. However, the cookbook example only covers certain models. How should other models be handled? A quick test with gpt-5 produced a token count that differed from this method's result.
gpt-5:
Warning: gpt-5 may update over time. Returning num tokens assuming gpt-5-2025-08-07.
110 prompt tokens counted by num_tokens_from_messages().
109 prompt tokens counted by the OpenAI API.
Test script