Chat cost estimator to help users understand APIs' usage-based pricing structures #5182
Replies: 2 comments
-
While I think the suggestions idea is a little iffy (difficult to implement, and once implemented it would need constant maintenance to stay accurate), I agree that we should have a way to visibly see our API cost estimates. Cline (for VS Code) has this and it greatly reduces my hesitancy with the tool, because I know exactly what my costs are.
-
We developed this in my non-LibreChat chat application, solely for OpenAI. Yes, it is a pain. But it would greatly increase adoption of this tool if it could be used as a proxy for paying the API costs of subscribers (think employees or students), while setting quotas on how much each person could spend. I think every company in the world would want this.
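The per-user quota idea above could be sketched roughly like this. Everything here is hypothetical (the class names, the in-memory store, the limits); it is not part of LibreChat, just an illustration of blocking a request once a user's accumulated spend would exceed their cap:

```python
# Minimal sketch of a per-user spending quota on a shared API key.
# All names and the in-memory dict are illustrative assumptions,
# not LibreChat internals.

class QuotaExceeded(Exception):
    pass

class SpendTracker:
    def __init__(self, monthly_limit_usd: float):
        self.limit = monthly_limit_usd
        self.spent = {}  # user_id -> USD spent this period

    def charge(self, user_id: str, cost_usd: float) -> None:
        # Reject the charge (before calling the API) if it would
        # push this user past their limit.
        new_total = self.spent.get(user_id, 0.0) + cost_usd
        if new_total > self.limit:
            raise QuotaExceeded(f"{user_id} would exceed ${self.limit:.2f}")
        self.spent[user_id] = new_total

tracker = SpendTracker(monthly_limit_usd=5.00)
tracker.charge("student-42", 0.75)       # allowed
try:
    tracker.charge("student-42", 4.50)   # would bring total to $5.25
except QuotaExceeded as e:
    print("blocked:", e)
```

In a real deployment the spend table would live in the database and reset per billing period, but the gate itself is this simple.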
-
I'm guessing that this would be a pain in the backside to implement, but thought I'd put out the suggestion anyway. It would also require knowing the latest API prices so ... see the foregoing!
My only concern with using LLMs via API is that the costs are harder to predict than with a web UI, especially in chats with Anthropic models, where I often take advantage of the long context window. I know that there's prompt caching, but sometimes I wonder, "how much did that chat cost (ballpark)?" It's not too hard to get daily billing from the platform dashboards, of course, but that still doesn't give you any kind of per-chat breakdown.
Perhaps as a sort of hand-holding feature for those new to this way of working with models, it would be cool to provide an estimate of how much a single chat costs. The next-level feature, which I'll add here anyway, might be something like a "cost optimization suggestion". For example, the tool could flag that a chat used a needlessly advanced model for a simple task and suggest a more economical alternative. Like, "Hey, you definitely didn't need to use O1 for this. GPT 3.5 is effective for simple summarization tasks like the one you did here, and way less expensive."
In terms of implementation, I guess you could give a ballpark by estimating the total tokens in the chat with a tokenization estimator, and then converting that to an estimated billing amount at the rate for the particular model being used. Alternatively, you could let the user supply the rates, offloading the work of finding them onto the user.
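As a rough sketch of what I mean, something like the below would do it. The ~4-characters-per-token heuristic and the rates are illustrative assumptions (a real version would use the model's actual tokenizer, and current prices change often):

```python
# Ballpark per-chat cost estimator. The heuristic and the rates are
# illustrative assumptions, not real tokenizer output or current prices.

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def estimate_chat_cost(messages, rates):
    """messages: list of {"role": ..., "content": ...} dicts.
    rates: user-supplied USD per million tokens, {"input": ..., "output": ...}."""
    input_tokens = sum(estimate_tokens(m["content"])
                       for m in messages if m["role"] != "assistant")
    output_tokens = sum(estimate_tokens(m["content"])
                        for m in messages if m["role"] == "assistant")
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

chat = [
    {"role": "user", "content": "Summarize this paragraph for me..."},
    {"role": "assistant", "content": "Here is a short summary..."},
]
# Hypothetical rates, supplied by the user rather than hardcoded.
cost = estimate_chat_cost(chat, {"input": 3.0, "output": 15.0})
print(f"~${cost:.6f}")
```

Having the user supply the rates dict sidesteps the maintenance problem of keeping a price table current.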
I think it's probably way too complicated, given that you'd also need to count input and output tokens separately and factor in caching. But ... perhaps a stretch feature idea!