Plugin Architecture, Question-answering Costs, and Query Planning #167
-
Hello. Is Semantic Kernel interoperable, or will it be interoperable, with the recently announced ChatGPT plugins feature?

I would also like to ask about measuring and estimating the costs of answering specific natural-language questions and subquestions, and about the related matter of query planning, including for complex queries that use one or more plugins.

Question-answering services built on GPT and plugins can be envisioned. One architecture would allow logged-in users to create new questions and to collaboratively upvote existing questions until a question accumulates enough "points" to be enqueued for processing. It seems reasonable that the number of points a question must accumulate before being processed should meet or exceed some measure of the complexity, or cost, of answering it.

Estimates or measures of question-answering complexity, or cost, might total the electrical, mechanical, computational, storage, transmission, and administrative costs required to answer the question. Being able to estimate these costs before answering would be useful: one question might cost $0.002 to answer and another $0.02. Answering one question might only require processing a single database table or graph, while answering another might involve querying a set of federated resources. It would seemingly be simpler to algorithmically estimate the complexity, or cost, of answering natural-language questions that can be mapped to a query language, e.g., SQL or SPARQL.

While I am still exploring the recently announced plugins architecture and its documentation, it seems to me that plugin developers could consider providing functions for estimating, in advance, the cost of answering a question with their plugin (a sketch follows below).

Thank you. Hopefully these ideas, comments, questions, and discussion topics are useful for the Semantic Kernel team and community.
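To make that last point concrete, here is a minimal C# sketch of the kind of cost-estimation function a plugin developer could expose, together with the points-threshold check described above. Every name here (`IQuestionCostEstimator`, `CostEstimate`, `UsdPerPoint`, and so on) is a hypothetical illustration, not part of any announced plugin API.

```csharp
using System.Threading.Tasks;

// Hypothetical interface a plugin could expose so that a host can ask,
// before execution, roughly what answering a question would cost.
public interface IQuestionCostEstimator
{
    Task<CostEstimate> EstimateAsync(string naturalLanguageQuestion);
}

// A cost estimate broken down by the cost categories mentioned above.
public record CostEstimate(
    decimal ComputeUsd,
    decimal StorageUsd,
    decimal TransmissionUsd,
    decimal AdministrativeUsd)
{
    public decimal TotalUsd =>
        ComputeUsd + StorageUsd + TransmissionUsd + AdministrativeUsd;
}

public record QueuedQuestion(string Text, int Points);

public static class QuestionGate
{
    // Assumed conversion rate from upvote "points" to a dollar budget.
    private const decimal UsdPerPoint = 0.001m;

    // Enqueue the question only once its accumulated points cover the
    // estimated cost of answering it.
    public static async Task<bool> ShouldEnqueueAsync(
        QueuedQuestion question, IQuestionCostEstimator estimator)
    {
        CostEstimate estimate = await estimator.EstimateAsync(question.Text);
        return question.Points * UsdPerPoint >= estimate.TotalUsd;
    }
}
```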
Replies: 3 comments
-
On Plugin/Skills integration:

Short answer: Yes :).

Longer answer: we're figuring out how deep we go with integration. One option would be to import Plugins as SK Skills. Another, to export SK Skills as Plugins. These aren't mutually exclusive. Another option is aligning on the Skill manifest format.

Feedback welcome - thoughts?

On Cost Estimation

We are working on pieces of this, but mostly for tracking costs after a call is made. This might be worth spinning up as a separate discussion. I suspect the stochastic nature of LLMs would make accurate estimates challenging, but I hear your points above and think this would be worthwhile to explore.
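As one illustration of the "import Plugins as SK Skills" option, a host could fetch a plugin's `ai-plugin.json` manifest and hand its OpenAPI spec URL to a skill importer. A rough sketch, assuming the published ChatGPT plugin manifest fields; SK's actual import API, if and when it lands, may look quite different:

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Subset of the ChatGPT plugin manifest (ai-plugin.json) fields;
// the lowercase names mirror the JSON keys.
public record PluginApi(string type, string url);
public record PluginManifest(
    string name_for_model,
    string description_for_model,
    PluginApi api);

public static class PluginImporter
{
    private static readonly HttpClient Http = new();

    // Fetch a plugin manifest and return it in a form a skill importer
    // could consume (skill name, description, OpenAPI spec URL).
    public static async Task<PluginManifest> LoadManifestAsync(string baseUrl)
    {
        string json = await Http.GetStringAsync(
            new Uri(new Uri(baseUrl), "/.well-known/ai-plugin.json"));
        return JsonSerializer.Deserialize<PluginManifest>(json)
               ?? throw new InvalidOperationException("Invalid manifest.");
    }
}
```

From `manifest.api.url` a host could then download the OpenAPI document and surface each operation as a callable skill function.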
-
Cost estimating is easy / hard. It's easy when you know what text is being processed (there's a C# implementation of a tokenizer in the repo), but hard because you don't know in advance how much text the model will produce or how many calls a plan will make.

A better approach is probably to compute a maximum cost. We know the maximum token size for the model, we (should) know the cost per 1,000 tokens, and we can then limit a run to, say, 20 calls and therefore know the maximum processing cost. The actual cost can be tracked fairly trivially for the LLM, as it's just a matter of counting tokens used.
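For example, with a 4,096-token context window, an assumed price of $0.002 per 1,000 tokens, and a cap of 20 calls, the worst case is 4096 / 1000 × 0.002 × 20 ≈ $0.16. A small sketch of that bound plus the token-counting tally (the limit, price, and class are illustrative, not SK APIs):

```csharp
public class LlmCostTracker
{
    private readonly int _maxTokensPerCall;   // model's context limit, e.g. 4096
    private readonly decimal _usdPer1KTokens; // e.g. 0.002m (illustrative price)
    private readonly int _maxCalls;           // plan-level call cap, e.g. 20

    private int _tokensUsed;

    public LlmCostTracker(int maxTokensPerCall, decimal usdPer1KTokens, int maxCalls)
        => (_maxTokensPerCall, _usdPer1KTokens, _maxCalls)
           = (maxTokensPerCall, usdPer1KTokens, maxCalls);

    // Worst-case bound: every call consumes the full context window.
    public decimal MaxCostUsd
        => _maxTokensPerCall / 1000m * _usdPer1KTokens * _maxCalls;

    // Actual cost so far: just count tokens as calls complete.
    public void RecordCall(int promptTokens, int completionTokens)
        => _tokensUsed += promptTokens + completionTokens;

    public decimal ActualCostUsd => _tokensUsed / 1000m * _usdPer1KTokens;
}
```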
-
Circling back on these interesting topics, I found the following publication: White, Ryen W., and Ahmed Hassan Awadallah. "Task Duration Estimation." In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), pp. 636-644. 2019.