@kfsone I agree with your assessment that the basic conversation pattern is not optimal, and we have used more customized model context management when building special agents like WebSurfer and FileSurfer. There are several built-in knobs you can tune for context management, and you can also directly code your custom agent with domain-specific context management logic.
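For example (a rough sketch; the exact module paths and parameter names depend on which AutoGen version you're on), a buffered model context caps how many recent messages an `AssistantAgent` replays to the model on each call:

```python
from autogen_agentchat.agents import AssistantAgent
from autogen_core.model_context import BufferedChatCompletionContext
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Keep only the last few messages in the context sent to the model,
# rather than replaying the whole conversation on every call.
model_client = OpenAIChatCompletionClient(model="gpt-4o")
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    model_context=BufferedChatCompletionContext(buffer_size=5),
)
```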
-
I had given up on autogen a while back, but I'm circling back because nobody else seems to have avoided the major pitfall autogen falls into: treating the payloads exchanged between endpoints as "conversations".
Foremost, almost every tool tries diligently to dispatch the same content to the LLM on the other end, and from reading the code, prompts, data, papers, etc., it's quite clear that the engineers succumb during development to thinking of this as essential.
If you send a model "You are Phillip, an AI expert", what does that do? Does it alter anything persistent or stateful on the machine running the model, or on disk?
Instead of wasting ridiculous amounts of time and compute, why don't we:
1. Stop writing wordy, theatrical, performative prompts. Very few models were actually trained on a significant number of conversations where a user said: "You are X, an AI with significant Python skills." As a result, it is very difficult to be scientific about the impact this has on the inference process, and it's why we get such varied and unreliable results.
2. Unbloat the context. The current conversational approach to contexts guarantees degradation of model performance by diluting attention; we've all seen how models get stuck doing the very thing you or they said they must not do. This gets worse the more times the problem item, or the request not to do it, appears in the context, because that's how attention actually works.
A great example of Autogen shooting itself in the foot is when two models discuss a change to a large section of text - a large function, a chapter, etc. When the second model replies, there are now two copies of that large text.
This is a total waste of tokens and processing power, and it's incompatible with how inference actually works.
Autogen ought to be peeling back/reducing the context to stay on track. For instance, the reviewer agent might take the incoming "here's my solution" context and probe the coder agent with something like:
"{base context} Could we change line 325 to 'print(hello, world)'? Would that break anything"
and collect the feedback. Instead of appending that feedback to the base context, the reviewer agent would adapt its next ask to the coder agent: "{base context} and assume we had changed line 325 to ... Now what would happen if we changed blue to pink?"
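Concretely, here's a rough, library-agnostic sketch of what I mean; `llm_complete` is a hypothetical helper standing in for whatever client you actually use to send a message list to a model:

```python
# Sketch of "rewrite the ask instead of appending the transcript".

def llm_complete(messages: list[dict]) -> str:
    """Hypothetical: send `messages` to a model, return the reply text."""
    raise NotImplementedError

LARGE_FUNCTION = "...hundreds of lines of code under review, included once..."

BASE_CONTEXT = [
    {"role": "system", "content": "You are reviewing the code below for defects."},
    {"role": "user", "content": LARGE_FUNCTION},
]

def probe(question: str) -> str:
    # Every probe reuses the same fixed base context plus one self-contained
    # question; earlier probes and replies are never replayed.
    return llm_complete(BASE_CONTEXT + [{"role": "user", "content": question}])

feedback = probe(
    "Could we change line 325 to print('hello, world')? Would that break anything?"
)

# Rather than appending `feedback` to a growing transcript, fold the accepted
# change into the next ask so the context stays roughly the same size.
followup = probe(
    "Assume line 325 has been changed to print('hello, world'). "
    "Now what would happen if we changed blue to pink?"
)
```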
There are also all kinds of cases where an agent should be able to create a temporary context to send to its own model as a way to do inline reasoning without bloating the ongoing context: "How do I enable aarch64 compatibility?" That question doesn't require the 120k tokens of context you're currently sitting at.
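Again as a rough sketch, reusing the same hypothetical `llm_complete` helper from above:

```python
# A throwaway side context: the question goes to the model with a tiny,
# purpose-built context instead of the 120k-token main thread.

def side_question(question: str) -> str:
    scratch = [
        {"role": "system", "content": "Answer concisely."},
        {"role": "user", "content": question},
    ]
    answer = llm_complete(scratch)
    # `scratch` is discarded here; only the answer (or a summary of it) gets
    # carried back into the main context, and only if it's actually needed.
    return answer

hint = side_question("How do I enable aarch64 compatibility?")
```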