Research: binary token input and output #19458

@geobang

Description

Research Stage

  • Background Research (Let's try to avoid reinventing the wheel)
  • Hypothesis Formed (How do you think this will work and its effect?)
  • Strategy / Implementation Forming
  • Analysis of results
  • Debrief / Documentation (So people in the future can learn from us)

Previous existing literature and research

Hi,

TL;DR: decouple the BPE tokenizer from the LLM and run it as an agent, so that LLMs can communicate with each other directly in tokens, more efficiently and with fewer resources. It also further blurs the border between multi-LLM setups and MoE LLMs.

To begin with, BPE tokenization is a serialization step that is time-consuming and single-threaded.

There have been various attempts to make it multi-threaded or GPU-accelerated, notably BlockBPE and my PoC #19410.

In practice, LLM design is moving from big unified models toward MoE and multi-LLM systems with agents, and BPE still plays an important role in the inter-communication between agents and LLMs, because we currently treat the LLM and its tokenizer as one piece. But first principles tell us that BPE is only useful for humans, not for the computer itself. So what happens if BPE is cut out of the LLM?

An LLM can have lightning-fast inference but a slow BPE stage. If LLMs want to talk to each other, they only need to exchange tokens. The problem is that each agent or LLM has its own token domain, like people speaking in different slang. How do we overcome this?

A very effective way is a translation layer that is indexed and maps tokens directly, which is exactly the kind of lookup GPUs are strong at. Imagine an agent sitting between the LLMs, with the LLMs taking as input and output tokens that the agent understands. The agent can do this by loading the tokenizer part of each LLM and setting up a mapping table between the two vocabularies; a sketch follows below.
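Here is a minimal sketch of such a mapping table in plain C++, assuming each model's vocabulary can be enumerated as token-id → piece strings (how those lists are obtained is left open). Pieces that exist in both vocabularies map 1:1; everything else is flagged for a slower detokenize/retokenize fallback:

```cpp
// Sketch of a cross-vocabulary token translation table.
// Assumes each vocabulary is given as a list of piece strings indexed by token id.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using token_id = int32_t;
constexpr token_id UNMAPPED = -1; // fall back to detokenize + retokenize

// Build a direct lookup table from vocabulary A's ids to vocabulary B's ids.
// The result is a flat array, so translation is a single indexed read --
// exactly the kind of gather a GPU handles well.
std::vector<token_id> build_mapping(const std::vector<std::string> & vocab_a,
                                    const std::vector<std::string> & vocab_b) {
    std::unordered_map<std::string, token_id> index_b;
    for (token_id id = 0; id < (token_id) vocab_b.size(); ++id) {
        index_b.emplace(vocab_b[id], id);
    }
    std::vector<token_id> mapping(vocab_a.size(), UNMAPPED);
    for (token_id id = 0; id < (token_id) vocab_a.size(); ++id) {
        auto it = index_b.find(vocab_a[id]);
        if (it != index_b.end()) {
            mapping[id] = it->second; // identical piece: map 1:1
        }
    }
    return mapping;
}

// Translate a token stream from model A's domain into model B's domain.
std::vector<token_id> translate(const std::vector<token_id> & tokens_a,
                                const std::vector<token_id> & mapping) {
    std::vector<token_id> tokens_b;
    tokens_b.reserve(tokens_a.size());
    for (token_id t : tokens_a) {
        tokens_b.push_back(mapping[t]); // UNMAPPED entries mark the fallback path
    }
    return tokens_b;
}
```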

This mapping is straightforward to implement, and on the llama.cpp side the only change needed is to allow tokens directly as input and output, so it is more about removing functionality than adding it.
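In fact the core llama.cpp C API already works at the token level. Below is a sketch of a token-in/token-out generation loop that never calls llama_tokenize or detokenizes; the function names follow recent llama.h and have shifted between versions, so treat this as illustrative rather than definitive:

```cpp
// Sketch: token-in / token-out generation with llama.cpp, skipping the
// tokenizer entirely. API names follow recent llama.h (they have changed
// across versions); error handling is omitted for brevity.
#include "llama.h"
#include <vector>

std::vector<llama_token> generate_from_tokens(llama_model * model,
                                              llama_context * ctx,
                                              std::vector<llama_token> prompt,
                                              int max_new_tokens) {
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // Greedy sampler chain, just to keep the sketch deterministic.
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    std::vector<llama_token> out;
    llama_token next = 0; // lives across iterations: the batch keeps a pointer to it

    // Feed raw token ids (e.g. already translated from another model's domain).
    llama_batch batch = llama_batch_get_one(prompt.data(), (int32_t) prompt.size());
    for (int i = 0; i < max_new_tokens; ++i) {
        if (llama_decode(ctx, batch) != 0) break;
        next = llama_sampler_sample(smpl, ctx, -1);
        if (llama_vocab_is_eog(vocab, next)) break;
        out.push_back(next);                   // emit the token id directly
        batch = llama_batch_get_one(&next, 1); // no detokenize/retokenize round trip
    }

    llama_sampler_free(smpl);
    return out;
}
```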

At the same time, the agent can serve humans by providing tokenizer and detokenizer results and storing them in a SQL server for later reference, which completes the circle.
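One possible shape for that storage side, sketched against SQLite as a stand-in for any SQL server (the `exchanges` schema is made up for illustration):

```cpp
// Sketch: agent-side logging of (text, token ids) pairs to SQLite.
// The schema is hypothetical; any SQL server could play the same role.
#include <sqlite3.h>
#include <cstdint>
#include <string>
#include <vector>

bool log_exchange(sqlite3 * db, const std::string & text,
                  const std::vector<int32_t> & tokens) {
    const char * ddl =
        "CREATE TABLE IF NOT EXISTS exchanges ("
        "  id     INTEGER PRIMARY KEY,"
        "  text   TEXT NOT NULL,"   // human-readable form
        "  tokens BLOB NOT NULL)";  // raw token ids as a packed int32 array
    if (sqlite3_exec(db, ddl, nullptr, nullptr, nullptr) != SQLITE_OK) return false;

    sqlite3_stmt * stmt = nullptr;
    if (sqlite3_prepare_v2(db, "INSERT INTO exchanges (text, tokens) VALUES (?, ?)",
                           -1, &stmt, nullptr) != SQLITE_OK) return false;
    sqlite3_bind_text(stmt, 1, text.c_str(), -1, SQLITE_TRANSIENT);
    sqlite3_bind_blob(stmt, 2, tokens.data(),
                      (int) (tokens.size() * sizeof(int32_t)), SQLITE_TRANSIENT);
    bool ok = sqlite3_step(stmt) == SQLITE_DONE;
    sqlite3_finalize(stmt);
    return ok;
}
```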

If you step back, the border between multi-LLM systems and MoE is already increasingly blurred. With all of the above, the hardware cost to be paid can be lowered dramatically, in theory by removing the serialization step entirely. This is a small-to-medium-sized project that still has business value, and it would also benefit projects like openclaw in making local LLMs viable.

I hope you find the above idea useful; please see whether any party can take it on.

Best regards
George

Hypothesis

No response

Implementation

No response

Analysis

No response

Relevant log output
