Description
Research Stage
- Background Research (Let's try to avoid reinventing the wheel)
- Hypothesis Formed (How do you think this will work and what will its effect be?)
- Strategy / Implementation Forming
- Analysis of results
- Debrief / Documentation (So people in the future can learn from us)
Previous existing literature and research
Hi,
TL;DR: decouple BPE from the LLM and move it into an agent, so that LLMs can communicate directly via tokens, which is more efficient and needs fewer resources. It also blurs the border between multi-LLM and MoE LLM setups.
First, about BPE itself: tokenization is a serialization process that is time-consuming and single-threaded. There have been various attempts to make it multi-threaded or GPU-accelerated, namely BlockBPE and my PoC #19410.
Meanwhile, LLM design is moving from big unified models to MoE and multi-LLM setups with agents. BPE still plays an important role in the inter-communication between agents and LLMs at the moment, because we treat the LLM and its tokenizer as one piece. If we move forward and decouple BPE, first principles tell us that BPE is only useful for humans, not for the computer itself. So what if BPE is cut out of the LLM?
An LLM can have lightning-fast inference but slow BPE. If LLMs want to talk to each other, they only need to exchange tokens. The problem is that each agent or LLM has its own token domain, so it is like people talking in different slangs. How do we overcome this?
A very effective way is a translation layer: an indexed, easy-to-look-up mapping, which is exactly the kind of work GPUs are strong at. Imagine an agent sitting between the LLMs, with each LLM's input/output being tokens the agent understands. We can make the agent do this by loading the same tokenizer data as each LLM and setting up a mapping table between the two vocabularies, as sketched below.
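To make this concrete, here is a minimal sketch in C++. Everything in it is illustrative: the `Vocab`/`PieceIndex` types and `map_token` are hypothetical names, and real piece strings would be loaded from each model's tokenizer data rather than hard-coded.

```cpp
// Minimal sketch of a token-mapping table between two vocabularies.
// All names here are hypothetical; in practice the piece strings would
// be loaded from each model's tokenizer data, not hard-coded.
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

using Vocab      = std::vector<std::string>;              // token ID -> piece
using PieceIndex = std::unordered_map<std::string, int>;  // piece -> token ID

static PieceIndex index_vocab(const Vocab & vocab) {
    PieceIndex idx;
    for (int id = 0; id < (int) vocab.size(); ++id) {
        idx.emplace(vocab[id], id);
    }
    return idx;
}

// Map one of A's token IDs to B's token ID, if B has the identical piece.
// On a miss the caller must fall back to re-tokenizing the piece text
// with B's tokenizer.
static std::optional<int> map_token(int id_a, const Vocab & vocab_a,
                                    const PieceIndex & index_b) {
    auto it = index_b.find(vocab_a[id_a]);
    if (it == index_b.end()) {
        return std::nullopt;
    }
    return it->second;
}
```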
This mapping is easy to implement, and llama.cpp would only need to allow tokens as input and output. So it is more a matter of removing functionality than adding it.
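As a toy usage example, continuing the sketch above (and assuming its types and helpers are in scope), the agent-side translation loop could look like this; the miss branch is where re-tokenization with B's tokenizer would happen. The vocabularies are made up for illustration.

```cpp
// Toy agent loop, assuming the types and helpers from the sketch above
// are in scope. The vocabularies are made up for illustration.
#include <cstdio>

int main() {
    const Vocab vocab_a = { "Hel", "lo", " wor", "ld" };
    const Vocab vocab_b = { "He", "llo", " wor", "ld", "Hel", "lo" };
    const PieceIndex index_b = index_vocab(vocab_b);

    const std::vector<int> msg_a = { 0, 1, 2, 3 };  // "Hello world" in A's domain
    std::vector<int> msg_b;
    for (int id_a : msg_a) {
        if (auto id_b = map_token(id_a, vocab_a, index_b)) {
            msg_b.push_back(*id_b);  // direct one-to-one hit
        } else {
            // Miss: here the agent would re-tokenize vocab_a[id_a] with
            // B's tokenizer and append the resulting IDs instead.
        }
    }
    for (int id : msg_b) {
        printf("%d ", id);  // prints: 4 5 2 3
    }
    printf("\n");
    return 0;
}
```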
At the same time, the agent can serve humans with the tokenize and detokenize results and store them in an SQL server for further reference; that completes the circle. A sketch of that storage step follows below.
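A minimal sketch of that storage step, assuming SQLite as the SQL store and a made-up `exchanges` table (nothing here is an existing llama.cpp feature):

```cpp
// Hypothetical storage step, assuming SQLite as the SQL store. The
// `exchanges` table and column names are illustrative assumptions,
// not an existing llama.cpp feature. Link with -lsqlite3.
#include <sqlite3.h>

int main() {
    sqlite3 * db = nullptr;
    if (sqlite3_open("agent_log.db", &db) != SQLITE_OK) {
        return 1;
    }

    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS exchanges ("
        "  id         INTEGER PRIMARY KEY,"
        "  src_model  TEXT,"
        "  dst_model  TEXT,"
        "  detok_text TEXT);",
        nullptr, nullptr, nullptr);

    // Record one detokenized exchange so humans can inspect it later.
    sqlite3_stmt * stmt = nullptr;
    sqlite3_prepare_v2(db,
        "INSERT INTO exchanges (src_model, dst_model, detok_text) "
        "VALUES (?, ?, ?);",
        -1, &stmt, nullptr);
    sqlite3_bind_text(stmt, 1, "llm_a", -1, SQLITE_STATIC);
    sqlite3_bind_text(stmt, 2, "llm_b", -1, SQLITE_STATIC);
    sqlite3_bind_text(stmt, 3, "Hello world", -1, SQLITE_STATIC);
    sqlite3_step(stmt);
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}
```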
Looking back, the border between multi-LLM and MoE is increasingly blurred. With all of the above, the hardware resource cost to be paid can be lowered dramatically, in theory by removing the serialization step entirely. So this is a small-to-medium-sized project that still has business value, and it would also benefit projects like openclaw in making local LLMs viable.
I hope you find the above idea useful; please see if any party can take it on.
Best regards
George
Hypothesis
No response
Implementation
No response
Analysis
No response