Context Compressor: A Recursive Ternary Embedding Document Search Algorithm #15985
SuiltaPico
announced in
Ideas
This algorithm compresses a document by recursively dividing it into three groups. Suppose we have an array of text segments [1, 2, 3, 4, 5, 6]. We first divide it into three groups, [1, 2, 3], [3, 4, 5], and [4, 5, 6], and then embed each group. Next, we look for the two most similar groups in the document based on vector similarity. After comparison, we find that the group [4, 5, 6] has the lowest similarity. Therefore, we continue applying the same method to the text segment array [1, 2, 3, 4], and so on.
However, this is just a preliminary idea, and I haven't had time to put it into practice. I also haven't determined the termination condition for the recursion. I hope this idea inspires others and that we can discuss the feasibility of this method together.
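To make the discussion more concrete, here is a minimal Python sketch of one possible reading of the idea above. It is not a reference implementation, and several parts are assumptions that the post does not specify: similarity is measured against a search `query`; the three groups are contiguous and non-overlapping (the example above uses overlapping groups); the lowest-scoring group is dropped at each step; recursion stops once at most `min_segments` segments remain, which is a placeholder for the still-undecided termination condition; and `embed` is a toy bag-of-words stand-in for a real embedding model. All function and parameter names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_three(segments):
    """Assumed grouping: three contiguous, non-overlapping, non-empty groups."""
    n = len(segments)
    a, b = n // 3, (2 * n) // 3
    return [segments[:a], segments[a:b], segments[b:]]


def compress(segments, query, min_segments=2):
    """Recursively drop the group that is least similar to the query."""
    # Assumed termination rule: stop once the list is short enough.
    # At least three segments are needed to form three non-empty groups.
    if len(segments) <= max(min_segments, 2):
        return list(segments)

    query_vec = embed(query)
    groups = split_three(segments)
    scores = [cosine(embed(" ".join(group)), query_vec) for group in groups]

    # Drop the lowest-scoring group (the first one on ties), keep the rest in order.
    worst = scores.index(min(scores))
    kept = [seg for i, group in enumerate(groups) if i != worst for seg in group]
    return compress(kept, query, min_segments)


segments = [
    "cats purr",
    "cats sleep",
    "dogs bark",
    "dogs fetch",
    "stock prices fell",
    "markets closed lower",
]
print(compress(segments, "cats"))  # ['cats purr', 'cats sleep']
```

Each call embeds three groups and removes at least one segment, so the recursion always terminates. Whether a fixed segment count, a token budget, or a similarity threshold is the right stopping rule remains an open question.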