When a prompt exceeds 75 tokens (say, 150 tokens), it is submitted as multiple sets of 75 tokens. Each token only has context from the other tokens in its own set. So if "blue hair" falls on the border between the 1st and 2nd set, the token "blue" ends up in the 1st set and "hair" in the 2nd. This leads to incoherence, because the two words are processed separately.

That setting tries to mitigate this by looking for the last comma within the final N tokens of a set and, if one is found, moving everything after that comma into the next set.

So:
Set 1: {[74]=COMMA,[75]=blue}, Set 2: {[76]=hair} => Set 1: {[74]=COMMA,[75]=PADDING}, Set 2: {[76]=blue, [77]=hair}
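The behavior above can be sketched in Python. This is an illustrative reimplementation, not the actual code; the names `chunk_with_comma_break`, `PAD`, and the `lookback` parameter are assumptions made for the example:

```python
COMMA = ","
PAD = "<pad>"  # hypothetical padding token

def chunk_with_comma_break(tokens, chunk_size=75, lookback=20):
    """Split `tokens` into sets of at most `chunk_size` tokens.

    If a set would end mid-phrase, find the last comma within the final
    `lookback` tokens of that set; everything after the comma is deferred
    to the next set, and the current set is padded to size instead.
    """
    chunks = []
    i = 0
    n = len(tokens)
    while i < n:
        end = min(i + chunk_size, n)
        chunk = tokens[i:end]
        if end < n:  # only rebalance when more tokens follow
            window_start = max(len(chunk) - lookback, 0)
            comma_pos = -1
            # scan backwards for the last comma inside the lookback window
            for j in range(len(chunk) - 1, window_start - 1, -1):
                if chunk[j] == COMMA:
                    comma_pos = j
                    break
            if comma_pos != -1 and comma_pos < len(chunk) - 1:
                # defer everything past the comma; pad this set to size
                deferred = len(chunk) - (comma_pos + 1)
                chunk = chunk[:comma_pos + 1] + [PAD] * deferred
                end -= deferred
        chunks.append(chunk)
        i = end
    return chunks
```

With 73 filler tokens followed by `,`, `blue`, `hair`, the first set ends right after the comma (padded to 75), and `blue` and `hair` land together in the second set, matching the example above.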

Don't …

Answer selected by tearxinnuan