Why is Context Shifting not kicking in for all messages even without using dynamic information (Memories)? #674
Replies: 5 comments 5 replies
-
@LostRuins I understand you might not have too much time at the moment, but when you are available, I'd be very thankful if you could chime in on this. Sorry for the ping.
-
I'd suggest running in […]
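As a quick way to check whether the cached prompt is actually being reused, here is a minimal timing probe (my own sketch, not anything from koboldcpp itself). It assumes koboldcpp's KoboldAI-compatible `/api/v1/generate` endpoint on the default port 5001; the prompt sizes and context length are illustrative only:

```python
import time
import requests

# Assumption: koboldcpp is running locally on its default port with its
# KoboldAI-compatible API reachable.
API = "http://localhost:5001/api/v1/generate"

def timed_generate(prompt: str) -> float:
    """Send one generation request and return the wall-clock seconds it took."""
    t0 = time.time()
    r = requests.post(API, json={
        "prompt": prompt,
        "max_length": 64,            # short reply, so timing is dominated by prompt processing
        "max_context_length": 4096,  # should match the context size koboldcpp was launched with
    })
    r.raise_for_status()
    return time.time() - t0

history = "A long, stable block of chat history. " * 300  # stands in for the shared prefix

# First call: full prompt processing is expected.
cold = timed_generate(history + "User: Hello\nAssistant:")
# Second call: if fast-forwarding/shifting works, only the new suffix is processed.
warm = timed_generate(history + "User: Hello\nAssistant: Hi!\nUser: How are you?\nAssistant:")

print(f"cold: {cold:.1f}s, warm: {warm:.1f}s")
# If 'warm' takes nearly as long as 'cold', the cached prefix was not reused.
```

Comparing the two timings across different Token Padding settings should show exactly where the reuse breaks.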
-
Testing in progress... SillyTavern: Token Padding set to 32. Models: […]
-
Testing in progress... SillyTavern: Token Padding set to 128/256/512/1024. Context 12288. Model: […]
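For context on why the padding value matters here, this is the budget arithmetic as I understand it (a sketch with my own variable names, not SillyTavern's actual code):

```python
# Sketch of the frontend-side prompt budget, using my own names (this is
# not SillyTavern's actual code). Token Padding reserves slack so that the
# frontend's token estimate can disagree with the backend's exact count
# without overflowing the context window.
max_context   = 12288  # context size used in this test
response_len  = 512    # illustrative value reserved for the model's reply
token_padding = 128    # the setting being varied: 128/256/512/1024

prompt_budget = max_context - response_len - token_padding
print(prompt_budget)  # 11648 tokens of history fit before old messages are trimmed

# The larger the padding, the earlier old history gets trimmed. If the trim
# point (and therefore the start of the prompt) moves differently than the
# backend expects, the prefix no longer matches and shifting cannot apply.
```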
-
See issue #681 for the conclusion and the cause of the problem reported here. It is solved as of right now, hopefully!
-
When near max context, only some messages benefit from Context Shifting, and I can't find the reason why that happens.
Real examples: you can see the huge drop in final T/s when shifting doesn't happen.
I am using the prebuilt koboldcpp 1.57.1 + SillyTavern 1.11.4 (staging, latest commits), and I made sure I don't have any dynamic information added anywhere in the context sent for processing. Context/Response Formatting: […]
I don't have any injections at @D or similar (I even disabled the modules and extensions I mention). So, I am not using any Memory or other dynamic information that might trigger additional context reprocessing, but this keeps happening.
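To make the "dynamic information" concern concrete: as I understand it, the backend can only reuse cached work up to the first token where the old and new prompts diverge (plus the special case of tokens trimmed purely from the start, which is what Context Shifting handles). A toy illustration, not koboldcpp's actual code:

```python
def reusable_prefix(old_tokens: list[int], new_tokens: list[int]) -> int:
    """Length of the shared leading token span between two prompts.
    Everything after it must be reprocessed. Toy model only; this is
    not koboldcpp's implementation."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

prev = [1, 2, 3, 4, 5, 6, 7, 8]          # tokens of the previous prompt
appended = prev + [9, 10]                # same history plus a new message
print(reusable_prefix(prev, appended))   # 8 -> only the 2 new tokens are processed

edited = [1, 99, 3, 4, 5, 6, 7, 8, 9]    # one token changed near the very start
print(reusable_prefix(prev, edited))     # 1 -> nearly the whole prompt is reprocessed
```

So even a single token that differs early in the prompt (an injected date, a moving trim point, a reformatted system prompt) would be enough to force a full reprocess, which would match the T/s drops above.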
What causes this? Is it an issue with the model, due to quantization and RoPE?
For the example and this discussion, I'm using s3nh/Kunoichi-DPO-v2-7B-GGUF (Q4_K_M):
GPU: GTX 1070Ti - 8GB - Pascal
RAM: 32GB DDR4 3200 MHz
CPU: Ryzen 5 1600 AF 6C/12T 3.2-3.6 GHz (Zen 2 Arch)
OS: Windows 11 22H2