Description
Describe the bug
Specs:
Chat Model: Qwen3.5-9b (using Instruct parameters)
Embedding Model: qwen3-embedding:0.6b via ollama (it doesn't seem to be loading; I don't see my VRAM go up)
GPU: RTX 5060 Ti 16 GB
llama-swap with llama.cpp, 32k context window
When clicking one of the news stories (https://www.wired.com/story/openai-fires-employee-insider-trading-polymarket-kalshi/) from the Perplexica Discover page, it often fails and does nothing. Checking the Perplexica logs, I see that it tried to use 481k tokens:
⨯ unhandledRejection: Error: 400 request (481909 tokens) exceeds the available context size (32768 tokens), try increasing it
at f.generate (.next/server/chunks/607.js:20:19183)
at cX.makeStatusError (.next/server/chunks/607.js:27:51395)
at cX.makeRequest (.next/server/chunks/607.js:27:54864)
at async q.streamText (.next/server/chunks/136.js:1:2480)
at async i.research (.next/server/chunks/641.js:541:227)
at async m.searchAsync (.next/server/app/api/chat/route.js:1:13218) {
status: 400,
headers: Headers {
'access-control-allow-origin': '',
'content-length': '201',
'content-type': 'application/json; charset=utf-8',
date: 'Fri, 06 Mar 2026 17:49:01 GMT',
server: 'llama.cpp'
},
requestID: null,
error: [Object],
code: 400,
param: undefined,
type: 'exceed_context_size_error'
}
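A 481,909-token request against a 32,768-token window suggests the scraped article text is being sent to the model unchunked. As a rough illustration only (this is not Perplexica's actual code; the constants, heuristic, and function name are all hypothetical), retrieved page text could be clamped to fit the window before the request is made:

```typescript
// Hypothetical sketch: clamp retrieved page text so the prompt stays
// within the model's context window before calling the chat endpoint.
const CONTEXT_TOKENS = 32768;   // context size configured on the llama.cpp server
const RESERVED_TOKENS = 4096;   // assumed headroom for system prompt + completion
const CHARS_PER_TOKEN = 4;      // rough estimate for English text, not an exact count

function clampToContext(pageText: string): string {
  const maxChars = (CONTEXT_TOKENS - RESERVED_TOKENS) * CHARS_PER_TOKEN;
  return pageText.length > maxChars ? pageText.slice(0, maxChars) : pageText;
}
```

A real fix would count tokens with the model's actual tokenizer (or chunk and summarize iteratively) rather than rely on a characters-per-token heuristic, but even this crude clamp would keep the request under the server's limit.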
The box in red is Perplexica summarizing one page (https://techcrunch.com/2026/03/04/https-techcrunch-com-2026-03-04-google-search-rolls-out-geminis-canvas-in-ai-mode-to-all-us-users/); the green box is Open WebUI summarizing the same page using qwen3.5-thinking.
When doing a Balanced search on the topic "Do research on benchmarks: is it worth upgrading my current PC that I use for gaming to a 9800X3D with DDR5 memory? Specs: 49" 5120x1440 240hz monitor, RTX 5080 (to be reused), Intel i5-14600K overclocked to 5.6 GHz, 32 GB 3600 MHz DDR4, same 4x M.2", it attempted to use 483k tokens:
⨯ unhandledRejection: Error: 400 request (483846 tokens) exceeds the available context size (32768 tokens), try increasing it
at f.generate (.next/server/chunks/607.js:20:19183)
at cX.makeStatusError (.next/server/chunks/607.js:27:51395)
at cX.makeRequest (.next/server/chunks/607.js:27:54864)
at async q.streamText (.next/server/chunks/136.js:1:2480)
at async i.research (.next/server/chunks/641.js:541:227)
at async m.searchAsync (.next/server/app/api/chat/route.js:1:13218) {
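Both failures land in the same range (~480k tokens), roughly 15x the available window, so simply raising the context size as the error suggests would not realistically help; the content would have to be split. A back-of-the-envelope helper (names and the 4096-token reserve are my assumptions, not anything from Perplexica) shows how many window-sized chunks such a request represents:

```typescript
// Hypothetical helper: given a failed request's token count, estimate how
// many context-window-sized chunks the content would need to be split into.
function chunksNeeded(requestTokens: number, contextTokens: number,
                      reservedTokens: number): number {
  const usable = contextTokens - reservedTokens; // tokens left for content per chunk
  return Math.ceil(requestTokens / usable);
}

// For the 481909-token request against a 32768-token window with a
// 4096-token reserve: ceil(481909 / 28672) = 17 chunks.
```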
To Reproduce
Steps to reproduce the behavior:
- Go to Discover
- Click on any story
- Summarization fails
Expected behavior
The selected story is summarized; requests sent to the model stay within the configured 32k context window.