
Roo crashes TabbyApi/ExLlamaV2 after v3.25.20 #7581

@drknyt

Description

App Version

v3.25.20

API Provider

OpenAI Compatible

Model Used

Devstral-Small-2507

Roo Code Task Links (Optional)

No response

πŸ” Steps to Reproduce

  1. Update Roo Code to any version higher than v3.25.20.
  2. Set up an OpenAI compatible model in the Configuration Profile. The server is a local machine on the same network running a TabbyApi instance, which serves a model via the ExLlamaV2 engine on an OpenAI compatible endpoint (the sketch after this list shows the equivalent raw request).
  3. Run any task.
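
For context, the profile amounts to a plain OpenAI-style chat completion call against the local server. A minimal sketch of the equivalent request, assuming TabbyApi's `/v1` prefix and placeholder values for the LAN address and API key:

```python
# Minimal sketch of the request Roo issues through the "OpenAI Compatible"
# profile. Host, port, and key below are placeholders for the local TabbyApi box.
from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.1.50:5000/v1",  # assumed LAN address of the TabbyApi server
    api_key="tabby-api-key",                 # assumed; TabbyApi issues its own keys
)

# Stream a completion from the served model, as Roo does during a task.
stream = client.chat.completions.create(
    model="Devstral-Small-2507",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```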

πŸ’₯ Outcome Summary

Expected a normal completion but received the following error:
API Request Failed.
Chat completion aborted. Please check the server console.

TabbyApi server logs posted below.

Release v3.25.21 seems to have introduced a change in the way Roo communicates with an OpenAI compatible endpoint. Downgrading to v3.25.20 works, but any version beyond that throws the error above.
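
To confirm it is the request payload that changed rather than anything on the TabbyApi side, one option is to point both Roo versions at a throwaway endpoint that dumps the JSON body, then diff the two captures. A rough sketch (the port and the placeholder reply are arbitrary choices, not part of either product):

```python
# Tiny echo server to capture the exact JSON body each Roo version sends.
# Point the "OpenAI Compatible" profile at http://<host>:8000/v1 and diff
# the dumps produced under v3.25.20 and v3.25.21.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Dump(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # Sorted, pretty-printed dump makes the version-to-version diff clean.
        print(json.dumps(json.loads(body), indent=2, sort_keys=True))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"choices": []}')  # placeholder reply, not a real completion

HTTPServer(("", 8000), Dump).serve_forever()
```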

πŸ“„ Relevant Logs or Errors (Optional)

2025-09-01 16:35:24.417 INFO:     # Current Workspace Directory (c:/repo/novel-writer) Files
2025-09-01 16:35:24.417 INFO:     .vscode/
2025-09-01 16:35:24.417 INFO:     documentation/
2025-09-01 16:35:24.417 INFO:     project_details/
2025-09-01 16:35:24.417 INFO:     prompt_plan/
2025-09-01 16:35:24.417 INFO:     public/
2025-09-01 16:35:24.417 INFO:     roadmap/
2025-09-01 16:35:24.417 INFO:     src/
2025-09-01 16:35:24.417 INFO:     src/components/
2025-09-01 16:35:24.417 INFO:     src/components/Breadcrumbs/
2025-09-01 16:35:24.417 INFO:     src/components/CharacterDetails/
2025-09-01 16:35:24.417 INFO:     src/components/ColorSchemeToggle/
2025-09-01 16:35:24.417 INFO:     src/components/CommitModal/
2025-09-01 16:35:24.417 INFO:     src/components/CustomParagraph/
2025-09-01 16:35:24.417 INFO:     src/components/ExportComponent/
2025-09-01 16:35:24.417 INFO:     src/components/FileImportModal/
2025-09-01 16:35:24.417 INFO:     
2025-09-01 16:35:24.417 INFO:     (File list truncated. Use list_files on specific subdirectories if you need to explore further.)
2025-09-01 16:35:24.417 INFO:     You have not created a todo list yet. Create one with `update_todo_list` if your task is complicated or involves multiple steps.
2025-09-01 16:35:24.417 INFO:     </environment_details>[/INST]
2025-09-01 16:35:30.013 ERROR:    FATAL ERROR with generation. Attempting to recreate the generator. If this fails, please restart the server.
2025-09-01 16:35:30.013 WARNING:  Immediately terminating all jobs. Clients will have their requests cancelled.
2025-09-01 16:35:30.016 ERROR:    Traceback (most recent call last):
2025-09-01 16:35:30.016 ERROR:      File "/app/endpoints/OAI/utils/chat_completion.py", line 376, in stream_generate_chat_completion
2025-09-01 16:35:30.016 ERROR:        raise generation
2025-09-01 16:35:30.016 ERROR:      File "/app/endpoints/OAI/utils/completion.py", line 118, in _stream_collector
2025-09-01 16:35:30.016 ERROR:        async for generation in new_generation:
2025-09-01 16:35:30.016 ERROR:      File "/app/backends/exllamav2/model.py", line 977, in stream_generate
2025-09-01 16:35:30.016 ERROR:        async for generation_chunk in self.generate_gen(
2025-09-01 16:35:30.016 ERROR:      File "/app/backends/exllamav2/model.py", line 1465, in generate_gen
2025-09-01 16:35:30.016 ERROR:        raise ex
2025-09-01 16:35:30.016 ERROR:      File "/app/backends/exllamav2/model.py", line 1403, in generate_gen
2025-09-01 16:35:30.016 ERROR:        async for result in job:
2025-09-01 16:35:30.016 ERROR:      File "/opt/venv/lib/python3.12/site-packages/exllamav2/generator/dynamic_async.py", line 97, in __aiter__
2025-09-01 16:35:30.016 ERROR:        raise result
2025-09-01 16:35:30.016 ERROR:      File "/opt/venv/lib/python3.12/site-packages/exllamav2/generator/dynamic_async.py", line 28, in _run_iteration
2025-09-01 16:35:30.016 ERROR:        results = self.generator.iterate()
2025-09-01 16:35:30.016 ERROR:                  ^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-01 16:35:30.016 ERROR:      File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
2025-09-01 16:35:30.016 ERROR:        return func(*args, **kwargs)
2025-09-01 16:35:30.016 ERROR:               ^^^^^^^^^^^^^^^^^^^^^
2025-09-01 16:35:30.016 ERROR:      File "/opt/venv/lib/python3.12/site-packages/exllamav2/generator/dynamic.py", line 1002, in iterate
2025-09-01 16:35:30.016 ERROR:        self.iterate_gen(results)
2025-09-01 16:35:30.016 ERROR:      File "/opt/venv/lib/python3.12/site-packages/exllamav2/generator/dynamic.py", line 1251, in iterate_gen
2025-09-01 16:35:30.016 ERROR:        job.receive_logits(job_logits)
2025-09-01 16:35:30.016 ERROR:      File "/opt/venv/lib/python3.12/site-packages/exllamav2/generator/dynamic.py", line 1888, in receive_logits
2025-09-01 16:35:30.016 ERROR:        ExLlamaV2Sampler.sample(
2025-09-01 16:35:30.016 ERROR:      File "/opt/venv/lib/python3.12/site-packages/exllamav2/generator/sampler.py", line 540, in sample
2025-09-01 16:35:30.016 ERROR:        m = ext_c.sample_basic(
2025-09-01 16:35:30.016 ERROR:            ^^^^^^^^^^^^^^^^^^^
2025-09-01 16:35:30.016 ERROR:    TypeError: sample_basic(): incompatible function arguments. The following argument types are supported:
2025-09-01 16:35:30.016 ERROR:        1. (arg0: torch.Tensor, arg1: float, arg2: int, arg3: float, arg4: float, arg5: float, arg6: float, arg7: float, arg8: float, arg9: torch.Tensor, arg10: torch.Tensor, arg11: torch.Tensor, arg12: torch.Tensor, arg13: torch.Tensor, arg14: bool, arg15: list[float], arg16: float, arg17: float, arg18: float, arg19: torch.Tensor, arg20: float, arg21: float, arg22: float, arg23: float, arg24: float, arg25: float, arg26: float) -> list[float]
2025-09-01 16:35:30.016 ERROR:    
2025-09-01 16:35:30.016 ERROR:    Invoked with: tensor([[[4.5430, 4.5859, 1.3486,  ..., 4.1016, 3.3926, 4.9648]]]), None, 0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.8176540387510759, tensor([[841435730]]), tensor([[2.2581e+33]]), tensor(..., device='meta', size=(1, 1)), tensor(..., device='meta', size=(1, 1)), tensor(..., device='meta', size=(1, 1)), False, [], 1.5, 0.3, 1.0, tensor(..., device='meta', size=(1, 1)), 0.0, 0.1, 1.0, 1.0, 1.0, 0.0, 0.0
2025-09-01 16:35:30.023 ERROR:    Sent to request: Chat completion aborted. Please check the server console.
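
Worth noting: in the `Invoked with:` line above, the second positional argument is `None` where the binding expects `arg1: float`, so one plausible trigger is the newer Roo build sending a sampling parameter as JSON `null` (e.g. `"temperature": null`) that TabbyApi forwards to the sampler unvalidated. A hypothetical repro sketch for that theory; the address, key, and the choice of `temperature` as the culprit are assumptions, not confirmed:

```python
# Hypothetical repro: send an explicit null temperature and see whether
# TabbyApi hits the same sample_basic() TypeError. Placeholder address/key.
import requests

resp = requests.post(
    "http://192.168.1.50:5000/v1/chat/completions",
    headers={"Authorization": "Bearer tabby-api-key"},
    json={
        "model": "Devstral-Small-2507",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": None,  # valid JSON, but None where the sampler wants a float
        "stream": False,
    },
    timeout=120,
)
print(resp.status_code, resp.text[:200])
```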

Metadata

    Labels

    Issue - In Progress (Someone is actively working on this. Should link to a PR soon.)
    bug (Something isn't working)

    Projects

    Status

    Done
