Problem calling Ollama's gpt-oss:20b model from Continue #7061
Replies: 4 comments 3 replies
-
This is another program calling … (screenshot not preserved)
This is a … (screenshot not preserved)
-
@main1015 Hi. Thank you for the detailed issue. I am wondering if this has something to do with keep-alive. How much VRAM and system RAM do you have? To test this, I think we can try:
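One hedged sketch of such a test, assuming a local Ollama on the default port (the `keep_alive` request field and `ollama ps` are documented Ollama features; the exact steps originally suggested were not preserved here):

```bash
# Load gpt-oss:20b and keep it resident for 30 minutes via the
# documented keep_alive field on Ollama's /api/chat endpoint.
curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss:20b",
  "messages": [{"role": "user", "content": "hello"}],
  "keep_alive": "30m",
  "stream": false
}'

# Then inspect where the model actually landed (CPU, GPU, or split).
ollama ps
```

If `ollama ps` reports a CPU/GPU split only when the request comes from Continue, that points at the request options rather than the model itself.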
-
Ollama recently updated and implemented some new things under the hood relating to memory prediction and allocation. Try this:
I found this to work even through Continue, where similar-sized models would otherwise split between CPU and GPU when called from Continue, due to inflated memory allocation.
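The specific steps were not preserved in this export. One hedged possibility in the same spirit, since the symptom is an inflated memory estimate forcing a split: explicitly request full GPU offload with Ollama's documented `num_gpu` option and compare against the default behavior. The value 99 below is illustrative, simply larger than the model's layer count:

```bash
# Ask Ollama to offload all layers to the GPU for this request.
# num_gpu is a documented Ollama model option (layers on GPU).
curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss:20b",
  "messages": [{"role": "user", "content": "hello"}],
  "options": { "num_gpu": 99 },
  "stream": false
}'
```

If the split disappears with `num_gpu` pinned, the issue is in the automatic memory estimate rather than actual VRAM pressure.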
-
I'm currently experiencing an odd issue when using `continue` to connect to the `gpt-oss:20b` model on `ollama`. When `ollama` loads this model, it appears to load into the `cpu` rather than the `gpu`. However, when I connect to the `gpt-oss:20b` model through other tools or code, it loads into the `gpu`, which is quite puzzling. When I captured the data from `continue`'s call to `Api.chat` and sent it to `ollama` via `postman`, the model still loaded into the `gpu`. Please help resolve this. The `config.yaml` configuration file for my `continue` is as follows:
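The attached config did not survive this export. For orientation only, a minimal hedged sketch of what an Ollama model entry in Continue's `config.yaml` can look like; every name and value below is illustrative, not the reporter's actual file:

```yaml
# Hypothetical Continue config.yaml (illustrative, not the reporter's).
name: local-assistant
version: 0.0.1
models:
  - name: gpt-oss:20b
    provider: ollama
    model: gpt-oss:20b
    apiBase: http://localhost:11434
```

Whether Continue layers extra request options (context length, keep-alive) on top of such an entry is exactly what the Postman comparison above probes.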