Problem calling Ollama's gpt-oss:20b model from Continue #7061
-
This is another program calling the same model; there it loads into the GPU.
-
@main1015 Hi, thank you for the detailed issue. I am wondering if this has something to do with keep-alive. How much VRAM and system RAM do you have? To test this, I think we can try:
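Setting keepAlive: 0 on the Ollama model entry, as sketched below. The exact key placement in Continue's config.yaml can vary by version, so treat this as a sketch rather than the exact snippet from the thread (the follow-up reply confirms keepAlive: 0 is the parameter that was tried):

```yaml
models:
  - name: gpt-oss 20b
    provider: ollama
    model: gpt-oss:20b
    keepAlive: 0  # 0 asks Ollama to unload the model right after each request
```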
-
Running Ollama on Ubuntu with an RTX A5000 (24 GB VRAM) and 128 GB system RAM. I added the keepAlive: 0 parameter and tried it, but it didn't seem to work.
-
Ollama recently updated and implemented some changes under the hood relating to memory prediction and allocation. Try this:
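The commenter's exact steps were not preserved in this copy of the thread. As a hedged sketch, one way to pick up the newer allocation behavior is to update Ollama and then confirm where the model actually loads; `ollama ps` and the install script are standard Ollama tooling, but this specific sequence is an assumption, not necessarily the original advice:

```bash
# Update Ollama via the official Linux install script, reload the model,
# and check where it actually landed.
curl -fsSL https://ollama.com/install.sh | sh
ollama --version                 # confirm you are on the updated build
ollama run gpt-oss:20b "hello"   # force a fresh load of the model
ollama ps                        # PROCESSOR column shows e.g. "100% GPU" or a CPU/GPU split
```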
I found this to work even through Continue, where similar-sized models would previously split between CPU and GPU when called from Continue, due to inflated memory allocation.


-
I'm currently experiencing an odd issue when using Continue to connect to the gpt-oss:20b model on Ollama. When Ollama loads this model, it appears to load into the CPU rather than the GPU. However, when I connect to the gpt-oss:20b model through other tools or code, it loads into the GPU, which is quite puzzling. When I captured the data from Continue's call to /api/chat and sent it to Ollama via Postman, the model still loaded into the GPU. Please help resolve this. The config.yaml configuration file for my Continue setup is as follows:
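The actual file did not survive in this copy of the thread. For reference, a representative Continue config.yaml for this kind of setup might look like the sketch below; the model block is an assumption about a typical Continue + Ollama configuration, not the author's actual file:

```yaml
# Representative sketch only; the original config.yaml was not captured,
# so the entries below are assumptions about a typical Continue + Ollama setup.
name: Local Assistant
version: 1.0.0
models:
  - name: gpt-oss 20b
    provider: ollama
    model: gpt-oss:20b
    roles:
      - chat
```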