running ollama as a remote LLM server #44

@obriensystems

moved to ObrienlabsDev/rag#9
a continuation of #37
see ObrienlabsDev/blog#95

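Running ollama as a server
Add a firewall rule that allows inbound TCP connections on port 11434.

Quit any running ollama instance first and make sure it does not restart. Otherwise ollama serve fails with "address already in use", as in the first attempt below.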
michaelobrien@ultra2 ~ % echo $OLLAMA_HOST

michaelobrien@ultra2 ~ % export OLLAMA_HOST=0.0.0.0:11434
michaelobrien@ultra2 ~ % ollama serve
Error: listen tcp 0.0.0.0:11434: bind: address already in use
michaelobrien@ultra2 ~ % ollama serve
2025/03/01 18:24:55 routes.go:1205: INFO server config env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/michaelobrien/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-03-01T18:24:55.692-05:00 level=INFO source=images.go:432 msg="total blobs: 7"
time=2025-03-01T18:24:55.693-05:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-01T18:24:55.693-05:00 level=INFO source=routes.go:1256 msg="Listening on [::]:11434 (version 0.5.12)"
time=2025-03-01T18:24:55.722-05:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="48.0 GiB" available="48.0 GiB"


serving...
[GIN] 2025/03/01 - 18:27:23 | 404 |       8.292µs |  192.168.11.107 | GET      "/api/models"
[GIN] 2025/03/01 - 18:27:27 | 200 |      63.167µs |  192.168.11.107 | GET      "/"
[GIN] 2025/03/01 - 18:28:04 | 200 |      83.333µs |  192.168.11.102 | GET      "/"
[GIN] 2025/03/01 - 18:28:50 | 200 |      80.709µs |  192.168.11.102 | GET      "/"
[GIN] 2025/03/01 - 18:28:50 | 404 |        6.25µs |  192.168.11.102 | GET      "/favicon.ico"
[GIN] 2025/03/01 - 18:29:33 | 404 |      19.541µs |  192.168.11.102 | GET      "/api/models"

in another terminal

michaelobrien@ultra2 performance % lsof -nP -iTCP:11434 -sTCP:LISTEN
COMMAND  PID          USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
ollama  1613 min    3u  IPv6 0xe5ef4dc0426442e1      0t0  TCP *:11434 (LISTEN)
michaelobrien@ultra2 performance % netstat -an | grep 11434
tcp46      0      0  *.11434                *.*                    LISTEN 
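
The "address already in use" error earlier means another process already holds the port. A minimal sketch of a pre-flight check that reuses the lsof invocation above before starting the server; `port_in_use` and `start_ollama_server` are hypothetical helper names, not part of ollama:

```shell
port_in_use() {
  # $1 = TCP port; succeeds if any process is listening on it.
  lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

start_ollama_server() {
  # $1 = bind address in host:port form, e.g. 0.0.0.0:11434
  port=${1##*:}   # keep only the text after the last colon
  if port_in_use "$port"; then
    echo "port $port is busy; quit the running ollama instance first" >&2
    return 1
  fi
  # Set OLLAMA_HOST for this one invocation only.
  OLLAMA_HOST="$1" ollama serve
}
```

Usage would be `start_ollama_server 0.0.0.0:11434`, which refuses to start while the port is taken instead of failing inside ollama.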

michaelobrien@ultra2 performance % curl http://192.168.11.107:11434
Ollama is running%   
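
The same reachability check can be scripted on a client. This is a sketch only: `ollama_url` and `ollama_up` are hypothetical helper names, and the 5 second timeout is an arbitrary choice. Note that 0.0.0.0 is a bind address on the server; clients use the server's LAN IP, as in the curl call above.

```shell
ollama_url() {
  # $1 = server host or IP, $2 = port (defaults to 11434)
  printf 'http://%s:%s\n' "$1" "${2:-11434}"
}

ollama_up() {
  # Succeeds when the root endpoint answers with the banner shown above.
  curl -s --max-time 5 "$(ollama_url "$1" "$2")" | grep -q 'Ollama is running'
}
```

For example, `ollama_up 192.168.11.107 && echo reachable` prints "reachable" only when the server answers.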

No need to start the model on the host beforehand; the REST API loads it on the first request.
On the host, check which models are installed:

michaelobrien@ultra2 performance % ollama list
NAME               ID              SIZE     MODIFIED    
deepseek-r1:32b    38056bbcbb2d    19 GB    13 days ago    
deepseek-r1:70b    0c1615a8ca32    42 GB    13 days ago  

on the client run a query

curl -X POST http://192.168.11.107:11434/api/generate \
     -H "Content-Type: application/json" \
     -d '{
           "model": "deepseek-r1:70b",
           "prompt": "What is the capital of Canada?",
           "stream": false
         }'
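
With "stream": false the server returns a single JSON object, and the generated text is in its `response` field. A sketch that wraps the call above, builds the body with jq so quotes in the prompt are escaped, and prints only the answer; `generate_payload` and `ask` are hypothetical helper names:

```shell
generate_payload() {
  # $1 = model name, $2 = prompt text
  # jq -n builds the object from arguments, escaping the strings as JSON.
  jq -cn --arg model "$1" --arg prompt "$2" \
    '{model: $model, prompt: $prompt, stream: false}'
}

ask() {
  # $1 = server base URL, $2 = model name, $3 = prompt text
  payload=$(generate_payload "$2" "$3") || return 1
  curl -s -X POST "$1/api/generate" \
    -H "Content-Type: application/json" \
    -d "$payload" \
    | jq -r '.response'   # keep only the generated text
}
```

For example: `ask http://192.168.11.107:11434 deepseek-r1:70b "What is the capital of Canada?"`.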

URL-encode the fields in data.json into a query string using jq

vi data.json
jq -r '
  "http://192.168.11.107:11434/api/generate?" +
  ([ to_entries[] | (@uri "\(.key)" + "=" + @uri "\(.value|tostring)" ) ] | join("&"))
' data.json

not quite
http://192.168.11.107:11434/api/generate?model=deepseek-r1%3A70b&prompt=What%20is%20the%20capital%20of%20Canada%3F&stream=false

need
http://192.168.11.107:11434/api/generate?model=deepseek-r1%3A70b&prompt=What%20is%20the%20capital%20of%20Canada%3F&stream=false
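
The contents of data.json are not shown above. A sketch that assumes it holds the same three fields as the earlier POST body reproduces the URL printed above; `query_url` is a hypothetical helper name, and the file is written to a temporary directory so no existing data.json is overwritten. The POST form with a JSON body remains the documented way to call /api/generate.

```shell
query_url() {
  # $1 = endpoint URL, $2 = JSON file containing a flat object
  jq -r --arg base "$1" '
    $base + "?" +
    ([ to_entries[] | (@uri "\(.key)" + "=" + @uri "\(.value|tostring)") ] | join("&"))
  ' "$2"
}

# Assumed contents of data.json (same fields as the POST body above).
workdir=$(mktemp -d)
cat > "$workdir/data.json" <<'EOF'
{
  "model": "deepseek-r1:70b",
  "prompt": "What is the capital of Canada?",
  "stream": false
}
EOF

query_url http://192.168.11.107:11434/api/generate "$workdir/data.json"
rm -rf "$workdir"
```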

Testing this out from another origin requires CORS to be enabled on the server:

export OLLAMA_ORIGINS="*"

restart server
ollama serve

test it
curl -H "Origin: http://example.com" -I http://192.168.11.107:11434/api/tags

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=utf-8
Date: Sun, 02 Mar 2025 01:08:31 GMT
Content-Length: 690
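
The header check above can be reduced to a single value for scripting. A sketch, where `cors_origin` is a hypothetical helper name; it prints the Access-Control-Allow-Origin value the server returns for a given Origin, or nothing if the header is absent:

```shell
cors_origin() {
  # $1 = Origin header value to send, $2 = URL to probe
  curl -s -I -H "Origin: $1" "$2" \
    | tr -d '\r' \
    | awk -F': ' 'tolower($1) == "access-control-allow-origin" { print $2 }'
}
```

For example, `cors_origin http://example.com http://192.168.11.107:11434/api/tags` should print `*` for the response shown above.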
