moved to ObrienlabsDev/rag#9
a continuation of #37
see ObrienlabsDev/blog#95
Running ollama as a server
add a rule to the firewall to allow inbound connections on 11434 (sketch below)
terminate the running ollama instance and don't let it restart - it would rebind with the default localhost-only OLLAMA_HOST
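for the firewall step, a minimal sketch using the macOS application firewall CLI - the /usr/local/bin/ollama path is an assumption, adjust to wherever the binary actually lives:

# allow the ollama binary to accept incoming connections (path assumed)
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add /usr/local/bin/ollama
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblockapp /usr/local/bin/ollama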
michaelobrien@ultra2 ~ % echo $OLLAMA_HOST
michaelobrien@ultra2 ~ % export OLLAMA_HOST=0.0.0.0:11434
michaelobrien@ultra2 ~ % ollama serve
Error: listen tcp 0.0.0.0:11434: bind: address already in use
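the port is still held by the previous instance; one way to free it (quit the menu-bar app first if it is running, or it may respawn):

# terminate whatever process is listening on 11434
kill $(lsof -t -iTCP:11434 -sTCP:LISTEN)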
michaelobrien@ultra2 ~ % ollama serve
2025/03/01 18:24:55 routes.go:1205: INFO server config env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/michaelobrien/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-03-01T18:24:55.692-05:00 level=INFO source=images.go:432 msg="total blobs: 7"
time=2025-03-01T18:24:55.693-05:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-01T18:24:55.693-05:00 level=INFO source=routes.go:1256 msg="Listening on [::]:11434 (version 0.5.12)"
time=2025-03-01T18:24:55.722-05:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="48.0 GiB" available="48.0 GiB"
serving...
[GIN] 2025/03/01 - 18:27:23 | 404 | 8.292µs | 192.168.11.107 | GET "/api/models"
[GIN] 2025/03/01 - 18:27:27 | 200 | 63.167µs | 192.168.11.107 | GET "/"
[GIN] 2025/03/01 - 18:28:04 | 200 | 83.333µs | 192.168.11.102 | GET "/"
[GIN] 2025/03/01 - 18:28:50 | 200 | 80.709µs | 192.168.11.102 | GET "/"
[GIN] 2025/03/01 - 18:28:50 | 404 | 6.25µs | 192.168.11.102 | GET "/favicon.ico"
[GIN] 2025/03/01 - 18:29:33 | 404 | 19.541µs | 192.168.11.102 | GET "/api/models"
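the 404s above are a client probing /api/models, which doesn't exist - the model list endpoint is /api/tags:

curl http://192.168.11.107:11434/api/tags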
in another terminal
michaelobrien@ultra2 performance % lsof -nP -iTCP:11434 -sTCP:LISTEN
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ollama 1613 min 3u IPv6 0xe5ef4dc0426442e1 0t0 TCP *:11434 (LISTEN)
michaelobrien@ultra2 performance % netstat -an | grep 11434
tcp46 0 0 *.11434 *.* LISTEN
michaelobrien@ultra2 performance % curl http://192.168.11.107:11434
Ollama is running%
no need to run the model interactively first - the REST API will load it on demand.
on the host, check which models are available
michaelobrien@ultra2 performance % ollama list
NAME ID SIZE MODIFIED
deepseek-r1:32b 38056bbcbb2d 19 GB 13 days ago
deepseek-r1:70b 0c1615a8ca32 42 GB 13 days ago
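if the model you want isn't listed, pull it on the host first:

ollama pull deepseek-r1:70b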
on the client, run a query
curl -X POST http://192.168.11.107:11434/api/generate \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1:70b",
"prompt": "What is the capital of Canada?",
"stream": false
}'
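with "stream": false the call returns a single JSON object; an abridged sketch of the shape (values illustrative, timing/context fields omitted):

{
  "model": "deepseek-r1:70b",
  "created_at": "2025-03-01T23:30:00.000000Z",
  "response": "The capital of Canada is Ottawa.",
  "done": true
}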
URL-encode the payload into a query string using jq
vi data.json
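with the same payload as the POST body above:

{
  "model": "deepseek-r1:70b",
  "prompt": "What is the capital of Canada?",
  "stream": false
}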
jq -r '
"http://192.168.11.107:11434/api/generate?" +
([ to_entries[] | (@uri "\(.key)" + "=" + @uri "\(.value|tostring)" ) ] | join("&"))
' data.json
not quite
http://192.168.11.107:11434/api/generate?model=deepseek-r1%3A70b&prompt=What%20is%20the%20capital%20of%20Canada%3F&stream=false
testing this out - I need CORS set
export OLLAMA_ORIGINS="*"
restart server
ollama serve
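the variable only takes effect on a fresh serve; both settings can also be passed inline for a one-off run:

OLLAMA_HOST=0.0.0.0:11434 OLLAMA_ORIGINS="*" ollama serve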
test it
curl -H "Origin: http://example.com" -I http://192.168.11.107:11434/api/tags
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=utf-8
Date: Sun, 02 Mar 2025 01:08:31 GMT
Content-Length: 690
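a closer simulation of a browser call is the CORS preflight; with OLLAMA_ORIGINS="*" the OPTIONS request should echo the same Access-Control-Allow-Origin header:

# preflight the POST endpoint the way a browser would
curl -X OPTIONS -I \
  -H "Origin: http://example.com" \
  -H "Access-Control-Request-Method: POST" \
  http://192.168.11.107:11434/api/generate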