Instructions and tools for running large language models (LLMs) locally on your own hardware.
Server installation instructions are not comprehensive, but should be enough to get started.
- Use `nvidia-smi` to check your GPU setup
- Install Ollama: https://docs.ollama.com/linux
- Set up a reverse proxy for Ollama (example commands below):
  - Install nginx
  - Modify and copy one of the nginx Ollama configs into `/etc/nginx/sites-available/ollama`
  - Symlink it into `/etc/nginx/sites-enabled/ollama`
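A minimal sketch of those steps on a Debian/Ubuntu-style host, assuming your edited config is saved as `ollama` (adjust the package manager, paths, and names to your setup):

```bash
# Confirm the GPU is visible to the driver
nvidia-smi

# Install Ollama (official install script, per the Ollama docs)
curl -fsSL https://ollama.com/install.sh | sh

# Install nginx, enable the Ollama site, and reload
sudo apt install nginx
sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/ollama
sudo nginx -t                  # validate the config before reloading
sudo systemctl reload nginx
```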
The overall architecture:

```mermaid
graph TD
    User[User] --> VSCode[VS Code with Continue Plugin]
    VSCode --> A[DNS: local-llm.example.com]
    A --> B{Routing}
    B -->|Option 1| C[Cloudflare Tunnel]
    B -->|Option 2| D[Direct IP Address]
    C --> E[Nginx Reverse Proxy]
    D --> E
    E -->|API Key Authentication| F[Ollama]
    style User fill:#f3e5f5
    style VSCode fill:#e8eaf6
    style A fill:#e1f5ff
    style E fill:#fff4e1
    style F fill:#e8f5e9
```
The request flow in more detail:

```mermaid
sequenceDiagram
    participant User
    participant VSCode as VS Code<br/>(Continue Plugin)
    participant DNS as DNS<br/>local-llm.example.com
    participant CF as Cloudflare Tunnel<br/>(Optional)
    participant Nginx as Nginx Reverse Proxy
    participant Ollama
    User->>VSCode: Request LLM assistance
    VSCode->>DNS: Resolve local-llm.example.com
    DNS-->>VSCode: IP Address
    alt via Cloudflare Tunnel
        VSCode->>CF: HTTPS Request
        CF->>Nginx: Forward Request
    else via Direct IP
        VSCode->>Nginx: Direct HTTPS Request
    end
    Nginx->>Nginx: Validate API Key
    Nginx->>Ollama: Forward Request
    Ollama->>Ollama: Process LLM Query
    Ollama-->>Nginx: LLM Response
    Nginx-->>VSCode: Return Response
    VSCode-->>User: Display Result
```
Continue.dev's VS Code plugin is an easy way to connect to local LLMs. Configure it to use your local Ollama instance by adding the following to `~/.continue/config.yaml`:
```yaml
name: Local Config
version: 1.0.0
schema: v1
models:
  - name: Llama 3.1 8B
    provider: ollama
    apiBase: ${{ secrets.OLLAMA_API_URL }}
    requestOptions:
      headers:
        Authorization: Bearer ${{ secrets.OLLAMA_API_KEY }}
    model: llama3.1:8b
```

Then ensure you have the environment variables set in your system or in a .env file:
```
OLLAMA_API_URL=https://local-llm.example.com
OLLAMA_API_KEY=your_api_key_here
```
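To verify the proxied, key-protected endpoint before wiring up the editor, you can hit Ollama's model-listing endpoint using the example domain and placeholder key from above:

```bash
# Should return a JSON list of installed models if the proxy and API key are set up correctly
curl -H "Authorization: Bearer your_api_key_here" \
  https://local-llm.example.com/api/tags
```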
Cline's VS Code plugin is another option that sometimes works better for agentic coding. Auto-discovery of models does not work when Ollama sits behind an API key, so you will need to add the model configuration manually in the plugin settings.
Follow continue.dev's recommendations based on your GPU size: https://docs.continue.dev/customize/models#recommended-models
For a small GPU (16-24 GB VRAM):
- Qwen3-coder 30b for planning/agent
- Qwen2.5-coder 1.5b for completion (pull commands below)
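To fetch those models, something like the following should work; the exact tags are assumptions, so check https://ollama.com/library for the current names:

```bash
ollama pull qwen3-coder:30b     # planning/agent model
ollama pull qwen2.5-coder:1.5b  # completion model
```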
- "Restart" the chat/task often to avoid context bloat.
- Commit your own work before using agent mode!
- Keep your own uncommitted changes out of the repo so the agent's edits are easy to revert. Learned this the hard way.
- Use rules/custom prompts (depends on plugin)
- Use MCP servers (example config for Continue at the end of this list)
  - Continue.dev MCP server setup guide: https://docs.continue.dev/customize/deep-dives/mcp
  - Cline instructions: https://docs.cline.bot/mcp/configuring-mcp-servers
  - Cline MCP marketplace: https://cline.bot/mcp-marketplace
- In agent or planning mode, ask the model to think step by step and come up with a detailed plan/todo list before writing code.
- Agent mode will not work especially well with smaller models
- Explore pre-made prompts:
  - Continue.dev prompt library: https://hub.continue.dev/hub
  - Try e.g. "React"
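As a concrete example of the MCP setup mentioned above, Continue's `config.yaml` accepts an `mcpServers` block. This is a minimal sketch using the reference filesystem server with a placeholder path; treat the package name and arguments as assumptions and follow the linked docs for specifics:

```yaml
# Added to ~/.continue/config.yaml (placeholder path; adjust to your project)
mcpServers:
  - name: Filesystem
    command: npx
    args:
      - "-y"
      - "@modelcontextprotocol/server-filesystem"
      - /path/to/your/project
```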