Instructions and tools for running large language models (LLMs) locally on your own hardware.
Server installation instructions are not comprehensive, but should be enough to get started.
- Use `nvidia-smi` to check your GPU setup
- Install Ollama: https://docs.ollama.com/linux
- Set up a reverse proxy for Ollama (example commands below):
  - Install nginx
  - Modify and copy one of the nginx Ollama configs into `/etc/nginx/sites-available/ollama`
  - Symlink it into `/etc/nginx/sites-enabled/ollama`
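A minimal sketch of those steps on a Debian/Ubuntu-style host, assuming your edited config is saved as `ollama` (adjust the package manager, paths, and names to your setup):

```bash
# Confirm the GPU is visible to the driver
nvidia-smi

# Install Ollama (official install script, per the Ollama docs)
curl -fsSL https://ollama.com/install.sh | sh

# Install nginx, enable the Ollama site, and reload
sudo apt install nginx
sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/ollama
sudo nginx -t                  # validate the config before reloading
sudo systemctl reload nginx
```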
The overall architecture:

```mermaid
graph TD
    User[User] --> VSCode[VS Code with Continue Plugin]
    VSCode --> A[DNS: local-llm.example.com]
    A --> B{Routing}
    B -->|Option 1| C[Cloudflare Tunnel]
    B -->|Option 2| D[Direct IP Address]
    C --> E[Nginx Reverse Proxy]
    D --> E
    E -->|API Key Authentication| F[Ollama]
    style User fill:#f3e5f5
    style VSCode fill:#e8eaf6
    style A fill:#e1f5ff
    style E fill:#fff4e1
    style F fill:#e8f5e9
```
The request flow in more detail:

```mermaid
sequenceDiagram
    participant User
    participant VSCode as VS Code<br/>(Continue Plugin)
    participant DNS as DNS<br/>local-llm.example.com
    participant CF as Cloudflare Tunnel<br/>(Optional)
    participant Nginx as Nginx Reverse Proxy
    participant Ollama
    User->>VSCode: Request LLM assistance
    VSCode->>DNS: Resolve local-llm.example.com
    DNS-->>VSCode: IP Address
    alt via Cloudflare Tunnel
        VSCode->>CF: HTTPS Request
        CF->>Nginx: Forward Request
    else via Direct IP
        VSCode->>Nginx: Direct HTTPS Request
    end
    Nginx->>Nginx: Validate API Key
    Nginx->>Ollama: Forward Request
    Ollama->>Ollama: Process LLM Query
    Ollama-->>Nginx: LLM Response
    Nginx-->>VSCode: Return Response
    VSCode-->>User: Display Result
```
Continue.dev's VS Code plugin is an easy way to connect to local LLMs. Configure it to use your local Ollama instance by adding the following to `~/.continue/config.yaml`:
```yaml
name: Local Config
version: 1.0.0
schema: v1
models:
  - name: Llama 3.1 8B
    provider: ollama
    apiBase: ${{ secrets.OLLAMA_API_URL }}
    requestOptions:
      headers:
        Authorization: Bearer ${{ secrets.OLLAMA_API_KEY }}
    model: llama3.1:8b
```

Then ensure you have the environment variables set in your system or in a .env file:
```
OLLAMA_API_URL=https://local-llm.example.com
OLLAMA_API_KEY=your_api_key_here
```
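To verify the proxied, key-protected endpoint before wiring up the editor, you can hit Ollama's model-listing endpoint using the example domain and placeholder key from above:

```bash
# Should return a JSON list of installed models if the proxy and API key are set up correctly
curl -H "Authorization: Bearer your_api_key_here" \
  https://local-llm.example.com/api/tags
```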
Cline's VS Code plugin is another option that sometimes works better for agentic coding. Auto-discovery of models does not work when Ollama sits behind an API key, so you will need to add the model configuration manually in the plugin settings.
Follow continue.dev's recommendations based on your GPU size: https://docs.continue.dev/customize/models#recommended-models
For a small GPU (16-24 GB VRAM):
- Qwen3-coder 30b for planning/agent
- Qwen2.5-coder 1.5b for completion (pull commands below)
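To fetch those models, something like the following should work; the exact tags are assumptions, so check https://ollama.com/library for the current names:

```bash
ollama pull qwen3-coder:30b     # planning/agent model
ollama pull qwen2.5-coder:1.5b  # completion model
```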
- "Restart" the chat/task often to avoid context bloat.
- Commit your own work before using agent mode!
- Keep your own uncommitted changes out of the repo so the agent's edits are easy to revert. Learned this the hard way.
- Use rules/custom prompts (depends on plugin)
- Use MCP servers (example config for Continue at the end of this list)
  - Continue.dev MCP server setup guide: https://docs.continue.dev/customize/deep-dives/mcp
  - Cline instructions: https://docs.cline.bot/mcp/configuring-mcp-servers
  - Cline MCP marketplace: https://cline.bot/mcp-marketplace
- In agent or planning mode, ask the model to think step by step and come up with a detailed plan/todo list before writing code.
- Agent mode will not work especially well with smaller models
- Explore pre-made prompts:
  - Continue.dev prompt library: https://hub.continue.dev/hub
  - Try e.g. "React"
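As a concrete example of the MCP setup mentioned above, Continue's `config.yaml` accepts an `mcpServers` block. This is a minimal sketch using the reference filesystem server with a placeholder path; treat the package name and arguments as assumptions and follow the linked docs for specifics:

```yaml
# Added to ~/.continue/config.yaml (placeholder path; adjust to your project)
mcpServers:
  - name: Filesystem
    command: npx
    args:
      - "-y"
      - "@modelcontextprotocol/server-filesystem"
      - /path/to/your/project
```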