local-llm

Instructions and tools for running large language models (LLMs) locally on your own hardware.

Server installation instructions are not comprehensive, but should be enough to get started.

Initial setup on server

  • Run nvidia-smi to confirm the GPU and driver are detected
  • Install Ollama: https://docs.ollama.com/linux
  • Set up a reverse proxy for Ollama (see the config sketch below)
    • Install nginx
    • Modify one of the nginx ollama configs and copy it to /etc/nginx/sites-available/ollama
    • Symlink it from /etc/nginx/sites-enabled/ollama
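
A minimal sketch of what the sites-available config could look like, assuming Ollama is listening on its default port 11434; the hostname, certificate paths, and API key are placeholders to replace with your own:

# /etc/nginx/sites-available/ollama -- minimal sketch, not a hardened config
server {
    listen 443 ssl;
    server_name local-llm.example.com;

    ssl_certificate     /etc/ssl/certs/local-llm.pem;    # placeholder path
    ssl_certificate_key /etc/ssl/private/local-llm.key;  # placeholder path

    location / {
        # Reject requests that do not carry the expected bearer token
        if ($http_authorization != "Bearer your_api_key_here") {
            return 401;
        }

        proxy_pass http://127.0.0.1:11434;  # Ollama's default port
        proxy_set_header Host $host;
        proxy_read_timeout 300s;  # LLM responses can take a while
        proxy_buffering off;      # stream tokens as they are generated
    }
}

To enable the site on a standard Debian/Ubuntu nginx layout:

sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/ollama
sudo nginx -t && sudo systemctl reload nginx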

The architecture

System Overview

graph TD
    User[User] --> VSCode[VS Code with Continue Plugin]
    VSCode --> A[DNS: local-llm.example.com]
    A --> B{Routing}
    B -->|Option 1| C[Cloudflare Tunnel]
    B -->|Option 2| D[Direct IP Address]
    C --> E[Nginx Reverse Proxy]
    D --> E
    E -->|API Key Authentication| F[Ollama]

    style User fill:#f3e5f5
    style VSCode fill:#e8eaf6
    style A fill:#e1f5ff
    style E fill:#fff4e1
    style F fill:#e8f5e9

Request Flow Sequence

sequenceDiagram
    participant User
    participant VSCode as VS Code<br/>(Continue Plugin)
    participant DNS as DNS<br/>local-llm.example.com
    participant CF as Cloudflare Tunnel<br/>(Optional)
    participant Nginx as Nginx Reverse Proxy
    participant Ollama

    User->>VSCode: Request LLM assistance
    VSCode->>DNS: Resolve local-llm.example.com
    DNS-->>VSCode: IP Address

    alt via Cloudflare Tunnel
        VSCode->>CF: HTTPS Request
        CF->>Nginx: Forward Request
    else via Direct IP
        VSCode->>Nginx: Direct HTTPS Request
    end

    Nginx->>Nginx: Validate API Key
    Nginx->>Ollama: Forward Request
    Ollama->>Ollama: Process LLM Query
    Ollama-->>Nginx: LLM Response
    Nginx-->>VSCode: Return Response
    VSCode-->>User: Display Result

The plugins

Continue.dev's VS Code plugin is easy to connect to local LLMs. Configure it to use your local Ollama instance by adding the following to ~/.continue/config.yaml:

name: Local Config
version: 1.0.0
schema: v1
models:
  - name: Llama 3.1 8B
    provider: ollama
    apiBase: ${{ secrets.OLLAMA_API_URL }}
    requestOptions:
      headers:
        Authorization: Bearer ${{ secrets.OLLAMA_API_KEY }}
    model: llama3.1:8b

Then ensure you have the environment variables set in your system or in a .env file:

OLLAMA_API_URL=https://local-llm.example.com
OLLAMA_API_KEY=your_api_key_here
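
To verify the proxy and the key end to end, you can query Ollama's model-listing endpoint through the proxy (a quick check; substitute your real hostname and key):

# A JSON list of models means the whole chain works; a 401 usually means
# the Authorization header does not match the key configured in nginx.
curl -H "Authorization: Bearer your_api_key_here" \
  https://local-llm.example.com/api/tags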

Cline's VS Code plugin is another option that sometimes works better for agentic coding. Auto-discovery of models does not work with Ollama behind an API key, so you will need to add the model configuration manually in the plugin settings.

Recommended models

Follow continue.dev's recommendations based on your GPU size: https://docs.continue.dev/customize/models#recommended-models

For a small GPU (16-24 GB VRAM):

  • Qwen3-coder 30B for planning and agent work
  • Qwen2.5-coder 1.5B for completion (pull commands for both are shown below)
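
The models can be fetched on the server with ollama pull. The exact tags below are assumptions; check the Ollama library for current names and sizes:

ollama pull qwen3-coder:30b     # planning/agent model (assumed tag)
ollama pull qwen2.5-coder:1.5b  # completion model (assumed tag)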

General tips

More information
