Run Ollama and open language models directly on Android devices
What This Is • Architecture • Technical Implementation • Models • Why This Approach
Oalla demonstrates running a complete Go web server inside an Android app process. The result is a mobile app that can run any Ollama-compatible model locally without internet connectivity.
This is completely open source, just like Ollama itself. You can use any models from Ollama's library or Hugging Face that work with the GGUF format.
Get the latest APK from the Releases page
┌─────────────────────────────────────────────────────────────────┐
│ Android App Process │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ HTTP ┌─────────────────────────┐ │
│ │ JavaScript │ ←────────→ │ Go Server │ │
│ │ Chat UI │ localhost │ (Ollama) │ │
│ │ │ :8000-8500 │ │ │
│ └─────────────────┘ (dynamic) └─────────────────────────┘ │
│ │ │ │
│ │ │ │
│ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ Android │ │ JNI Bridge │ │
│ │ WebView │ │ (libbridgeollama.so) │ │
│ │ │ │ │ │
│ └─────────────────┘ └─────────────────────────┘ │
│ │ │ │
│ └────────────────────────────────────┘ │
│ Native Integration │
└─────────────────────────────────────────────────────────────────┘
Key Components:
- JavaScript UI: Rich web-based chat interface running in WebView
- HTTP API: Standard REST endpoints (/api/chat, /api/models, etc.)
- Go Server: Full Ollama server compiled as an Android native library
- JNI Bridge: Connects Kotlin/Java Android code with Go server
- Single Process: Everything runs in one Android app process for efficiency
- Dynamic Port: Randomly allocated port (8000-8500) for security
The app loads Ollama's web interface in a WebView while running the actual Ollama server natively in the same process. JavaScript communicates with the Go backend via standard HTTP requests to localhost.
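For illustration, the call the JavaScript UI makes can be sketched in Go. The port and model name below are placeholders; the payload follows the standard Ollama /api/chat request format:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest mirror the standard Ollama /api/chat payload.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// buildChatRequest targets the dynamically allocated localhost port.
func buildChatRequest(port int, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   true, // Ollama streams responses by default
	})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("http://127.0.0.1:%d/api/chat", port)
	return http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
}

func main() {
	req, err := buildChatRequest(8123, "qwen3:0.6b", "Hello!")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// → POST http://127.0.0.1:8123/api/chat
}
```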
Step-by-step guide to modifying the official Ollama repository for Android compatibility. Covers JNI bridge creation, in-process execution, cross-compilation, and the web API endpoints that make this possible.
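As a rough sketch of the in-process pattern (not the project's actual bridge code), the Go entry points the JNI layer calls might look like the hypothetical StartServer/StopServer below. In a real build these would carry cgo //export directives and be compiled with `go build -buildmode=c-shared` against the Android NDK toolchain:

```go
package main

import (
	"fmt"
	"math/rand"
	"net"
	"net/http"
)

var server *http.Server

// StartServer picks a random port in 8000-8500, starts the HTTP API in the
// current process, and returns the port so the JNI layer can hand it to the
// WebView. (Hypothetical name; the real bridge would export this via cgo.)
func StartServer() (int, error) {
	for i := 0; i < 50; i++ {
		port := 8000 + rand.Intn(501) // inclusive range 8000-8500
		ln, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
		if err != nil {
			continue // port already in use; try another
		}
		server = &http.Server{Handler: http.DefaultServeMux}
		go func() { _ = server.Serve(ln) }()
		return port, nil
	}
	return 0, fmt.Errorf("no free port in 8000-8500")
}

// StopServer tears the in-process server down, e.g. when the Activity exits.
func StopServer() error {
	if server == nil {
		return nil
	}
	return server.Close()
}

func main() {
	port, err := StartServer()
	if err != nil {
		panic(err)
	}
	fmt.Println("listening on", port)
	_ = StopServer()
}
```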
How the Android app manages the Go server lifecycle, handles JavaScript-native communication, implements security through dynamic ports and authentication, and manages encrypted assets.
Works with any Ollama model or GGUF-format models from Hugging Face:
Tested Ollama Models
| Model | Size | Context | Type |
|---|---|---|---|
| tinyllama:latest | 638MB | 2K | Text |
| qwen3:0.6b | 523MB | 40K | Text |
| smollm2:135m | 135MB | 4K | Text |
| gemma3:270m | 292MB | 32K | Text |
Tested Hugging Face Models
| Model | Size | Context | Type |
|---|---|---|---|
| hf.co/unsloth/Qwen3-4B-GGUF:Q4_K_M | 1.03GB | 128K | Text |
This architecture proves that mobile devices can run sophisticated AI workloads locally. It maintains full compatibility with Ollama's ecosystem while providing a rich web-based interface that would be difficult to implement natively.
The approach is entirely offline-first and privacy-focused: no data leaves your device, no accounts are required, and there is no tracking.
Benefits:
- Easy model installation - just download GGUF files and load them
- Full Ollama API compatibility for seamless integration
- Web-based UI that's simple to customize and extend
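Model loading goes through Ollama's standard /api/pull endpoint, which also accepts Hugging Face GGUF models via the hf.co/&lt;repo&gt;:&lt;quant&gt; naming convention. A minimal sketch of building that request (the port is a placeholder, and the "model" field name follows recent Ollama API versions):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// pullRequest mirrors the Ollama /api/pull payload.
type pullRequest struct {
	Model string `json:"model"`
}

func buildPullRequest(port int, model string) (*http.Request, error) {
	body, err := json.Marshal(pullRequest{Model: model})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("http://127.0.0.1:%d/api/pull", port)
	return http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
}

func main() {
	// A Hugging Face GGUF model, named per Ollama's import convention.
	req, err := buildPullRequest(8123, "hf.co/unsloth/Qwen3-4B-GGUF:Q4_K_M")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	// → POST /api/pull
}
```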
Current Limitations:
- Text-only models supported at this time
- Embedding and image models not yet integrated
- No Android GPU acceleration (CPU inference only)
- Performance depends on device capabilities
MIT License, same as Ollama. This project builds upon Ollama's work to bring it to mobile platforms.
