This is great, thank you for contributing this. I've pinned it if that's okay for you. |
Since I didn't find much around on how to integrate Ollama, here are my notes.
ollama v0.9.2
lingarr v0.9.7 beta
GPU RTX3090 24GB limited to 220W power
For the setup, here are my env variables for the docker-compose file.
Adapt the IP address for Ollama, the model name, and your source and target languages.
You need to connect Radarr and Sonarr as well, preferably via the env variables instead of the UI.
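Something along these lines (a rough sketch only: the exact variable names below are my assumptions, not verified against the Lingarr docs, so check the Lingarr documentation for the real keys; adapt IPs, model name, and languages):

```yaml
services:
  lingarr:
    image: lingarr/lingarr:latest
    environment:
      # AI service settings (hypothetical variable names):
      - SERVICE_TYPE=ollama
      - OLLAMA_ADDRESS=http://192.168.1.100:11434
      - OLLAMA_MODEL=gemma3-4b-qat-8k
      - SOURCE_LANGUAGES=en
      - TARGET_LANGUAGES=it
      # Radarr/Sonarr connections (hypothetical variable names):
      - RADARR_URL=http://192.168.1.100:7878
      - RADARR_API_KEY=<your-radarr-api-key>
      - SONARR_URL=http://192.168.1.100:8989
      - SONARR_API_KEY=<your-sonarr-api-key>
```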
For now, I focused on testing gemma3 models due to their excellent multilingual capabilities.
Unfortunately the 12b version doesn't work well in Ollama as of today: it sometimes gets stuck and translations randomly fail.
The 27b version has excellent quality, but it doesn't follow the instructions and often outputs an example of a random translation it came up with, not the actual translation.
The only model that works really well is the 4b, and the quality is very high despite the small size.
There are two versions of gemma3; the one named qat is the version you want: https://ollama.com/library/gemma3:4b-it-qat
You can use this one directly; it's a Q4_0 quantization that performs almost as well as the q8/fp16 versions but faster.
It's a lot slower than the non-qat Q4_0 version (9 minutes vs 5 minutes), but the tradeoff is very welcome, as the quality is very close to the 27b model, which takes over 20 minutes and more than 20GB of VRAM instead of 6.5GB.
In my example I saved this model under a new name to raise the context to 8192.
It's not strictly needed, since the subtitle translations are done line by line; it's just a precaution, and the standard 4K context should be enough.
But gemma3 has excellent context management, so it won't increase the RAM/VRAM usage unless more context is actually needed.
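The step above can be sketched as follows. `FROM` and `PARAMETER num_ctx` are standard Ollama Modelfile directives; the new model name `gemma3-4b-qat-8k` is just my example, pick whatever you like:

```shell
# Write a Modelfile that derives from the QAT build and raises the
# context window to 8192 tokens.
cat > Modelfile <<'EOF'
FROM gemma3:4b-it-qat
PARAMETER num_ctx 8192
EOF

# Then pull the base model and register the derived one under a new
# name (these two need a running Ollama server, so they are shown
# commented out):
# ollama pull gemma3:4b-it-qat
# ollama create gemma3-4b-qat-8k -f Modelfile
```

Afterwards, point Lingarr at the new model name instead of the base one.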
The system prompt of the Ollama model doesn't matter; it's specified in Lingarr instead.
That prompt is very important and may need to be customized based on the model:
I've added a second example using the uppercase text format, because otherwise the model very often gets lost in translation.
The original prompt had an English-to-Chinese translation example, and I got some Chinese lines in my translations...
I recommend making the examples use your own source and target languages.
Do not enable the context prompt; it seems to make things worse and much slower.
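For illustration, a prompt along these lines (a sketch only, not my exact prompt; adapt the example pairs to your own source and target languages, here assumed to be English and Italian):

```
Translate the subtitle line from English to Italian. Output only the
translation, nothing else. Preserve the formatting of the input.

Example:
Input: I'll see you tomorrow.
Output: Ci vediamo domani.

Example (uppercase text stays uppercase):
Input: BREAKING NEWS
Output: ULTIME NOTIZIE
```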
I tested a bunch of different parameters but found that the ones in https://ollama.com/zongwei/gemma3-translator:4b work best.
They are good for this 4b model, not necessarily for others; they weren't good for the 27b, for example.
You need to set them in the Lingarr UI.
Good luck