This is great, thank you for contributing this. I've pinned it if that's okay for you. |
Since I didn't find much around on how to integrate Ollama, here are my notes.
ollama v0.9.2
lingarr v0.9.7 beta
GPU RTX3090 24GB limited to 220W power
For the setup, here are my env variables for the docker-compose file.
Adapt the IP address for Ollama, the model name, and your source and target languages.
You need to connect Radarr and Sonarr as well, preferably via the env variables instead of the UI.
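Something along these lines (a rough sketch only: the exact variable names below are my assumptions, not verified against the Lingarr docs, so check the Lingarr documentation for the real keys; adapt IPs, model name, and languages):

```yaml
services:
  lingarr:
    image: lingarr/lingarr:latest
    environment:
      # AI service settings (hypothetical variable names):
      - SERVICE_TYPE=ollama
      - OLLAMA_ADDRESS=http://192.168.1.100:11434
      - OLLAMA_MODEL=gemma3-4b-qat-8k
      - SOURCE_LANGUAGES=en
      - TARGET_LANGUAGES=it
      # Radarr/Sonarr connections (hypothetical variable names):
      - RADARR_URL=http://192.168.1.100:7878
      - RADARR_API_KEY=<your-radarr-api-key>
      - SONARR_URL=http://192.168.1.100:8989
      - SONARR_API_KEY=<your-sonarr-api-key>
```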
For now, I focused on testing gemma3 models due to their excellent multilingual capabilities.
Unfortunately the 12b version doesn't work well in Ollama as of today: it sometimes gets stuck and translations randomly fail.
The 27b version has excellent quality, but it doesn't follow the instructions and often outputs an example of a random translation it came up with, not the actual translation.
The only model that works really well is the 4b, and the quality is very high despite the small size.
There are two versions of gemma3; the one named qat is the version you want: https://ollama.com/library/gemma3:4b-it-qat
You can use this one directly; it's a Q4_0 quantization that performs almost as well as the q8/fp16 versions but faster.
It's a lot slower than the non-qat Q4_0 version (9 minutes vs 5 minutes), but the tradeoff is very welcome, as the quality is very close to the 27b model, which takes over 20 minutes and more than 20GB of VRAM instead of 6.5GB.
In my example I saved this model under a new name to raise the context to 8192.
It's not strictly needed, since the subtitle translations are done line by line; it's just a precaution, and the standard 4K context should be enough.
But gemma3 has excellent context management, so it won't increase the RAM/VRAM usage unless more context is actually needed.
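The step above can be sketched as follows. `FROM` and `PARAMETER num_ctx` are standard Ollama Modelfile directives; the new model name `gemma3-4b-qat-8k` is just my example, pick whatever you like:

```shell
# Write a Modelfile that derives from the QAT build and raises the
# context window to 8192 tokens.
cat > Modelfile <<'EOF'
FROM gemma3:4b-it-qat
PARAMETER num_ctx 8192
EOF

# Then pull the base model and register the derived one under a new
# name (these two need a running Ollama server, so they are shown
# commented out):
# ollama pull gemma3:4b-it-qat
# ollama create gemma3-4b-qat-8k -f Modelfile
```

Afterwards, point Lingarr at the new model name instead of the base one.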
The system prompt of the Ollama model doesn't matter; it's specified in Lingarr instead.
That prompt is very important and may need to be customized based on the model:
I've added a second example using the uppercase text format, because otherwise the model very often gets lost in translation.
The original prompt had an English-to-Chinese translation example, and I got some Chinese lines in my translations...
I recommend making the examples use your own source and target languages.
Do not enable the context prompt; it seems to make things worse and much slower.
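For illustration, a prompt along these lines (a sketch only, not my exact prompt; adapt the example pairs to your own source and target languages, here assumed to be English and Italian):

```
Translate the subtitle line from English to Italian. Output only the
translation, nothing else. Preserve the formatting of the input.

Example:
Input: I'll see you tomorrow.
Output: Ci vediamo domani.

Example (uppercase text stays uppercase):
Input: BREAKING NEWS
Output: ULTIME NOTIZIE
```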
I tested a bunch of different parameters but found that the ones in https://ollama.com/zongwei/gemma3-translator:4b work best.
They are good for this 4b model, not necessarily for others; they weren't good for the 27b, for example.
You need to set them in the Lingarr UI.
Good luck