Description
Which version of Nextcloud are you using?
32.0.0
Which version of PHP context_chat are you using?
4.5.0
Which version of backend context_chat are you using?
4.5.0
Which browser are you using? In case you are using the phone App, specify the Android or iOS version and device please.
Chrome 128
Nextcloud deployment method?
docker compose
Describe the Bug
```
decode: cannot decode batches with this context (calling encode() instead)
init: embeddings required but some input tokens were not marked as outputs -> overriding
```
The context length of the embedding model is 512 tokens, which is exceeded by both our config and the chunk size of the texts we pass in; that is what produces the warnings above. It would be tricky to fix without re-indexing, so our first effort should be to keep the impact of the change minimal in terms of doc search quality.
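For illustration, a minimal sketch of how such an overflow could be detected, assuming llama-cpp-python (consistent with the llama.cpp-style log lines above); the model path and chunk text are placeholders, not values from this deployment:

```python
# Sketch: check whether an indexed chunk exceeds the embedding model's
# native context window. Not the app's actual code.
from llama_cpp import Llama

model = Llama(
    model_path="embedding-model.gguf",  # placeholder path
    embedding=True,
    n_ctx=512,  # the model's native context, per the diagnosis above
)

chunk = "some text produced by the indexer " * 200  # oversized on purpose
tokens = model.tokenize(chunk.encode("utf-8"))
if len(tokens) > model.n_ctx():
    # per the diagnosis above, chunks like this are what trigger the
    # warnings seen in em_server.log
    print(f"chunk is {len(tokens)} tokens, exceeds n_ctx={model.n_ctx()}")
```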
One solution would be to simply reduce the chunk size and the context size config to match the native context size of the model. Doc search quality may not change much for previously indexed docs, since the embedding of the query we use to search them would usually be short enough. For newly indexed docs it remains to be seen: 512 is a token count, so chunks would end up around that size, and even smaller for non-English languages where tokenization yields more tokens per character. A sketch of what this could look like follows.
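A minimal sketch of this option, assuming a LangChain-style splitter; the splitter class is illustrative rather than the app's actual config, and the chars-per-token ratio is a rough rule of thumb:

```python
# Sketch of option 1: cap chunk size so chunks fit the model's native
# 512-token context. chunk_size here is in characters; ~3-4 chars/token
# is a rough English average, so this leaves headroom under 512 tokens.
from langchain_text_splitters import RecursiveCharacterTextSplitter

EMBED_CTX_TOKENS = 512

splitter = RecursiveCharacterTextSplitter(
    chunk_size=EMBED_CTX_TOKENS * 3,  # ~1536 chars, conservative
    chunk_overlap=100,
)

document_text = open("some_document.txt").read()  # placeholder input
chunks = splitter.split_text(document_text)
```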
Another solution would be to use RoPE scaling and try to increase the effective context length of the model through the config alone. That would allow larger context lengths and better doc search than the option above. It is, however, yet to be seen how far we can scale it while keeping the results good. See the discussion in the linked llama.cpp issue and the sketch below.
ggml-org/llama.cpp#1965
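A minimal sketch of this option, again assuming llama-cpp-python: rope_freq_scale and n_ctx are real Llama() parameters, but the specific values are untested guesses:

```python
# Sketch of option 2: linear RoPE scaling to double the effective
# context from 512 to 1024 tokens via config only. rope_freq_scale=0.5
# compresses positions by 2x, letting the model attend over a 2x
# longer context than it was trained with.
from llama_cpp import Llama

model = Llama(
    model_path="embedding-model.gguf",  # placeholder path
    embedding=True,
    n_ctx=1024,           # the scaled context we want to run at
    rope_freq_scale=0.5,  # 512 / 1024; linear scaling factor
)
```

Whether embeddings computed at the scaled length stay close enough in quality to native 512-token embeddings is exactly what would need evaluating before shipping this.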
To Reproduce
Embed some docs into the vector db and inspect the output of "<persistent_storage>/logs/em_server.log".
PHP logs (Warning these might contain sensitive information)
No response
Ex-App logs (Warning these might contain sensitive information)
No response
Server logs (if applicable)
No response