Conversation

@kallewoof
Contributor

This replaces #10424 and removes the need for deallocation.

@slaren
Member

slaren commented Nov 20, 2024

This is still not a good change. It introduces a breaking API change for no real reason, and if in the future we extend this functionality to support returning the KV in their native types, it will force us to keep a copy as a string in llama_model. Taking a buffer is the standard way of returning strings in C libraries, and this is not a performance-sensitive function either.

@kallewoof
Contributor Author

It introduces a breaking API change for no real reason

There is a bug in llama.cpp right now that is caused by this issue, so I disagree about "no real reason". I also wonder how big of an impact it is. Are these functions widely used in a bunch of different projects?

if we extend this functionality to support returning the KV in their native types it will force us to keep a copy as string in llama_model.

I don't see why. As you say yourself, these aren't enormous objects. Constructing the std::string on demand seems fine to me.

Taking a buffer is the standard way of returning strings in C libraries, and this is not a performance sensitive function either.

I for one have never seen code that requires you to + 1 and - 1 the buffer lengths in the manner this code base requires. Perhaps it's simply a matter of using memcpy or similar instead of snprintf and never null-terminating, but that then places the burden of null-termination, when it is needed, on the caller, which is also unusual.
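To make that bookkeeping concrete, here is a minimal sketch (hypothetical code, not taken from llama.cpp) of the caller-provided-buffer convention where the callee NUL-terminates via snprintf and the caller adds and subtracts 1 to account for the terminator:

```c
#include <stdio.h>
#include <string.h>

// Hypothetical getter in the caller-provided-buffer style: writes `value`
// into `buf`, always NUL-terminates, and returns the full length of the
// value so the caller can detect truncation.
static int get_meta_val(const char * value, char * buf, size_t buf_size) {
    if (buf_size > 0) {
        snprintf(buf, buf_size, "%s", value); // reserves one byte for '\0'
    }
    return (int) strlen(value);
}

int main(void) {
    char buf[8];
    int len = get_meta_val("a rather long value", buf, sizeof(buf));
    if (len > (int) sizeof(buf) - 1) {
        // Truncated: a second call with a buffer of len + 1 bytes is needed.
        printf("need %d + 1 bytes, got \"%s\"\n", len, buf);
    }
    return 0;
}
```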

@slaren
Member

slaren commented Nov 20, 2024

There are some differences in how the various functions in the llama.cpp API handle this. llama_token_to_piece does not return NUL-terminated strings, llama_chat_apply_template may return a string without a NUL terminator if the buffer is too small, and llama_model_meta_val_str always returns a NUL-terminated string. It would be good to normalize this behavior so the API handles it the same way everywhere; that may justify a breaking API change.
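For reference, a typical call site for the existing buffer-based API looks roughly like this (a rough sketch; model loading is omitted, and the assumption that a negative return value means the key was not found should be checked against llama.h):

```c
#include <stdint.h>
#include <stdio.h>
#include "llama.h"

// Prints one metadata value from an already-loaded model using the
// caller-provided-buffer convention discussed above.
static void print_meta(const struct llama_model * model, const char * key) {
    char buf[256];
    int32_t n = llama_model_meta_val_str(model, key, buf, sizeof(buf));
    if (n < 0) {
        printf("%s: <not found>\n", key);
        return;
    }
    // Per the behavior described above, buf is always NUL-terminated.
    printf("%s: %s\n", key, buf);
}
```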

@ngxson
Collaborator

ngxson commented Nov 20, 2024

IMO I still prefer the old way (i.e. the user passes the buffer into the function). This is pretty much the standard in C (the same goes for sprintf, read, ...). Also, all other API calls in llama.cpp work that way, so it would be strange to have one single function that doesn't follow this pattern.

Furthermore, if we want to return a std::string directly, then we could also add a whole llama-cpp.h header that directly exposes C++ functions, same idea as ggml-cpp (but reversed: llama.h would be a wrapper for llama-cpp.h).
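As a rough illustration of that idea (not an existing header), a llama-cpp.h-style wrapper could turn the buffer-based C call into one that returns std::string. This sketch assumes the C function reports the full required length even when the buffer is too small and accepts an empty buffer as a size query; both assumptions would need to be verified against llama.h:

```cpp
// Hypothetical llama-cpp.h-style wrapper; llama.h stays the plain C API underneath.
#include <optional>
#include <string>
#include "llama.h"

// Returns the metadata value for `key`, or std::nullopt if the key is missing.
inline std::optional<std::string> llama_model_meta_val(const llama_model * model, const char * key) {
    const int32_t n = llama_model_meta_val_str(model, key, nullptr, 0); // size query (assumed supported)
    if (n < 0) {
        return std::nullopt;
    }
    std::string value(n, '\0');
    llama_model_meta_val_str(model, key, value.data(), value.size() + 1); // +1 for the NUL terminator
    return value;
}
```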

@kallewoof
Contributor Author

kallewoof commented Nov 20, 2024

@slaren

It would be good to normalize this behavior so that the API always does it in the same way consistently, that may justify a breaking API change.

Agree. So I think a future tweak to this function might be to follow the llama_chat_apply_template approach, which would "fix" the bug in this case, assuming the caller never relied on NUL termination.

@ngxson

Also, all other API calls in llama.cpp work that way, so it's strange to have one single function that doesn't follow this pattern.

Ah, I was not aware of this pattern. Thanks. This probably doesn't make much sense in that case.

@kallewoof kallewoof closed this Nov 20, 2024
@kallewoof kallewoof deleted the 202411-metadata-values-as-charstars branch November 20, 2024 23:50
