Add C API for LLMPipeline #1778
Conversation
Wovchena left a comment
Answering your question about the naming style: stick to the OpenVINO style, but instead of the ov_ prefix use ov_genai_.
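For illustration, a minimal sketch of what that convention could look like. These declarations are hypothetical, not the exact signatures from this PR; `ov_status_e` is assumed to come from the OpenVINO C API headers.

```C
/* Hypothetical declarations, shown only to illustrate the requested naming:
 * follow the OpenVINO C API style, but use the ov_genai_ prefix instead of ov_.
 * ov_status_e is the status type of the OpenVINO C API (openvino/c/ov_common.h). */
typedef struct ov_genai_llm_pipeline ov_genai_llm_pipeline;

ov_status_e ov_genai_llm_pipeline_create(const char* models_path,
                                         const char* device,
                                         ov_genai_llm_pipeline** pipeline);

void ov_genai_llm_pipeline_free(ov_genai_llm_pipeline* pipeline);
```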
build_jenkins
All tests passed except one job. @ilya-lavrenov, could you please kindly review it? Thanks!
from conftest import SAMPLES_PY_DIR, SAMPLES_CPP_DIR, SAMPLES_C_DIR
from test_utils import run_sample

class TestGreedyCausalLM:
Could you please add tests for the other C samples as well?
I've added a test for benchmark_genai_c, aligned with the corresponding tests in the C++ and Python samples. I have not found a chat_sample test under the openvino.genai/tests/python_tests/samples folder for either C++ or Python; I plan to add it later.
build_jenkins
@apinge it looks like to fix the macOS Node.JS job we need to wait for openvinotoolkit/openvino#29320 and then for the nightly builds. Could you please disable this job temporarily to unblock your PR?
const char* inputs,
const ov_genai_generation_config* config,
const stream_callback* streamer,
char* output,
Could you please clarify: how can the caller find out the size of output that is sufficient for a successful generation?
If the memory is insufficient, it returns only the first part of the generated tokens; how can I get the remaining part?
Each token is 2-3 symbols, so you can allocate max_new_tokens * num_of_symbols_in_token.
But I agree, maybe it's better to allocate the required size inside the generate() function and return it to the end user? In that case the output will not be truncated.
In that case the output buffer needs to be freed on the app side.
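As a rough illustration of that heuristic, a minimal sketch assuming the current signature where the caller owns the output buffer; the constant and helper name below are made up for the example, not part of the API.

```C
#include <stdlib.h>

/* Heuristic from the comment above: a generated token decodes to roughly
 * 2-3 symbols, so max_new_tokens * symbols-per-token is a generous upper bound.
 * SYMBOLS_PER_TOKEN and alloc_output_buffer are illustrative names only. */
enum { SYMBOLS_PER_TOKEN = 4 };

static char* alloc_output_buffer(size_t max_new_tokens) {
    /* +1 for a terminating '\0' */
    return (char*)malloc(max_new_tokens * SYMBOLS_PER_TOKEN + 1);
}
```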
Let's fix it in a separate PR.
We think we can allow ov_genai_llm_pipeline_generate's output argument to be NULL, and in that scenario obtain the result only through the streamer. That way the size of output is no longer a limitation.
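A sketch of that streamer-only idea, assuming the stream_callback type from the diff above is a plain function pointer that receives each decoded chunk; the actual typedef and the full generate signature in the PR may differ.

```C
#include <stdio.h>

/* Assumed shape of the callback type; only for this sketch. */
typedef void (*stream_callback)(const char* chunk);

/* Print each decoded chunk as it arrives, so no output buffer is needed. */
static void print_chunk(const char* chunk) {
    printf("%s", chunk);
    fflush(stdout);
}

/* Proposed call pattern (illustrative): pass NULL for output and rely on the
 * streamer only, so the output buffer size no longer limits the result:
 *
 *   stream_callback cb = print_chunk;
 *   ov_genai_llm_pipeline_generate(pipeline, prompt, config, &cb, /* output = */ NULL, ...);
 */
```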
The same issue exists with ov_genai_decoded_results_get_string: it does not allow extracting the full text from the decoded results.
But what if users want output without streaming? They need a way to get the full, untruncated output anyway.
I've created another PR, #1871.
One aspect is obtaining the required buffer size from ov_genai_decoded_results_get_string. Another is allowing the ov_genai_llm_pipeline_generate interface to take either the output or the streamer as an option.
Looks like Node.JS is not mandatory for precommit. So, merged as is.
### Details: - Required for GenAI JS API as GenAI will depend on C API after openvinotoolkit/openvino.genai#1778
… sufficient size for the output. (#1871)

Based on the discussion in #1778, I have adjusted the LLM pipeline C APIs to ensure the caller can determine the required sufficient size for the output string. `ov_genai_llm_pipeline_generate_decoded_results` has been removed and `ov_genai_llm_pipeline_generate` has been modified to return decoded results.

```C
ov_genai_decoded_results* results = NULL;
size_t output_size = 0;
char* output = NULL;  // the caller is responsible for allocating and freeing the memory

ov_genai_llm_pipeline_generate(pipeline, prompt, config, NULL, &results);
ov_genai_decoded_results_get_string(results, NULL, &output_size);  // called with NULL output to determine the required buffer size
output = (char*)malloc(output_size);
// check the allocation ...
ov_genai_decoded_results_get_string(results, output, &output_size);  // get the actual output string
// print and free
```

Another change allows `ov_genai_llm_pipeline_generate` to take either `results` or `streamer` as an option, but one of them must not be NULL. This facilitates users who only need the streamer functionality, preventing them from allocating excessive unnecessary memory.
I've added a test for the C API sample chat_sample_c, following the discussion in #1778.
I've added a C API for the LLMPipeline class. The purpose of the C API is to enable the use of cgo to build a Go wrapper, which will serve as the backend for Ollama.
Closes #888