
Conversation


@apinge apinge commented Feb 21, 2025

I've added a C API for the LLMPipeline class. The purpose of the C API is to enable the use of cgo to build a Go wrapper, which will serve as the backend for Ollama.

Closes #888

@github-actions github-actions bot added category: cmake / build Cmake scripts category: LLM samples GenAI LLM samples category: CPP API Changes in GenAI C++ public headers no-match-files labels Feb 21, 2025
Collaborator

@Wovchena Wovchena left a comment


Answering your question about the naming style: stick to the OpenVINO style, but use the ov_genai_ prefix instead of ov_.
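For illustration only, a declaration following that convention might look like the sketch below; the specific functions and the `ov_status_e` return type here are assumptions, not the agreed API surface.

```C
// Hypothetical sketch of the naming convention: ov_genai_ prefix, snake_case
// names, and opaque handle types, mirroring the existing OpenVINO C API style.
// These are not the actual headers from this PR.
typedef struct ov_genai_llm_pipeline ov_genai_llm_pipeline;

ov_status_e ov_genai_llm_pipeline_create(const char* models_path,
                                         const char* device,
                                         ov_genai_llm_pipeline** pipeline);

void ov_genai_llm_pipeline_free(ov_genai_llm_pipeline* pipeline);
```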

@apinge apinge marked this pull request as draft February 25, 2025 06:27
@github-actions github-actions bot added the category: GHA CI based on Github actions label Mar 3, 2025
@apinge apinge marked this pull request as ready for review March 3, 2025 07:38
@ilya-lavrenov ilya-lavrenov added this to the 2025.1 milestone Mar 5, 2025
@sammysun0711

build_jenkins

@sammysun0711

All tests passed except ci/jenkins/comment, but that failure is not related to this PR's changes; it only indicates that triggering build_jenkins via a comment is not working as expected.

@ilya-lavrenov, could you please kindly review it, thanks!

from conftest import SAMPLES_PY_DIR, SAMPLES_CPP_DIR, SAMPLES_C_DIR
from test_utils import run_sample

class TestGreedyCausalLM:
Contributor


could you please add tests for other C samples as well?

Author


I've added a test for benchmark_genai_c, aligned with the corresponding C++ and Python sample tests. I could not find a chat_sample test under the openvino.genai/tests/python_tests/samples folder for C++ or Python; I plan to add one later.

@ilya-lavrenov
Contributor

build_jenkins

@ilya-lavrenov
Contributor

ilya-lavrenov commented Mar 7, 2025

@apinge it looks like, to fix the macOS Node.JS job, we need to wait for openvinotoolkit/openvino#29320 and then for the nightly builds.

Could you please disable this job temporarily to unblock your PR?
You can add:

if: ${{ false }}

to that job.

const char* inputs,
const ov_genai_generation_config* config,
const stream_callback* streamer,
char* output,
Collaborator


Could you please clarify how to determine the required (sufficient) size of output for a successful generation?
In case of insufficient memory it returns only the first part of the generated tokens; how can I get the remaining part?

Contributor

@ilya-lavrenov ilya-lavrenov Mar 7, 2025


Each token is 2-3 symbols, so you can allocate max_new_tokens * num_of_symbols_in_token.

But I agree - maybe it's better to allocate the required size inside the generate() function and return it to the end user? In that case the output will not be truncated, but the output buffer then needs to be freed on the app side.
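A minimal sketch of that heuristic, assuming the caller allocates the buffer up front; the symbols-per-token factor is an assumed safety margin for illustration, not part of the API:

```C
#include <stdlib.h>

// Heuristic from the comment above: each token decodes to roughly 2-3 symbols,
// so over-allocating max_new_tokens * symbols_per_token bytes should suffice.
static size_t estimate_output_size(size_t max_new_tokens) {
    const size_t symbols_per_token = 4;            // assumed safety margin
    return max_new_tokens * symbols_per_token + 1; // +1 for the trailing '\0'
}

// Usage: size the buffer from the generation config's max_new_tokens, e.g.
// char* output = (char*)malloc(estimate_output_size(max_new_tokens));
```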

Contributor


let's fix in a separate PR.


We think we could allow ov_genai_llm_pipeline_generate's output argument to be NULL and, in that scenario, obtain the result only through the streamer. That way the output's size is no longer a limitation.

Contributor

@ilya-lavrenov ilya-lavrenov Mar 7, 2025


The same issue applies to ov_genai_decoded_results_get_string - it does not allow extracting the full text from the decoded results.

But what if users want output without streaming? They need a way to get the full, untruncated output anyway.

Author


I've created another PR, #1871.
One change is to obtain the required buffer size from ov_genai_decoded_results_get_string. Another is to allow the ov_genai_llm_pipeline_generate interface to take either the output or the streamer as an option.

@ilya-lavrenov ilya-lavrenov merged commit 5636312 into openvinotoolkit:master Mar 7, 2025
44 of 45 checks passed
@ilya-lavrenov
Contributor


Looks like Node.JS is not mandatory for precommit. So, merged as is.

github-merge-queue bot pushed a commit to openvinotoolkit/openvino that referenced this pull request Mar 7, 2025
### Details:
- Required for GenAI JS API as GenAI will depend on C API after
openvinotoolkit/openvino.genai#1778
AJThePro99 pushed a commit to AJThePro99/openvino that referenced this pull request Mar 9, 2025
### Details:
- Required for GenAI JS API as GenAI will depend on C API after
openvinotoolkit/openvino.genai#1778
ilya-lavrenov added a commit that referenced this pull request Mar 11, 2025
… sufficient size for the output. (#1871)

Based on the discussion in #1778, I have adjusted the LLM pipeline C APIs so that the caller can determine the required buffer size for the output string. `ov_genai_llm_pipeline_generate_decoded_results` has been removed and `ov_genai_llm_pipeline_generate` has been modified to return decoded results.
```C
ov_genai_decoded_results* results = NULL;
size_t output_size = 0;
char* output = NULL;  // the caller is responsible for allocating and freeing the memory
ov_genai_llm_pipeline_generate(pipeline, prompt, config, NULL, &results);
ov_genai_decoded_results_get_string(results, NULL, &output_size);  // called with a NULL output buffer to query the required size
output = (char*)malloc(output_size);
// check the allocation ...
ov_genai_decoded_results_get_string(results, output, &output_size);  // get the actual output string
// print and free
```
Another change allows `ov_genai_llm_pipeline_generate` to take either `results` or a `streamer` as an option, but at least one of them must not be NULL. This helps users who only need the streamer functionality and prevents them from allocating unnecessary memory.
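As a rough illustration of the streamer-only path, here is a sketch; the callback shape and the generate call in the comments are assumptions based on the signature quoted earlier in this thread, not the confirmed API.

```C
#include <stdio.h>

// Assumed callback: invoked with each decoded text chunk as it is produced.
static void print_chunk(const char* chunk, void* args) {
    (void)args;
    fputs(chunk, stdout);
    fflush(stdout);
}

// With this change, a caller that only wants streaming can pass a streamer and
// leave the decoded-results output NULL, so no output buffer needs to be sized
// or allocated up front, e.g. (hypothetically):
//   stream_callback streamer = { print_chunk, NULL };
//   ov_genai_llm_pipeline_generate(pipeline, prompt, config, &streamer, NULL);
```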
github-merge-queue bot pushed a commit that referenced this pull request Mar 25, 2025
I've added a test for the C API in chat_sample_c, following the discussion in #1778.
timxu826 pushed a commit to timxu826/openvino that referenced this pull request Apr 7, 2025
### Details:
- Required for GenAI JS API as GenAI will depend on C API after
openvinotoolkit/openvino.genai#1778